Re: [whatwg/encoding] Rename Encoding's "streams" to "token queues" (#215) from Andreu Botella on 2020-05-29 (public-webapps-github@w3.org from May 2020)

From: Andreu Botella <notifications@github.com>
Date: Fri, 29 May 2020 07:43:08 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/pull/215/c636012051@github.com>

> I realize the timing of this is not great, but looking at this I wonder if this should be (a subtype of) https://infra.spec.whatwg.org/#queues. The main novelty is returning end-of-stream (which we should rename to final-item or final-token, I think) when the list is empty. And even that seems handled in a way as Infra returns nothing which is something we could branch on.

I don't think token queues are a subtype of Infra queues, since they support prepend, which queues don't. Also, read would have to differ from dequeue, and push accepts multiple tokens at a time, which enqueue doesn't, so there would be no real benefit to depending on queue. Making token queues a subtype of list, rather than an "ordered sequence", would work though.
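
To make the differences concrete, here is a minimal Python sketch of a token queue as described in this thread. The class, method names, and the `END_OF_QUEUE` sentinel are hypothetical stand-ins for illustration, not spec text:

```python
from collections import deque

# Hypothetical sentinel standing in for the spec's end-of-stream item.
END_OF_QUEUE = object()


class TokenQueue:
    """Illustrative token queue; all names here are assumptions."""

    def __init__(self, tokens=()):
        self._items = deque(tokens)

    def read(self):
        # Differs from Infra's dequeue: an empty queue yields the
        # sentinel instead of nothing.
        if not self._items:
            return END_OF_QUEUE
        return self._items.popleft()

    def push(self, *tokens):
        # Accepts several tokens at once; Infra's enqueue takes one item.
        self._items.extend(tokens)

    def prepend(self, *tokens):
        # Inserts at the front, keeping the given order. Infra queues
        # have no counterpart for this operation.
        self._items.extendleft(reversed(tokens))


q = TokenQueue([0x62, 0x63])
q.prepend(0x61)
q.push(0x64, 0x65)

drained = []
while (tok := q.read()) is not END_OF_QUEUE:
    drained.append(tok)
```

Backing the structure with a list-like container, as here, is what makes all three operations straightforward to express.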

By the way, should we even export prepend? The "implementation considerations" appendix lists alternatives to implementing prepend which work for the encoding algorithms in the spec, but wouldn't if other specs are allowed to use prepend arbitrarily.

> At that point all that remains is mapping strings/byte sequences to lists which is something we should allow implicitly anyway I think so "for each" and such can be used on them (although for strings we might need an explicit variant for code points; if you want neither code units nor scalar values).

While I don't oppose defining strings and byte sequences as lists, I don't see how token queues would benefit from being iterable without dequeuing tokens, since dequeuing is what is usually intended.

In any case, the fact that token queues can be implicitly converted to and from strings/byte sequences should be specified. That makes me wonder whether converting to a string/byte sequence should in fact empty the queue, since a token queue backed by I/O would have to block either way. If so, the BOM sniff hook would have to switch to read and prepend rather than use "starts with".
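
A rough Python sketch of what a read-and-prepend BOM sniff could look like. The helper names and the `END_OF_QUEUE` sentinel are hypothetical, and the queue is modelled as a plain `deque` of byte values rather than anything the spec defines:

```python
from collections import deque

# Hypothetical sentinel for "no more tokens".
END_OF_QUEUE = object()


def read(queue):
    # Return the next token, or the sentinel if the queue is empty.
    return queue.popleft() if queue else END_OF_QUEUE


def prepend(queue, tokens):
    # Put tokens back at the front, preserving their order.
    queue.extendleft(reversed(tokens))


def bom_sniff(queue):
    """Return an encoding name if the queue starts with a BOM, else None.

    Reads up to three bytes and prepends them again, so the queue is
    left unchanged for whichever decoder runs next.
    """
    prefix = []
    for _ in range(3):
        byte = read(queue)
        if byte is END_OF_QUEUE:
            break
        prefix.append(byte)
    prepend(queue, prefix)

    head = bytes(prefix)
    if head == b"\xef\xbb\xbf":
        return "UTF-8"
    if head[:2] == b"\xfe\xff":
        return "UTF-16BE"
    if head[:2] == b"\xff\xfe":
        return "UTF-16LE"
    return None


utf8_input = deque(b"\xef\xbb\xbfhi")
utf8_result = bom_sniff(utf8_input)
```

Unlike a "starts with" check on a converted byte sequence, this only ever touches the first three tokens, so it would not force the rest of the queue to be materialised.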

> (What it would continue to hide/neglect, which may or may not be bad, is some kind of waiting signal to indicate the difference between the end and I/O being slow.)

Token queues are defined as simple list-like data structures that don't depend on I/O. That implies that a straightforward implementation would have to read a byte stream from the network in its entirety before passing it to one of the decode hooks; the only affordance for I/O in actual implementations is BOM sniff's (formerly decode's) "wait for three bytes or until the end-of-stream". So doing something like that would require changing token queues to optionally be backed by I/O, which would need a separate PR.
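
For illustration only, an I/O-backed variant might look something like the Python sketch below. Nothing here is in the spec: the class, its methods, and the `END_OF_QUEUE` sentinel are all hypothetical, and a condition variable is just one possible way to tell "no data yet" apart from "no more data":

```python
import threading
from collections import deque

# Hypothetical sentinel for "the producer has finished".
END_OF_QUEUE = object()


class BlockingTokenQueue:
    """Illustrative I/O-backed token queue; all names are assumptions."""

    def __init__(self):
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def push(self, *tokens):
        with self._cond:
            self._items.extend(tokens)
            self._cond.notify_all()

    def close(self):
        # Signals that no further tokens will arrive.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def read(self):
        # Blocks while the queue is empty but still open, so slow I/O
        # is not mistaken for the end of the input.
        with self._cond:
            self._cond.wait_for(lambda: self._items or self._closed)
            if self._items:
                return self._items.popleft()
            return END_OF_QUEUE

    def peek(self, n):
        # "Wait for n tokens or until the end", without consuming them.
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._items) >= n or self._closed
            )
            return list(self._items)[:n]


q = BlockingTokenQueue()


def producer():
    # Stands in for bytes trickling in from the network.
    for b in b"\xef\xbb\xbfA":
        q.push(b)
    q.close()


t = threading.Thread(target=producer)
t.start()
head = q.peek(3)  # waits for three bytes or until the producer closes
t.join()
```

Here `peek(3)` plays the role of the "wait for three bytes or until the end-of-stream" step, while `read` only reports the end once the producer has closed the queue.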

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/215#issuecomment-636012051

Received on Friday, 29 May 2020 14:43:21 UTC