- From: Dean Landolt <dean@deanlandolt.com>
- Date: Wed, 30 Oct 2013 22:30:34 -0400
- To: Arthur Barstow <art.barstow@nokia.com>
- Cc: public-webapps <public-webapps@w3.org>
- Message-ID: <CAPm8pjrANEy1xb2ngUgnOb1nb=BGiFA+m2xztjCL2_QoRB93ZA@mail.gmail.com>
I really like the general concepts of this proposal, but I'm confused by what seems like an unnecessarily limiting assumption: why assume all streams are byte streams? This is a mistake node recently made in its streams refactor, one that has led to `objectMode` and added cruft. Forgive me if this has been discussed -- I just learned of this proposal today. But as someone who's been slinging streams in javascript for years, I'd really hate to see the "standard" stream hampered by this bytes-only limitation.

The node ecosystem clearly demonstrates that streams are for more than bytes and (byte-encoded) strings. In my perfect world any arbitrary iterator could be used to characterize stream chunks -- this would have some really interesting benefits -- but I suspect that kind of flexibility would be overkill for now. Still, there's no good reason bytes should be the only thing people can chunk up in streams. And if we're defining streams for the whole platform, they shouldn't *just* be tied to a few very specific file-like use cases.

If streams could also consist of chunks of strings (real, native strings), a huge swath of the API could disappear. All of readType, readEncoding and charset could be eliminated, replaced with simple, composable transforms that turn byte streams (of, say, utf-8) into string streams, and vice versa (rough sketch below).

The `size` of a stream (if it exists) would be specified as the total `length` of all chunks concatenated together. So if chunks were in bytes, `size` would be the total bytes (as currently specified). But if chunks consisted of real strings, `size` would be the total length of all string chunks. Interestingly, if your source stream is utf-8 the total bytes wouldn't tell you the eventual string length, and the total string size couldn't be known without iterating the whole stream. But if the source stream is utf-16 and its `size` is known, the new `size` could also be known ahead of time -- `bytes / 2` (thanks to javascript's ucs-2 strings).

Of course the real draw of this approach would be when chunks are neither blobs nor strings. Why couldn't chunks be arrays? The arrays could contain anything (no need to reserve any value as a sigil). Regardless of the chunk type, the zero object for any given type wouldn't be `null` (it would be something like '' or []). That means we could use null to distinguish EOF, and `chunk == null` would make a perfectly nice (and unambiguous) EOF sigil, eliminating yet more API surface (sketch below). This would give us clean object-mode streams for free, and without node's arbitrary limitations. The `size` of an array stream would be the total length of all array chunks.

As I hinted before, we could also leave the door open to specifying chunks as any iterable, where `size` (if known) would just be the sum of each chunk's `length` (assuming chunks even have a `length`). This would also allow individual chunks to be built of generators, which could be particularly interesting if the `size` argument to `read` were specified as a maximum number of bytes rather than the exact total to return -- completely sensible considering it has to behave this way near the end of the stream anyway. This would lead to a pattern like `stream.read(Infinity)`, which would essentially say *give me everything you've got as soon as you can*. This is closer to node's semantics (where read is async, for added scheduling flexibility), and it would drain streams faster rather than pseudo-blocking for a chunk of a specific (and arbitrary) size -- which ultimately can't be guaranteed anyway, so you'll always have to do length checks.
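To make the byte-to-string transform point concrete, here's roughly the shape I have in mind. To be clear, this is just a sketch: a `read()` that resolves with the next chunk (or `null` at EOF) is my assumption, not the proposed API, and `TextDecoder` stands in for whatever decoding primitive the platform ends up providing.

```js
// Sketch only: wraps a hypothetical byte stream (read() resolves with a
// Uint8Array chunk, or null at EOF) and exposes the same interface with
// native string chunks instead. Nothing here is taken from the draft spec.
function toStringStream(byteStream, encoding) {
  var decoder = new TextDecoder(encoding || 'utf-8');
  return {
    read: function () {
      return byteStream.read().then(function (chunk) {
        if (chunk == null) return null;                  // EOF passes straight through
        return decoder.decode(chunk, { stream: true });  // bytes in, string chunk out
      });
    }
  };
}
```

With something like that composable on top, readType, readEncoding and charset stop being the stream's problem at all.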
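And here's what I mean by null being enough to mark EOF regardless of the chunk type -- again just a sketch against that same assumed `read()` shape:

```js
// Pulls chunks until read() resolves with null. Because '' and [] are the
// "zero" values for their types, null is never a legitimate chunk, so the
// same loop works for byte streams, string streams and array ("object mode")
// streams alike. Hypothetical API, as above.
function consume(stream, onChunk) {
  return stream.read().then(function (chunk) {
    if (chunk == null) return;          // unambiguous EOF, no sentinel needed
    onChunk(chunk);                     // chunk: Uint8Array, string, array...
    return consume(stream, onChunk);    // keep pulling
  });
}
```

Used with an array stream, something like `consume(arrayStream, function (chunk) { size += chunk.length; })` is all it takes to total up the `size` I described above.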
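Finally, the `read(Infinity)` pattern, assuming `read(n)` means "at most n" rather than "exactly n" (again, my assumption, not the current draft):

```js
// Sketch: drain a stream as fast as the source can supply it. read(Infinity)
// just means "give me everything you've got as soon as you can"; the loop
// never pseudo-blocks waiting for some arbitrary chunk size to accumulate.
function drain(stream) {
  var chunks = [];
  function next() {
    return stream.read(Infinity).then(function (chunk) {
      if (chunk == null) return chunks;  // EOF: hand back everything collected
      chunks.push(chunk);
      return next();
    });
  }
  return next();
}
```

Callers that really do need fixed-size pieces can always re-chunk on top of that -- which they'd have to be prepared to do anyway, given the end-of-stream case.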
(On a somewhat related note: why is a 0-sized stream specified to throw? And why a SyntaxError, of all things? A 0-sized stream seems perfectly reasonable to me.)

What's particularly appealing to me about the chunk-as-generator idea is that these chunks could still be quite large -- hundreds of megabytes, even. Just because a potentially large amount of data has become available since the last chunk was processed doesn't mean you should have to bring it all into memory at once.

I know this is a long email and it may sound like a lot of suggestions, but I think it's actually a relatively minor tweak (and simplification) that would unlock the real power of streams for their many other use cases. I've been thinking about streams and promises (and streams with promises) for years now, and this is the first approach that really feels right to me.

On Mon, Oct 28, 2013 at 11:29 AM, Arthur Barstow <art.barstow@nokia.com> wrote:

> Feras and Takeshi have begun merging their Streams proposal and this is a
> Call for Consensus to publish a new WD of Streams API using the updated ED
> as the basis:
>
> <https://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm>
>
> Please note the Editors may update the ED before the TR is published (but
> they do not intend to make major changes during the CfC).
>
> Agreement to this proposal: a) indicates support for publishing a new WD;
> and b) does not necessarily indicate support of the contents of the WD.
>
> If you have any comments or concerns about this proposal, please reply to
> this e-mail by November 3 at the latest. Positive response to this CfC is
> preferred and encouraged and silence will be assumed to mean agreement with
> the proposal.
>
> -Thanks, ArtB
Received on Thursday, 31 October 2013 02:31:42 UTC