- From: Dean Landolt <dean@deanlandolt.com>
- Date: Wed, 30 Oct 2013 22:30:34 -0400
- To: Arthur Barstow <art.barstow@nokia.com>
- Cc: public-webapps <public-webapps@w3.org>
- Message-ID: <CAPm8pjrANEy1xb2ngUgnOb1nb=BGiFA+m2xztjCL2_QoRB93ZA@mail.gmail.com>
I really like the general concepts of this proposal, but I'm confused by what seems like an unnecessarily limiting assumption: why assume all streams are byte streams? This is a mistake node recently made in its streams refactor, one that has led to `objectMode` and added cruft. Forgive me if this has been discussed -- I just learned of this proposal today. But as someone who's been slinging streams in javascript for years, I'd really hate to see the "standard" stream hampered by this bytes-only limitation.

The node ecosystem clearly demonstrates that streams are for more than bytes and (byte-encoded) strings. In my perfect world any arbitrary iterator could be used to characterize stream chunks -- this would have some really interesting benefits -- but I suspect that kind of flexibility would be overkill for now. Still, there's no good reason bytes should be the only thing people can chunk up in streams. And if we're defining streams for the whole platform, they shouldn't *just* be tied to a few very specific file-like use cases.

If streams could also consist of chunks of strings (real, native strings), a huge swath of the API could disappear. All of readType, readEncoding and charset could be eliminated, replaced with simple, composable transforms that turn byte streams (of, say, utf-8) into string streams, and vice versa (rough sketch below).

The `size` of a stream (if it exists) would be specified as the total `length` of all chunks concatenated together. So if chunks were in bytes, `size` would be the total bytes (as currently specified). But if chunks consisted of real strings, `size` would be the total length of all string chunks. Interestingly, if your source stream is utf-8 the total bytes wouldn't tell you the eventual string length, and the total string size couldn't be known without iterating the whole stream. But if the source stream is utf-16 and its `size` is known, the new `size` could also be known ahead of time -- `bytes / 2` (thanks to javascript's ucs-2 strings).

Of course the real draw of this approach would be when chunks are neither blobs nor strings. Why couldn't chunks be arrays? The arrays could contain anything (no need to reserve any value as a sigil). Regardless of the chunk type, the zero object for any given type wouldn't be `null` (it would be something like '' or []). That means we could use null to distinguish EOF, and `chunk == null` would make a perfectly nice (and unambiguous) EOF sigil, eliminating yet more API surface (sketch below). This would give us clean object-mode streams for free, and without node's arbitrary limitations. The `size` of an array stream would be the total length of all array chunks.

As I hinted before, we could also leave the door open to specifying chunks as any iterable, where `size` (if known) would just be the sum of each chunk's `length` (assuming chunks even have a `length`). This would also allow individual chunks to be built of generators, which could be particularly interesting if the `size` argument to `read` were specified as a maximum number of bytes rather than the exact total to return -- completely sensible considering it has to behave this way near the end of the stream anyway. This would lead to a pattern like `stream.read(Infinity)`, which would essentially say *give me everything you've got as soon as you can*. This is closer to node's semantics (where read is async, for added scheduling flexibility), and it would drain streams faster rather than pseudo-blocking for a chunk of a specific (and arbitrary) size -- which ultimately can't be guaranteed anyway, so you'll always have to do length checks.
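To make the byte-to-string transform point concrete, here's roughly the shape I have in mind. To be clear, this is just a sketch: a `read()` that resolves with the next chunk (or `null` at EOF) is my assumption, not the proposed API, and `TextDecoder` stands in for whatever decoding primitive the platform ends up providing.

```js
// Sketch only: wraps a hypothetical byte stream (read() resolves with a
// Uint8Array chunk, or null at EOF) and exposes the same interface with
// native string chunks instead. Nothing here is taken from the draft spec.
function toStringStream(byteStream, encoding) {
  var decoder = new TextDecoder(encoding || 'utf-8');
  return {
    read: function () {
      return byteStream.read().then(function (chunk) {
        if (chunk == null) return null;                  // EOF passes straight through
        return decoder.decode(chunk, { stream: true });  // bytes in, string chunk out
      });
    }
  };
}
```

With something like that composable on top, readType, readEncoding and charset stop being the stream's problem at all.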
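And here's what I mean by null being enough to mark EOF regardless of the chunk type -- again just a sketch against that same assumed `read()` shape:

```js
// Pulls chunks until read() resolves with null. Because '' and [] are the
// "zero" values for their types, null is never a legitimate chunk, so the
// same loop works for byte streams, string streams and array ("object mode")
// streams alike. Hypothetical API, as above.
function consume(stream, onChunk) {
  return stream.read().then(function (chunk) {
    if (chunk == null) return;          // unambiguous EOF, no sentinel needed
    onChunk(chunk);                     // chunk: Uint8Array, string, array...
    return consume(stream, onChunk);    // keep pulling
  });
}
```

Used with an array stream, something like `consume(arrayStream, function (chunk) { size += chunk.length; })` is all it takes to total up the `size` I described above.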
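Finally, the `read(Infinity)` pattern, assuming `read(n)` means "at most n" rather than "exactly n" (again, my assumption, not the current draft):

```js
// Sketch: drain a stream as fast as the source can supply it. read(Infinity)
// just means "give me everything you've got as soon as you can"; the loop
// never pseudo-blocks waiting for some arbitrary chunk size to accumulate.
function drain(stream) {
  var chunks = [];
  function next() {
    return stream.read(Infinity).then(function (chunk) {
      if (chunk == null) return chunks;  // EOF: hand back everything collected
      chunks.push(chunk);
      return next();
    });
  }
  return next();
}
```

Callers that really do need fixed-size pieces can always re-chunk on top of that -- which they'd have to be prepared to do anyway, given the end-of-stream case.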
(On a somewhat related note: why is a 0-sized stream specified to throw? And why a SyntaxError, of all things? A 0-sized stream seems perfectly reasonable to me.)

What's particularly appealing to me about the chunk-as-generator idea is that these chunks could still be quite large -- hundreds of megabytes, even. Just because a potentially large amount of data has become available since the last chunk was processed doesn't mean you should have to bring it all into memory at once.

I know this is a long email and it may sound like a lot of suggestions, but I think it's actually a relatively minor tweak (and simplification) that would unlock the real power of streams for their many other use cases. I've been thinking about streams and promises (and streams with promises) for years now, and this is the first approach that really feels right to me.

On Mon, Oct 28, 2013 at 11:29 AM, Arthur Barstow <art.barstow@nokia.com> wrote:

> Feras and Takeshi have begun merging their Streams proposal and this is a
> Call for Consensus to publish a new WD of Streams API using the updated ED
> as the basis:
>
> <https://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm>
>
> Please note the Editors may update the ED before the TR is published (but
> they do not intend to make major changes during the CfC).
>
> Agreement to this proposal: a) indicates support for publishing a new WD;
> and b) does not necessarily indicate support of the contents of the WD.
>
> If you have any comments or concerns about this proposal, please reply to
> this e-mail by November 3 at the latest. Positive response to this CfC is
> preferred and encouraged and silence will be assumed to mean agreement with
> the proposal.
>
> -Thanks, ArtB
Received on Thursday, 31 October 2013 02:31:42 UTC