Re: Overlap between StreamReader and FileReader

On Fri, Aug 23, 2013 at 2:41 AM, Isaac Schlueter <i@izs.me> wrote:

> 1. Drop the "read n bytes" part of the API entirely.  It is hard to do


I'm OK with that. But then we instead need to evolve ArrayBuffer to have
powerful concat/slice functionality for performance. Re: slicing, we can
just make APIs accept ArrayBufferView. How should we deal with the concat
operation? You suggested that we add unshift(), but repeating read() and
unshift() until we get enough data doesn't sound so good.
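
To illustrate, here's a minimal sketch of emulating "read exactly n bytes"
on top of read() and unshift() (the read()/poll()/unshift() semantics here
are assumed for illustration, not spec'd):

  // Hypothetical sketch: accumulate chunks until we have n bytes, then
  // push the excess back with unshift(). Note the copy on every
  // iteration; that's exactly the concat overhead in question.
  function readN(stream, n, cb) {
    var buffered = new Uint8Array(0);
    (function onpoll() {
      var chunk = stream.read();  // assumed to return a Uint8Array
      var merged = new Uint8Array(buffered.length + chunk.length);
      merged.set(buffered, 0);
      merged.set(chunk, buffered.length);
      buffered = merged;
      if (buffered.length < n)
        return stream.poll(onpoll);
      stream.unshift(buffered.subarray(n));  // return the excess
      cb(buffered.subarray(0, n));
    })();
  }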

For example, currently TextDecoder (http://encoding.spec.whatwg.org/)
accepts one ArrayBufferView and outputs one DOMString. We can use the
"stream" mode of TextDecoder to get multiple output DOMStrings and then
concatenate them to get the final result.
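
Something like this (TextDecoder and its { stream: true } option are from
the Encoding spec; the chunks array of ArrayBufferViews is assumed):

  var decoder = new TextDecoder('utf-8');
  var result = '';
  for (var i = 0; i < chunks.length; i++) {
    // stream: true carries incomplete byte sequences over to the next
    // call instead of emitting a replacement character at the boundary.
    result += decoder.decode(chunks[i], { stream: true });
  }
  result += decoder.decode();  // flush any buffered trailing bytes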

Since we still don't have a StringBuilder, is it considered not a big deal
to have an "ArrayBufferBuilder"? (Stream.read(size) is kind of an
ArrayBuffer builder.)

Are any of you thinking about introducing something like Node.js's Buffer
class for decoding and tokenization? TextDecoder+Stream would be one such
class.

I also considered making the read() operation accept a pre-allocated
ArrayBuffer and return the number of bytes written.

  stream.read(buffer)

If the data written is insufficient, the user can keep passing the same
buffer to fill the unused space. But since DOMString is immutable, we
can't take the same approach for the readText() op.
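
The shape I have in mind, roughly (hypothetical; in particular, how the
unused space is addressed is up for discussion):

  var buffer = new ArrayBuffer(1024);
  var offset = 0;
  while (offset < buffer.byteLength) {
    // Pass a view over the unused space; read() fills it and returns
    // the number of bytes written.
    var bytesRead = stream.read(new Uint8Array(buffer, offset));
    if (bytesRead === 0)
      break;  // e.g. EOF, or no data available right now
    offset += bytesRead;
  }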


> see in Node), and complicates the internal mechanisms.  People think
> they need it, but what they really need is readUntil(delimiterChar).


What about implementing a length-header-based protocol, e.g. msgpack?
There the parser knows exactly how many bytes it needs next, and there is
no delimiter to search for.
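
For instance (a simplified, hypothetical framing with a 4-byte big-endian
length prefix, using the readN() helper sketched above):

  function readFrame(stream, cb) {
    readN(stream, 4, function (header) {
      // The header tells us exactly how long the payload is; no
      // delimiterChar could express this.
      var len = new DataView(header.buffer, header.byteOffset, 4)
          .getUint32(0);
      readN(stream, len, cb);  // read exactly the payload
    });
  }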


> 2. Reading strings vs ArrayBuffers or other types of things MUST be a
> property of the stream,


A fixed property, or mutable via a readType attribute?

If it's a mutable readType, the problem with mixed UTF-8/binary sequences
of read() calls remains.
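
For example (readType and readUntil() are hypothetical here):

  stream.readType = 'text';
  var line = stream.readUntil('\n');  // how many bytes did this consume?
  stream.readType = 'arraybuffer';
  var body = stream.read();           // binary payload following the text

If the text decoder buffered bytes past the delimiter, the binary read
starts in the wrong place.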


> 3. Sync vs async read().  Let's dig into the issue of
> `var d = s.read()` vs `s.read(function(d) {})` for getting data out of
> a stream.
>
...snip...

> buffering to occur if you have pipe chains of streams that are
> processing at different speeds, where one is bursty and the other is
> consistent.
>

Clarification: you're saying that always posting the callback to the task
queue is wasteful, right?

Anyway, I think it makes sense. If read() is designed to invoke the
callback synchronously, it'll be difficult to avoid stack overflow. So the
only option is to always run the callback in the next task.
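
To illustrate the hazard (hypothetical callback-style read()):

  // If read() may invoke its callback synchronously whenever data is
  // already buffered, this "loop" is unbounded recursion: the stack
  // grows by one frame per chunk until the buffer happens to run dry.
  function pump(stream) {
    stream.read(function (d) {
      processData(d);
      pump(stream);
    });
  }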


> stream.poll(function ondata() {
>

What happens if unshift() is called? Does poll() invoke ondata() only when
new data (not including unshift()-ed data) is available?


>   var d = stream.read();
>   while (stream.state === 'OK') {
>     processData(d);
>     d = stream.read();
>   }
>

Is Jonas right about the reason we need a loop here, i.e. to avoid
automatic merging/serialization of buffered chunks?


>   switch (stream.state) {
>     case 'EOF': onend(); break;
>     case 'EWOULDBLOCK': stream.poll(ondata); break;
>     default: onerror(new Error('Stream read error: ' + stream.state));
>

Could we distinguish these three states by null, an empty
ArrayBuffer/DOMString, and a non-empty ArrayBuffer/DOMString, respectively?
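
The read loop would then look something like this (a sketch of the
encoding suggested above, not a spec'd API; errors would presumably throw):

  var d = stream.read();
  if (d === null) {
    onend();                // EOF
  } else if (d.byteLength === 0) {
    stream.poll(ondata);    // EWOULDBLOCK: wait for more data
  } else {
    processData(d);         // ordinary data
  }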


> ReadableStream.prototype.readAll = function(onerror, ondata, onend) {
>   onpoll();
>   function onpoll() {
>

If we decide not to allow multiple concurrent read operations on a stream,
can we just use the event handler approach?

stream.onerror = ...
stream.ondata = ...


> 4. Passive data listening.  In Node v0.10, it is not possible to
> passively "listen" to the data passing through a stream without
> affecting the state of the stream.  This is corrected in v0.12, by
> making the read() method also emit a 'data' event whenever it returns
> data, so v0.8-style APIs work as they used to.
>
> The takeaway here is not to do what Node did, but to learn what Node
> learned: the passive-data-listening use-case is relevant.
>

What's the use case?


> 5. Piping.  It's important to consider how any proposed readable
> stream API will allow one to respond to backpressure, and how it
> relates to a *writable* stream API.  Data management from a source to
> a destination is the fundamental raison d'être for streams, after all.
>

I'd have onwritable and onreadable handlers, make their thresholds
configurable, and let pipe() set them up.
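
Roughly like this (every name here is hypothetical, not a spec'd API;
write() is assumed to return false once the writable side's buffer exceeds
its threshold):

  function pipe(src, dst) {
    src.onreadable = function onreadable() {
      var chunk = src.read();
      if (!dst.write(chunk)) {
        // Backpressure: stop pulling from src until dst drains below
        // its threshold and fires onwritable.
        src.onreadable = null;
        dst.onwritable = function () {
          dst.onwritable = null;
          src.onreadable = onreadable;  // resume pulling
        };
      }
    };
  }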

Received on Wednesday, 11 September 2013 05:29:27 UTC