Thoughts behind the Streams API ED from Takeshi Yoshino on 2013-11-04 (public-webapps@w3.org from October to December 2013)

From: Takeshi Yoshino <tyoshino@google.com>
Date: Tue, 5 Nov 2013 02:11:54 +0900
To: "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>
Message-ID: <CAH9hSJYekiKU7jWE4Wh_HWO0Bk1T0OnTrm_4LXR4Xo1zE=9PSQ@mail.gmail.com>
I'd like to summarize my ideas behind this API surface since the overlap
thread is too long. We'll put these into bug entries soon.

Feedback on the "Overlap" thread, especially Isaac's exhaustive list of
considerations and the conversation with Aymeric, was very helpful. In reply
to his mail, I previously drafted my initial proposal [2], which addresses
almost all of them. Since the API surface was so big, I tried to compact it
while incorporating Promises. The current ED [3] addresses not all but some
of the important requirements. I think it's a good (re)starting point.

* Flow control
read() and write() in the ED do provide flow control by controlling the
timing of resolution of the returned Promise. A Stream would have a window
to limit the data to be buffered in it. If a big value is passed as the size
parameter of read(), it may extend the window if necessary.
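
As a rough illustration of the windowing idea above, here is a minimal
TypeScript sketch. WindowedStream, its constructor argument, and the use of
number[] for bytes are assumptions made for this example, not the ED's
interface, and read() is synchronous here only to keep the sketch short.

```typescript
// Hypothetical sketch, not the ED's Stream interface.
class WindowedStream {
  private bytes: number[] = [];
  private resumeWriter: (() => void) | null = null;

  constructor(private readonly windowSize: number) {}

  // Buffers the chunk. The returned Promise resolves at once while the
  // buffer fits in the window, and otherwise only after reads make room.
  write(chunk: number[]): Promise<void> {
    if (this.resumeWriter !== null) {
      return Promise.reject(new Error("a write() is already pending"));
    }
    this.bytes.push(...chunk);
    if (this.bytes.length <= this.windowSize) {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.resumeWriter = resolve;
    });
  }

  // Removes up to `size` bytes and wakes a blocked writer if the buffer
  // is back within the window.
  read(size: number): number[] {
    const out = this.bytes.splice(0, size);
    if (this.resumeWriter !== null && this.bytes.length <= this.windowSize) {
      const resume = this.resumeWriter;
      this.resumeWriter = null;
      resume();
    }
    return out;
  }
}
```

A producer that awaits each write() before issuing the next one is slowed
down to the pace of the reader without any extra signaling.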

When reading data as a DOMString, the size param of read() doesn't specify
the exact raw size of data to be read out. It just works as a throttle to
prevent the internal buffer from being drained too fast. StreamReadResult
tells how many bytes were actually consumed.

If more explicit and precise flow control is necessary, we could cherry-pick
some parts from my old big API proposal [1]. For example, making the window
size configurable.

If it makes sense, size can be generalized to be the cost of each element.
It would be useful when trying to generalize Stream to various objects.

To make the window dynamically adjustable, we could introduce methods such
as drainCapacity() and expandCapacity().

* Passive producer
Thinking of producers like a random number generator, it's not always good
to ask a producer to prepare data and push it to a Stream using write().
This was possible in [2], but not in the ED. This can be addressed, for
example, by adding one overload to write().
- write() and write(size) don't write data but wait until the Stream can
accept some amount or the specified size of data.
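
To make the overload idea concrete, here is a hedged TypeScript sketch of a
producer that generates data only once the stream signals room for it.
CapacityGate, waitForCapacity(), commit() and release() are hypothetical
names standing in for write(size), the actual write, and a consumer's read;
none of them come from the ED.

```typescript
// Hypothetical sketch: models only the window accounting, not the data path.
class CapacityGate {
  private used = 0;
  private pending: { size: number; resolve: () => void } | null = null;

  constructor(private readonly windowSize: number) {}

  // Stand-in for the proposed write(size): writes nothing and resolves once
  // `size` units fit in the window. Only one waiter at a time is supported.
  waitForCapacity(size: number): Promise<void> {
    if (this.pending !== null) {
      return Promise.reject(new Error("a wait is already pending"));
    }
    if (this.windowSize - this.used >= size) {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.pending = { size, resolve };
    });
  }

  // The producer records that it really wrote `size` units.
  commit(size: number): void {
    this.used += size;
  }

  // A consumer read `size` units; wake the waiting producer if it now fits.
  release(size: number): void {
    this.used = Math.max(0, this.used - size);
    if (this.pending !== null && this.windowSize - this.used >= this.pending.size) {
      const { resolve } = this.pending;
      this.pending = null;
      resolve();
    }
  }
}

// Passive producer: random bytes are generated only when there is room.
async function produceRandom(
  gate: CapacityGate,
  rounds: number,
  chunkSize: number,
  sink: number[][],
): Promise<void> {
  for (let i = 0; i < rounds; i++) {
    await gate.waitForCapacity(chunkSize);
    const chunk: number[] = [];
    for (let j = 0; j < chunkSize; j++) {
      chunk.push(Math.floor(Math.random() * 256));
    }
    sink.push(chunk);
    gate.commit(chunkSize);
  }
}
```

With a window of 8 and chunks of 4, the producer above generates two chunks
and then stays idle until release() is called.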

* Conversion from existing active and unstoppable producer API
E.g. WebSocket invokes onmessage immediately when new data is available.
For this kind of API, a finite-size Stream cannot absorb the production. So
there will be a need to buffer the read data manually. In [2], Stream always
accepted write() even if the buffer was full, assuming that the producer
would use the onwritable method if necessary.

Currently, only one write() can be issued concurrently, but we can choose
to have Stream queue write() requests in it.
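
For illustration, the manual buffering described above could look like the
following TypeScript sketch. The Sink interface and the WriteQueue class are
assumptions for this example; the sink only mimics the "one write() at a
time" rule and is not the ED's Stream.

```typescript
// Hypothetical sink shape: one write at a time, and the Promise resolves
// when the sink can accept the next chunk.
interface Sink<T> {
  write(chunk: T): Promise<void>;
}

// Absorbs chunks from a producer that cannot be paused and forwards them
// to the sink strictly one write() at a time.
class WriteQueue<T> {
  private queue: T[] = [];
  private draining = false;

  constructor(private readonly sink: Sink<T>) {}

  // Call this from the push callback, e.g. from a WebSocket onmessage
  // handler. It never blocks; the backlog is unbounded in this sketch.
  push(chunk: T): void {
    this.queue.push(chunk);
    if (!this.draining) {
      // Errors from the sink are not handled in this sketch.
      void this.drain();
    }
  }

  // Number of chunks waiting that have not been handed to the sink yet.
  get backlog(): number {
    return this.queue.length;
  }

  private async drain(): Promise<void> {
    this.draining = true;
    try {
      while (this.queue.length > 0) {
        await this.sink.write(this.queue.shift() as T);
      }
    } finally {
      this.draining = false;
    }
  }
}
```

Having Stream queue write() requests internally would move this backlog
into the Stream itself.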

* Sync read if possible
We could add a sync flag to StreamReadResult and introduce StreamWriteResult
to signal whether the read was done sync (data is the actual result) or
async (data is a Promise), which saves the Promise post-tasking cost.
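
A hedged TypeScript sketch of what such a result could look like follows.
ReadResult and MaybeSyncSource are hypothetical names for this example, and
the sketch assumes at most one outstanding asynchronous read.

```typescript
// Hypothetical result shape: `sync` tells whether `data` is the actual
// value or a Promise for it.
type ReadResult =
  | { sync: true; data: Uint8Array }
  | { sync: false; data: Promise<Uint8Array> };

class MaybeSyncSource {
  private chunks: Uint8Array[] = [];
  private waiter: ((chunk: Uint8Array) => void) | null = null;

  // Producer side: hand the chunk to a waiting reader, or buffer it.
  push(chunk: Uint8Array): void {
    if (this.waiter !== null) {
      const deliver = this.waiter;
      this.waiter = null;
      deliver(chunk);
    } else {
      this.chunks.push(chunk);
    }
  }

  // Consumer side: returns buffered data directly when there is any, and
  // falls back to a Promise only when the buffer is empty.
  read(): ReadResult {
    const head = this.chunks.shift();
    if (head !== undefined) {
      return { sync: true, data: head };
    }
    const data = new Promise<Uint8Array>((resolve) => {
      this.waiter = resolve;
    });
    return { sync: false, data };
  }
}
```

A consumer checks result.sync and awaits result.data only when it is false,
so reads served from the buffer involve no Promise at all.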

I estimated that the post-tasking overhead should be negligible for bulk
reading, and when reading small fields, we often read some amount into an
ArrayBuffer and then parse it directly. So it's currently excluded.

* Multiple consumers
pipe() can take multiple destination Streams. This allows for mirroring
Stream's output into two or more Streams. I also considered making Stream
itself consumable, but I thought it complicates API and implementation.
- Elements in internal buffer need to be reference counted.
- It must be able to distinguish consumers.

If one of the consumers is fast and the other is slow, we need to wait for
the slower one before starting to process the rest in the original Stream.
We can choose to allow multiple consumers to address this by introducing a
new concept, "Cursor", that represents a reading context. Cursor can be
implemented as a new interface or a Stream that refers to (and adds a
reference count to elements of) the original Stream's internal buffer.
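
To show what the "Cursor" idea might involve, here is a small TypeScript
sketch. SharedBuffer and Cursor are hypothetical classes for this example;
instead of per-element reference counts, the sketch tracks each cursor's
position and drops elements once the slowest cursor has passed them, which
has the same retention effect.

```typescript
// Hypothetical sketch of a buffer shared by several reading contexts.
class SharedBuffer<T> {
  private items: T[] = [];
  private base = 0;                // absolute index of items[0]
  private positions: number[] = []; // absolute read position per cursor id

  write(item: T): void {
    this.items.push(item);
  }

  // A new cursor starts at the oldest element still retained.
  openCursor(): Cursor<T> {
    this.positions.push(this.base);
    return new Cursor<T>(this, this.positions.length - 1);
  }

  // Returns the next element for the given cursor, or undefined when that
  // cursor has caught up with the writer.
  readAt(id: number): T | undefined {
    const index = this.positions[id] - this.base;
    if (index >= this.items.length) {
      return undefined;
    }
    const item = this.items[index];
    this.positions[id] += 1;
    this.trim();
    return item;
  }

  // Number of elements still held for at least one cursor.
  get retained(): number {
    return this.items.length;
  }

  // Drops every element that all cursors have already read.
  private trim(): void {
    const slowest = Math.min(...this.positions);
    this.items.splice(0, slowest - this.base);
    this.base = slowest;
  }
}

class Cursor<T> {
  constructor(
    private readonly buffer: SharedBuffer<T>,
    private readonly id: number,
  ) {}

  read(): T | undefined {
    return this.buffer.readAt(this.id);
  }
}
```

The fast cursor never waits for the slow one, at the cost of the buffer
growing by however far the slow cursor lags behind.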

More study is needed to figure out whether the context approach is really
better than pipe()-ing to a new Stream instance.

* Splitting InputStream (ReadableStream) and OutputStream (WritableStream)
Writing part and reading part of the API can be split into two separate
interfaces. It's designed to allow for such decoupling. The constructor and
most of internal flags are to define a plain Stream. I'll give it a try
soon.

* StreamConsumeResult
I decided to have this interface for returning results in order to
- Notify EOF by just one read call if possible
- Tell the size of raw binary data consumed when readType="text"

* read(n)
There are roughly two ways to encode structured data: length header based
and separator based. For the former, people basically don't want to get
notified until enough data is ready. It's also inconvenient if we get
small ArrayBuffer chunks and need to concatenate them manually. For the
latter, call read() or read(n) if you need flow control.
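
The manual concatenation that read(n) is meant to spare people from looks
roughly like the following TypeScript sketch. ByteAccumulator, FrameReader,
and the 4-byte big-endian length header are assumptions chosen for this
example and are not part of the ED.

```typescript
// Collects arbitrary chunks and hands out exactly n contiguous bytes.
class ByteAccumulator {
  private chunks: Uint8Array[] = [];
  private total = 0;

  append(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.total += chunk.byteLength;
  }

  // Returns n bytes copied into one array, or null if fewer are buffered.
  take(n: number): Uint8Array | null {
    if (this.total < n) {
      return null;
    }
    const out = new Uint8Array(n);
    let offset = 0;
    while (offset < n) {
      const head = this.chunks[0];
      const use = Math.min(head.byteLength, n - offset);
      out.set(head.subarray(0, use), offset);
      offset += use;
      if (use === head.byteLength) {
        this.chunks.shift();
      } else {
        this.chunks[0] = head.subarray(use);
      }
    }
    this.total -= n;
    return out;
  }
}

// Parses frames made of a 4-byte big-endian length followed by the body.
class FrameReader {
  private acc = new ByteAccumulator();
  private bodyLength: number | null = null;

  // Feeds one incoming chunk and returns every frame completed by it.
  feed(chunk: Uint8Array): Uint8Array[] {
    this.acc.append(chunk);
    const frames: Uint8Array[] = [];
    for (;;) {
      if (this.bodyLength === null) {
        const header = this.acc.take(4);
        if (header === null) break;
        this.bodyLength = new DataView(header.buffer).getUint32(0);
      }
      const body = this.acc.take(this.bodyLength);
      if (body === null) break;
      frames.push(body);
      this.bodyLength = null;
    }
    return frames;
  }
}
```

With read(n), the two take() calls would become read(4) and
read(bodyLength), and the accumulator would live inside the Stream.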

The small chunk problem is also bothersome for separator-based protocols.
unshift() or peek() may help.

* In-band success/fail signaling
I excluded abort()-like methods. Any error signal and other additional
information are conveyed manually outside of Stream. We can revisit this
point. If it turns out to be essential, we could put abort() back or add an
argument to close() to accept one object for signaling.

* readEncoding, readType, type, etc.
It's not final. We can keep looking for a better API for text reading.

[1] http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0355.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0481.html
[3] https://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm
Received on Monday, 4 November 2013 17:12:42 UTC