Re: Thoughts behind the Streams API ED from Aymeric Vitte on 2013-11-06 (public-webapps@w3.org from October to December 2013)

From: Aymeric Vitte <vitteaymeric@gmail.com>
Date: Wed, 06 Nov 2013 11:33:09 +0100
To: Takeshi Yoshino <tyoshino@google.com>, "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>
Message-ID: <527A1AE5.5090705@gmail.com>
I have seen the various bugs too. Some comments:

- maybe I have missed some explanation or something obvious, but right
now I don't really understand the difference between
readable/writable bytestreams and bytestream, or when to use each

- pause/unpause: as far as I understand, the WHATWG spec does not
recommend it, but I don't understand the reasons. As I mentioned
previously, the idea is to INSERT a pause signal into the stream: you
cannot control the stream, and therefore cannot know when you are pausing it.

- stop/resume: same, see my previous post. The difference is that the
consuming API should clone the state of the operation and close the
current operation as if EOF had been received, then restart from the
clone on resume

- pipe []/fork: I don't see why the fast stream should wait for the slow
one, so maybe the stream could be forked and pause used for the slow one

- flow control: would it be possible to advertise a maximum bandwidth
rate for a stream?
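
A minimal TypeScript sketch of the in-band pause described in the pause/unpause item, assuming a simple in-memory queue. `PauseMarker`, `PausableQueue`, `insertPause()` and `drain()` are hypothetical names, not part of the ED or of the WHATWG draft. Because the marker is queued like data, the pause lands exactly after the data written before it, no matter when the consumer gets to run.

```typescript
// Hypothetical marker type: queued in-band, alongside ordinary data.
class PauseMarker {
  readonly kind = "pause";
}

class PausableQueue<T> {
  private items: Array<T | PauseMarker> = [];
  private paused = false;

  write(value: T): void {
    this.items.push(value);
  }

  // Producer side: "pause here", relative to the data already written.
  insertPause(): void {
    this.items.push(new PauseMarker());
  }

  unpause(): void {
    this.paused = false;
  }

  // Consumer side: deliver data up to the next marker, consume the marker,
  // then stop delivering until unpause() is called.
  drain(deliver: (value: T) => void): void {
    while (!this.paused && this.items.length > 0) {
      const next = this.items.shift();
      if (next instanceof PauseMarker) {
        this.paused = true;
      } else {
        deliver(next as T);
      }
    }
  }
}
```

The control call (`insertPause()`) and the moment the pause takes effect are decoupled, which is the property the comment asks for.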

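
One way to read the stop/resume item, sketched in TypeScript under the assumption that the consuming operation is a checksum whose finalization is destructive (as many hash APIs are): `stop()` clones the running state, then finishes the current operation as if EOF had arrived, and `resume()` continues from the clone. Adler-32 is a standard checksum chosen only for illustration; `StoppableChecksum` and its methods are hypothetical.

```typescript
// Adler-32 over a byte stream; digest() marks the operation as finished,
// which is what makes cloning the state necessary.
class Adler32 {
  private a = 1;
  private b = 0;
  private finished = false;

  update(bytes: Uint8Array): void {
    if (this.finished) throw new Error("operation already finished");
    for (let i = 0; i < bytes.length; i++) {
      this.a = (this.a + bytes[i]) % 65521;
      this.b = (this.b + this.a) % 65521;
    }
  }

  digest(): number {
    this.finished = true;
    return ((this.b << 16) | this.a) >>> 0;
  }

  clone(): Adler32 {
    const copy = new Adler32();
    copy.a = this.a;
    copy.b = this.b;
    return copy;
  }
}

// Hypothetical consumer wrapper implementing the stop/resume semantics.
class StoppableChecksum {
  private current = new Adler32();
  private saved: Adler32 | null = null;

  write(bytes: Uint8Array): void {
    this.current.update(bytes);
  }

  // Clone the state, then close the current operation as if EOF was received.
  stop(): number {
    this.saved = this.current.clone();
    return this.current.digest();
  }

  // Restart from the clone taken by stop().
  resume(): void {
    if (this.saved === null) throw new Error("not stopped");
    this.current = this.saved;
    this.saved = null;
  }
}
```

Stopping after "Wiki" yields a valid checksum of "Wiki"; resuming and then feeding "pedia" yields the checksum of "Wikipedia".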
Regards

Aymeric

On 04/11/2013 18:11, Takeshi Yoshino wrote:
> I'd like to summarize the ideas behind this API surface, since the
> "Overlap" thread has grown too long. We'll put these into bug entries soon.
>
> Feedback on the "Overlap" thread, especially Isaac's exhaustive list of
> considerations and the conversation with Aymeric, was very helpful. In
> reply to his mail, I had drafted my initial proposal [2], which
> addresses almost all of them. Since the API surface was so big, I
> tried to compact it while incorporating Promises. The current ED [3]
> addresses some, but not all, of the important requirements. I think it's a
> good (re)starting point.
>
> * Flow control
> read() and write() in the ED do provide flow control by controlling
> the timing of resolution of the returned Promise. A Stream would have
> a window to limit the data buffered in it. If a big value is passed
> as the size parameter of read(), it may extend the window if necessary.
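
A rough TypeScript sketch of this Promise-timing flow control, assuming a byte-oriented stream with a fixed initial window. `WindowedByteStream` and its members are hypothetical, `read(size)` is assumed to wait for exactly `size` bytes, and EOF handling is omitted.

```typescript
// Hypothetical stream: flow control is expressed only through *when* the
// Promises returned by write() and read() settle.
class WindowedByteStream {
  private bytes: number[] = [];
  private blockedWriters: Array<() => void> = [];
  private reader: { size: number; resolve: (data: Uint8Array) => void } | null = null;
  private windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  get window(): number { return this.windowSize; }
  get buffered(): number { return this.bytes.length; }
  get waitingWriters(): number { return this.blockedWriters.length; }

  // Data is always accepted, but the Promise stays pending while the buffer
  // exceeds the window, which throttles a producer that awaits it.
  write(data: Uint8Array): Promise<void> {
    for (let i = 0; i < data.length; i++) this.bytes.push(data[i]);
    this.serveReader();
    if (this.bytes.length <= this.windowSize) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.blockedWriters.push(() => resolve());
    });
  }

  // Resolves once `size` bytes are buffered. A size larger than the window
  // extends the window; otherwise the producer could stall before enough
  // data arrives.
  read(size: number): Promise<Uint8Array> {
    if (this.reader !== null) return Promise.reject(new Error("a read is already pending"));
    if (size > this.windowSize) {
      this.windowSize = size;
      this.releaseWriters();
    }
    return new Promise<Uint8Array>((resolve) => {
      this.reader = { size, resolve };
      this.serveReader();
    });
  }

  private serveReader(): void {
    const r = this.reader;
    if (r === null || this.bytes.length < r.size) return;
    this.reader = null;
    r.resolve(new Uint8Array(this.bytes.splice(0, r.size)));
    this.releaseWriters();
  }

  private releaseWriters(): void {
    if (this.bytes.length > this.windowSize) return;
    const writers = this.blockedWriters;
    this.blockedWriters = [];
    for (let i = 0; i < writers.length; i++) writers[i]();
  }
}
```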
>
> When reading data as a DOMString, the size param of read() doesn't
> specify the exact raw size of the data to be read out. It just works as a
> throttle to prevent the internal buffer from being drained too fast.
> StreamReadResult tells how many bytes were actually consumed.
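
A sketch of "size as a throttle" for text reads, assuming UTF-8 input. `readText()` and `TextReadResult` are hypothetical; only `TextDecoder` is a real (Encoding Standard) API. It shows why `bytesConsumed` has to be reported separately: a multi-byte character is never split, so the bytes consumed can be smaller, or occasionally larger, than the requested size.

```typescript
// Hypothetical result shape, loosely after StreamReadResult.
interface TextReadResult {
  data: string;          // decoded text
  bytesConsumed: number; // raw bytes actually taken from the buffer
}

// Byte length of the UTF-8 sequence starting with `lead` (1 for ASCII and
// for malformed bytes, so the reader always makes progress).
function utf8SequenceLength(lead: number): number {
  if ((lead & 0xe0) === 0xc0) return 2;
  if ((lead & 0xf0) === 0xe0) return 3;
  if ((lead & 0xf8) === 0xf0) return 4;
  return 1;
}

// Decode at most `size` bytes without splitting a character. If even the
// first character is longer than `size`, take that one character anyway so
// a small size cannot stall reading.
function readText(buffer: Uint8Array, size: number): TextReadResult {
  const limit = Math.min(size, buffer.length);
  let end = 0;
  while (end < limit) {
    const len = utf8SequenceLength(buffer[end]);
    if (end + len > limit) break; // character would straddle the limit
    end += len;
  }
  if (end === 0 && buffer.length > 0) {
    const first = utf8SequenceLength(buffer[0]);
    if (first <= buffer.length) end = first;
  }
  return {
    data: new TextDecoder("utf-8").decode(buffer.subarray(0, end)),
    bytesConsumed: end,
  };
}
```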
>
> If more explicit and precise flow control is necessary, we could
> cherry-pick some features from my old big API proposal [1], for example, making
> the window size configurable.
>
> If it makes sense, size can be generalized to be the cost of each element.
> This would be useful when trying to generalize Stream to various objects.
>
> To make the window dynamically adjustable, we could introduce methods
> such as drainCapacity() and expandCapacity().
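
A sketch combining the two ideas above: a window measured in per-element cost, and a capacity that can change at run time. `CostWindowQueue` is hypothetical, and the semantics given to `expandCapacity(n)` and `drainCapacity(n)` (grow or shrink the capacity by `n`) are an assumption, since the mail only names the methods.

```typescript
// Hypothetical queue whose window is measured in "cost" rather than bytes.
class CostWindowQueue<T> {
  private items: T[] = [];
  private used = 0;
  private capacity: number;
  private readonly cost: (item: T) => number;

  constructor(capacity: number, cost: (item: T) => number) {
    this.capacity = capacity;
    this.cost = cost;
  }

  // True while the producer may keep writing.
  get hasRoom(): boolean {
    return this.used < this.capacity;
  }

  get queuedCost(): number {
    return this.used;
  }

  // Returns whether there is still room after adding this element.
  enqueue(item: T): boolean {
    this.items.push(item);
    this.used += this.cost(item);
    return this.hasRoom;
  }

  dequeue(): T | undefined {
    if (this.items.length === 0) return undefined;
    const item = this.items.shift() as T;
    this.used -= this.cost(item);
    return item;
  }

  // Assumed semantics: grow or shrink the capacity by n (never below zero).
  expandCapacity(n: number): void {
    this.capacity += n;
  }

  drainCapacity(n: number): void {
    this.capacity = Math.max(0, this.capacity - n);
  }
}
```

With a cost of 1 per element this becomes an element-count window, and with byte length it matches the byte window. The later WHATWG Streams design took a similar route with its queuing strategies (`highWaterMark` plus a `size()` function).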
>
> * Passive producer
> For producers like a random number generator, it's not always
> good to ask the producer to prepare data and push it to a Stream using
> write(). This was possible in [2], but not in the ED. It could be
> addressed, for example, by adding an overload to write():
> - write() and write(size) don't write data, but wait until the Stream
> can accept some amount of data, or the specified size of data.
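
A sketch of the passive-producer case, assuming the data-less `write()`/`write(size)` overload resolves once the Stream can take that many elements. The overload is stood in for by a hypothetical `waitWritable(size)` method on a hypothetical `DemandStream`, and at most one wait may be pending, mirroring the ED's one-write()-at-a-time rule.

```typescript
// Hypothetical stream: waitWritable(size) resolves once `size` more
// elements fit in the window. Only one wait may be pending at a time.
class DemandStream {
  private queue: number[] = [];
  private waiter: { size: number; resolve: () => void } | null = null;
  private readonly window: number;

  constructor(window: number) {
    this.window = window;
  }

  get length(): number { return this.queue.length; }
  get waiting(): boolean { return this.waiter !== null; }

  waitWritable(size: number = 1): Promise<void> {
    if (this.waiter !== null) return Promise.reject(new Error("a wait is already pending"));
    return new Promise<void>((resolve) => {
      this.waiter = { size, resolve: () => resolve() };
      this.wake();
    });
  }

  // Called by the producer after its wait has resolved.
  enqueue(value: number): void {
    this.queue.push(value);
  }

  read(): number | undefined {
    const value = this.queue.shift();
    this.wake(); // reading frees space, which may satisfy the pending wait
    return value;
  }

  private wake(): void {
    const w = this.waiter;
    if (w !== null && this.queue.length + w.size <= this.window) {
      this.waiter = null;
      w.resolve();
    }
  }
}

// A passive producer: a value is generated only after the Stream asks for it.
async function produceRandom(stream: DemandStream, count: number): Promise<void> {
  for (let i = 0; i < count; i++) {
    await stream.waitWritable();
    stream.enqueue(Math.random());
  }
}
```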
>
> * Conversion from existing active and unstoppable producer API
> E.g. WebSocket invokes onmessage immediately when new data is
> available. For this kind of API, a finite-size Stream cannot absorb the
> production, so the read data will need to be buffered manually. In [2],
> Stream always accepted write() even if its buffer was full, assuming that,
> where necessary, the producer would use the onwritable method.
>
> Currently, only one write() can be issued at a time, but we could
> choose to have Stream queue write() requests internally.
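
For active, unstoppable producers, one option is an unbounded queue in front of a Stream that accepts only one outstanding write(). A sketch assuming a minimal `Sink` interface; `Sink` and `WriteQueue` are hypothetical names.

```typescript
// Hypothetical sink that, like the ED's Stream, accepts one write() at a time.
interface Sink<T> {
  write(chunk: T): Promise<void>;
}

// Buffers everything an unstoppable producer emits and feeds the sink one
// write() at a time. Error handling is omitted for brevity.
class WriteQueue<T> {
  private pending: T[] = [];
  private writing = false;
  private readonly sink: Sink<T>;

  constructor(sink: Sink<T>) {
    this.sink = sink;
  }

  get queued(): number {
    return this.pending.length;
  }

  // Safe to call from an event handler: never blocks and never drops data.
  push(chunk: T): void {
    this.pending.push(chunk);
    this.pump();
  }

  private pump(): void {
    if (this.writing || this.pending.length === 0) return;
    const chunk = this.pending.shift() as T;
    this.writing = true;
    this.sink.write(chunk).then(() => {
      this.writing = false;
      this.pump();
    });
  }
}
```

With a WebSocket, the handler could be `socket.onmessage = (event) => queue.push(event.data);`. The trade-off is unbounded memory use if the producer keeps outrunning the sink.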
>
> * Sync read if possible
> This could be done by adding a sync flag to StreamReadResult and introducing
> StreamWriteResult, to signal whether the operation was done synchronously (data is
> the actual result) or asynchronously (data is a Promise), saving the Promise task-posting cost.
>
> I estimated that the post-tasking overhead should be negligible for bulk
> reading, and when reading small fields, we often read some amount into an
> ArrayBuffer and then parse it directly. So it's currently excluded.
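
Although the idea is currently excluded, the shape of a "sync if possible" result is easy to sketch as a TypeScript discriminated union. `MaybeSyncRead` and `SyncableQueue` are hypothetical; the `sync` flag tells the caller whether `data` is the value itself or a Promise for it.

```typescript
// Hypothetical result: buffered data is returned directly, skipping the
// Promise task; otherwise the caller gets a Promise.
type MaybeSyncRead<T> =
  | { sync: true; data: T }
  | { sync: false; data: Promise<T> };

class SyncableQueue<T> {
  private items: T[] = [];
  private waiters: Array<(value: T) => void> = [];

  write(value: T): void {
    const waiter = this.waiters.shift();
    if (waiter !== undefined) waiter(value); // hand straight to a pending read
    else this.items.push(value);
  }

  read(): MaybeSyncRead<T> {
    if (this.items.length > 0) {
      return { sync: true, data: this.items.shift() as T };
    }
    return {
      sync: false,
      data: new Promise<T>((resolve) => {
        this.waiters.push((value) => resolve(value));
      }),
    };
  }
}
```

A caller branches once: `if (r.sync) use(r.data); else r.data.then(use);`. The cost is exactly that extra branch at every call site, which is part of why the idea is weighed against the (estimated negligible) task-posting overhead.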
>
> * Multiple consumers
> pipe() can take multiple destination Streams. This allows for
> mirroring a Stream's output into two or more Streams. I also considered
> making Stream itself consumable by multiple consumers, but I thought it
> would complicate the API and implementation:
> - Elements in the internal buffer would need to be reference counted.
> - The Stream must be able to distinguish consumers.
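
A sketch of pipe() with several destinations, assuming each destination exposes a Promise-returning `write()`. `pipeToAll()` and `Destination` are hypothetical names. Because the next chunk is taken only after every write has settled, the fastest destination advances no faster than the slowest, which is the concern raised in the pipe []/fork comment.

```typescript
// Hypothetical destination type: anything with a Promise-returning write().
interface Destination<T> {
  write(chunk: T): Promise<void>;
}

// Each chunk goes to every destination; the next chunk is taken only after
// *all* writes settle, so the slowest destination sets the pace.
async function pipeToAll<T>(chunks: readonly T[], destinations: Destination<T>[]): Promise<void> {
  for (const chunk of chunks) {
    await Promise.all(destinations.map((d) => d.write(chunk)));
  }
}
```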
>
> If one of the consumers is fast and the other is slow, we need to wait for
> the slower one before processing the rest of the original
> Stream. We could address this by allowing multiple consumers through a
> new concept, "Cursor", that represents a reading context.
> A Cursor can be implemented as a new interface, or as a Stream that refers to
> (and adds a reference count to elements of) the original Stream's
> internal buffer.
>
> This needs more study to figure out whether the context approach is really
> better than pipe()-ing to a new Stream instance.
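
A sketch of the Cursor approach, assuming an in-memory buffer of elements. `SharedBuffer` and `Cursor` are hypothetical. Instead of a reference count on every element, each cursor keeps an absolute position, and the buffer releases everything behind the slowest cursor, which has the same effect.

```typescript
// Hypothetical shared buffer read through several independent cursors.
class SharedBuffer<T> {
  private items: T[] = [];
  private base = 0; // absolute index of items[0]
  private cursors: Cursor<T>[] = [];

  append(item: T): void {
    this.items.push(item);
  }

  get retained(): number {
    return this.items.length;
  }

  // A new cursor starts at the oldest element still retained.
  openCursor(): Cursor<T> {
    const cursor = new Cursor<T>(this, this.base);
    this.cursors.push(cursor);
    return cursor;
  }

  get(position: number): T | undefined {
    const i = position - this.base;
    return i >= 0 && i < this.items.length ? this.items[i] : undefined;
  }

  // Drop every element that all cursors have already read.
  release(): void {
    let min = this.base + this.items.length;
    for (const c of this.cursors) min = Math.min(min, c.position);
    this.items.splice(0, min - this.base);
    this.base = min;
  }
}

class Cursor<T> {
  position: number;
  private readonly buffer: SharedBuffer<T>;

  constructor(buffer: SharedBuffer<T>, position: number) {
    this.buffer = buffer;
    this.position = position;
  }

  read(): T | undefined {
    const item = this.buffer.get(this.position);
    if (item === undefined) return undefined;
    this.position++;
    this.buffer.release();
    return item;
  }
}
```

Unlike mirroring with pipe(), a fast cursor here is not held back by a slow one; the cost is that the buffer grows with the lag between them.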
>
> * Splitting InputStream (ReadableStream) and OutputStream (WritableStream)
> The writing and reading parts of the API can be split into two
> separate interfaces; the API is designed to allow for such decoupling. The
> constructor and most of the internal flags are there to define a plain Stream.
> I'll give it a try soon.
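
A sketch of the split, with hypothetical interface names. One in-memory object plays the plain Stream and implements both halves, and `createPipe()` hands the producer only the writable half and the consumer only the readable half, so the static types keep each side away from the other's methods.

```typescript
// Hypothetical interface names; they only illustrate the read/write split.
interface ReadableByteStream {
  read(size: number): Promise<Uint8Array>;
}

interface WritableByteStream {
  write(data: Uint8Array): Promise<void>;
  close(): void;
}

// One in-memory object implements both halves (the "plain Stream").
// Simplification: read() never waits; it returns whatever is buffered now.
class MemoryByteStream implements ReadableByteStream, WritableByteStream {
  private bytes: number[] = [];
  private closed = false;

  write(data: Uint8Array): Promise<void> {
    if (this.closed) return Promise.reject(new Error("stream is closed"));
    for (let i = 0; i < data.length; i++) this.bytes.push(data[i]);
    return Promise.resolve();
  }

  close(): void {
    this.closed = true;
  }

  read(size: number): Promise<Uint8Array> {
    return Promise.resolve(new Uint8Array(this.bytes.splice(0, size)));
  }
}

// Producer and consumer each receive only the half they need.
function createPipe(): { readable: ReadableByteStream; writable: WritableByteStream } {
  const stream = new MemoryByteStream();
  return { readable: stream, writable: stream };
}
```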
>
> * StreamConsumeResult
> I decided to have this interface for returning results in order to:
> - notify EOF in just one read call if possible
> - tell the size of the raw binary data consumed when readType="text"
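
A sketch of the first property, with hypothetical field names (`data`, `eof`, `size`), since the mail only states what the result must convey. The reader works over a source that is already closed; the last chunk carries `eof: true`, so the consumer does not need an extra, empty read to discover the end.

```typescript
// Hypothetical field names for the result.
interface StreamConsumeResult {
  data: Uint8Array;
  eof: boolean; // true if this same call also reached the end of the stream
  size: number; // raw bytes consumed (for readType="text" this can differ from the decoded length)
}

// Reader over an already-closed byte source: EOF is reported together with
// the final bytes instead of by a separate, empty read.
class ClosedSourceReader {
  private offset = 0;
  private readonly bytes: Uint8Array;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  read(size: number): StreamConsumeResult {
    const end = Math.min(this.offset + size, this.bytes.length);
    const data = this.bytes.subarray(this.offset, end);
    this.offset = end;
    return { data, eof: end === this.bytes.length, size: data.length };
  }
}
```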
>
> * read(n)
> There are roughly two ways to encode structured data: length-header
> based and separator based. For the former, people basically don't want
> to be notified until enough data is ready. It's also inconvenient
> to get small ArrayBuffer chunks and have to concatenate them
> manually. For the latter, call read() or read(n) if you need flow control.
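
For length-header framing, a sketch of what read(n) spares the application: accumulating small chunks and concatenating them until exactly n bytes are available. `ByteAccumulator` and `FrameParser` are hypothetical, and the frame format (a 2-byte big-endian length followed by the payload) is an assumption for illustration.

```typescript
// Hypothetical helper: buffers small chunks and hands out exactly n bytes.
class ByteAccumulator {
  private chunks: Uint8Array[] = [];
  private total = 0;

  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.total += chunk.length;
  }

  // Returns exactly n bytes, concatenated across chunk boundaries, or null
  // while fewer than n bytes are buffered.
  read(n: number): Uint8Array | null {
    if (n > this.total) return null;
    const out = new Uint8Array(n);
    let filled = 0;
    while (filled < n) {
      const head = this.chunks[0];
      const take = Math.min(head.length, n - filled);
      out.set(head.subarray(0, take), filled);
      filled += take;
      if (take === head.length) this.chunks.shift();
      else this.chunks[0] = head.subarray(take);
    }
    this.total -= n;
    return out;
  }
}

// Assumed frame format: 2-byte big-endian payload length, then the payload.
class FrameParser {
  private acc = new ByteAccumulator();
  private payloadLength: number | null = null;

  // Feeds one chunk and returns every frame it completed (possibly none).
  push(chunk: Uint8Array): Uint8Array[] {
    this.acc.push(chunk);
    const frames: Uint8Array[] = [];
    for (;;) {
      if (this.payloadLength === null) {
        const header = this.acc.read(2);
        if (header === null) break;
        this.payloadLength = (header[0] << 8) | header[1];
      }
      const payload = this.acc.read(this.payloadLength);
      if (payload === null) break;
      frames.push(payload);
      this.payloadLength = null;
    }
    return frames;
  }
}
```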
>
> The small-chunk problem is also a nuisance for separator-based protocols.
> unshift() or peek() may help.
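
For separator-based protocols, a sketch of how unshift() helps: the reader takes a chunk, keeps the part up to the separator, and pushes the rest back. `ChunkSource`, its `unshift()`, and `readLine()` are hypothetical, and strings stand in for decoded chunks to keep the sketch short.

```typescript
// Hypothetical chunk source with unshift().
class ChunkSource {
  private chunks: string[];

  constructor(chunks: string[]) {
    this.chunks = chunks.slice();
  }

  // Returns the next chunk, or null at end of stream.
  read(): string | null {
    return this.chunks.length > 0 ? (this.chunks.shift() as string) : null;
  }

  // Pushes data back so that the next read() returns it first.
  unshift(data: string): void {
    if (data.length > 0) this.chunks.unshift(data);
  }
}

// Reads one "\n"-terminated line however the text was chunked. Returns null
// at EOF; an unterminated trailing fragment is returned as a final line.
function readLine(src: ChunkSource): string | null {
  let line = "";
  for (;;) {
    const chunk = src.read();
    if (chunk === null) return line.length > 0 ? line : null;
    const nl = chunk.indexOf("\n");
    if (nl < 0) {
      line += chunk;
      continue;
    }
    line += chunk.slice(0, nl);
    src.unshift(chunk.slice(nl + 1)); // give back what belongs to the next line
    return line;
  }
}
```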
>
> * In-band success/fail signaling
> I excluded abort()-like methods. Any error signal and other additional
> information are conveyed manually outside of the Stream. We can revisit
> this point. If it turns out to be essential, we could put abort() back
> or add an argument to close() that accepts one object for signaling.
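
A sketch of the second option, close() with an argument, assuming the argument is delivered to the reader after all queued data. `SignalingQueue` and `ReadOutcome` are hypothetical names.

```typescript
// Hypothetical outcome: either a value, or the end of the stream with the
// optional object the producer passed to close().
type ReadOutcome<T> =
  | { done: false; value: T }
  | { done: true; error?: unknown };

class SignalingQueue<T> {
  private items: T[] = [];
  private closed = false;
  private reason: unknown = undefined;

  write(value: T): void {
    if (this.closed) throw new Error("stream is closed");
    this.items.push(value);
  }

  // close() with no argument means success; an argument signals failure.
  close(reason?: unknown): void {
    this.closed = true;
    this.reason = reason;
  }

  // Returns null while the queue is empty but still open.
  read(): ReadOutcome<T> | null {
    if (this.items.length > 0) return { done: false, value: this.items.shift() as T };
    if (!this.closed) return null;
    return this.reason === undefined ? { done: true } : { done: true, error: this.reason };
  }
}
```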
>
> * readEncoding, readType, type, etc.
> It's not final. We can keep looking for a better API for text reading.
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0355.html
> [2] http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0481.html
> [3] https://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm

-- 
Peersm : http://www.peersm.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
Received on Wednesday, 6 November 2013 10:34:06 UTC