- From: Jonas Sicking <jonas@sicking.cc>
- Date: Thu, 22 Aug 2013 00:28:48 -0700
- To: Isaac Schlueter <i@izs.me>
- Cc: Austin William Wright <aaa@bzfx.net>, Domenic Denicola <domenic@domenicdenicola.com>, Takeshi Yoshino <tyoshino@google.com>, "public-webapps@w3.org" <public-webapps@w3.org>
On Fri, Aug 9, 2013 at 12:47 PM, Isaac Schlueter <i@izs.me> wrote:
> Jonas,
>
> What does *progress* mean here?
>
> So, you do something like this:
>
> var p = stream.read()
>
> to get a promise (of some sort). That read() operation is (if we're
> talking about TCP or FS) a single operation. There's no "50% of the
> way done reading" moment that you'd care to tap into.
>
> Even from a conceptual point of view, the data is either:
>
> a) available (and the promise is now fulfilled)
> b) not yet available (and the promise is not yet fulfilled)
> c) known to *never* be available because:
>    i) we've reached the end of the stream (and the promise is fulfilled
>       with some sort of EOF sentinel), or
>    ii) because an error happened (and the promise is broken).
>
> So.. where's the "progress"? A single read() operation seems like it
> ought to be atomic to me, and indeed, the read[2] function either
> returns some data (a), no data (c-i), raises EWOULDBLOCK (b), or
> raises some other error (c-ii). But, whichever of those it does, it
> does right away. We only get woken up again (via
> epoll/kqueue/CPIO/etc) once we know that the file descriptor (or
> HANDLE in windows) is readable again (and thus, it's worthwhile to
> attempt another read[2] operation).

Hi Isaac,

Sorry for taking so long to respond. It took me a while to understand where the disconnect came from. I also needed to mull over how a consumer is actually likely to consume data from a Stream.

Having looked over the Node.js API more, I think I see where the misunderstanding is coming from. The source of confusion is likely that Node.js and the proposal in [1] are very different. Specifically, in Node.js the read() operation is synchronous and operates on the currently buffered data. In [1] the read() operation is asynchronous and isn't restricted to just the currently buffered data.

From my point of view there are two rough categories of ways of reading data from an asynchronous Stream:

A) The Stream hands data to the consumer as soon as the data is available. I.e. the Stream doesn't buffer data longer than until the next opportunity to fire a callback to the consumer.

B) The Stream allows the consumer to pull data out of the stream at whatever pace, and in whatever block size, the consumer finds appropriate. If the data isn't yet available, a callback is used to notify the consumer when it is.

A is basically the Stream pushing the data to the consumer, and B is the consumer pulling the data from the Stream.

In Node.js, doing A looks something like:

stream.on('readable', function() {
  var buffer;
  while ((buffer = stream.read())) {
    processData(buffer);
  }
});

In the proposal in [1] you would do this with the following code:

stream.readBinaryChunked().ondata = function(e) {
  processData(e.data);
};

(Side note: it's unclear to me why the Node.js API forces readable.read() to be called in a loop. Is that to avoid having to flatten internal buffer fragments? Without that, the two would essentially be the same apart from some minor syntactical differences.)

Here it definitely doesn't make sense to deliver progress notifications. Rather than delivering a progress notification to the consumer, you simply deliver the data.
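To illustrate the side note above, here is a rough sketch, not taken from either API, of how the Node.js 'readable'/read() loop could be wrapped to look like the ondata-style delivery in [1]. The pushStyle() helper and the ondata/onend callbacks on the returned object are names I made up for this illustration:

// Wrap a Node.js readable stream so that chunks are pushed to an
// ondata callback as soon as they are available, roughly matching
// the shape of readBinaryChunked() from [1].
function pushStyle(stream) {
  var target = {};
  stream.on('readable', function() {
    var buffer;
    // Drain whatever is currently buffered; read() returns null once
    // the internal buffer is empty.
    while ((buffer = stream.read()) !== null) {
      if (target.ondata) target.ondata({ data: buffer });
    }
  });
  stream.on('end', function() {
    if (target.onend) target.onend();
  });
  return target;
}

// Usage, mirroring the [1] example above:
pushStyle(stream).ondata = function(e) {
  processData(e.data);
};

Modulo the extra loop, the two push-style forms deliver the same chunks, which is why no separate progress notification is needed in this mode.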
The way you would do B in Node.js looks something like:

stream.on('readable', function() {
  var buffer;
  if ((buffer = stream.read(10))) {
    processTenBytes(buffer);
  }
});

The same thing using the proposal in [1] looks like:

stream.readBinary(10).then(function(buffer) {
  processTenBytes(buffer);
});

An important difference here is that in the Node.js API, the 'read 10 bytes' operation either immediately returns a result or immediately fails, depending on how much data we currently have buffered. I.e. the read() call is synchronous, and the caller is expected to keep calling read(10) until the call succeeds. Of course there's also a very useful callback which makes calling again very easy. But between the calls to read(), the Stream doesn't really know that someone is waiting to read 10 bytes of data.

The API in [1] instead makes the read() call asynchronous. That means we can always let the call succeed (unless there's an error on the stream, of course). If we don't have enough data buffered currently, we simply call the success callback later than if we had had all the requested data buffered already.

This is also the place where progress notifications could be delivered, though that is by no means an important aspect of the API. But since the read() operation is asynchronous, we can deliver progress notifications as we buffer up enough data to fulfill it. I hope that makes it clearer how progress notifications come into play.

So to be clear, progress notifications are by no means the important difference here. The important difference is whether we make read() synchronous and operating on the currently buffered data, or asynchronous and operating on the full data stream.

As far as I can tell there is no difference capability-wise between the two APIs. I.e. both handle things like congestion equally well, and both handle consumer-pulling as well as stream-pushing of data. The difference is only in syntax, though that doesn't make the differences any less important. Actually, the proposal in [1] is lacking the ability to unshift() data, but that's an obvious capability that we should add.

I think on the surface the proposal in [1] makes things more convenient for the consumer. The consumer always gets a success call for each read(). When the data actually arrives into the stream is entirely transparent to the consumer. And if we are moving to a world more based around promises for asynchronous operations, then this fits very well there. However, I think in practice the Node.js API might have several advantages. The main concern I have with the API in [1] is that there might be performance implications of returning to the event loop for every call to read(). Also, the fact that pull and push reading use the same API is pretty cool.

In general I suspect that most consumers don't actually know how many bytes they want to consume from a stream. I would expect that many streams use formats with terminators rather than fixed-length units of data. So I would expect it to be a common pattern to guess at how many bytes can be consumed, look at the data and consume as much as possible, and then use unshift() to put back any data that can't be consumed until more data has arrived (a sketch of this follows below).

Does anyone have examples of code that uses the Node.js API? I'd love to look at how people practically end up consuming data.
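To make that guess-and-unshift pattern concrete, here is a hedged sketch written against the readBinary() call from [1] plus the unshift() capability that, as noted above, would still need to be added. parseRecords(), handleRecord() and handleDone() are made-up consumer functions for some terminator-delimited format, and how [1] actually signals end-of-stream is an assumption in this sketch:

// Guess a block size, consume as many complete records as possible,
// and unshift() the incomplete tail so the next read() sees it again.
function consume(stream) {
  stream.readBinary(4096).then(function(buffer) {
    // Assumption: end-of-stream surfaces as an empty or absent buffer.
    if (!buffer || buffer.byteLength === 0) {
      handleDone();
      return;
    }
    // Hypothetical parser: returns { records: [...], rest: ArrayBuffer }.
    var parsed = parseRecords(buffer);
    parsed.records.forEach(handleRecord);
    if (parsed.rest.byteLength > 0) {
      stream.unshift(parsed.rest);   // put back the incomplete record
    }
    consume(stream);                 // keep pulling
  }, function(error) {
    handleDone(error);               // stream error
  });
}

The exact guessed block size doesn't matter much; the point is that the consumer never has to know the record boundaries ahead of time.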
> Are you proposing that every step in the TCP dance is somehow exposed
> on the promise returned by read()? That seems rather inconvenient and
> unnecessary, not to mention difficult to implement, since the TCP
> stack is typically in kernel space.

I'm not really sure how the TCP dance plays in here, but I definitely wasn't planning on exposing that.

I hope the description above makes it clearer how the [1] proposal works.

[1] http://lists.w3.org/Archives/Public/public-webapps/2013AprJun/0727.html