Re: Streams and Blobs

On Tue, Feb 26, 2013 at 2:56 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
> So currently Mozilla has these extensions to XMLHttpRequest:
>
>  * moz-blob
>  * moz-chunked-text
>  * moz-chunked-arraybuffer
>
> The first offers incremental read. The latter two offer chunked read
> (data can be discarded as soon as it's read).
>
> There's also Microsoft's Streams API which I added to the
> XMLHttpRequest draft at some point. StreamReader offers incremental
> read, but only from the beginning of the stream, which makes it
> nothing more than a Blob which can grow in size over time.
>
> The advantage the Streams API seems to have over moz-blob is that you
> do not need to create a new object to read from each time there's
> fresh data. The disadvantage is that that's only a minor advantage and
> there's a whole lot of new API that comes with it.
>
> Did I miss something?
>
> I'm kinda leaning towards adding incremental Blob and chunked
> ArrayBuffer support and removing the Streams API. I can see use for
> Stream construction going forward to generate a request entity body
> that increases over time, but nobody is there yet.

I think the API that Gecko is exposing for "streams" on XHR is a good
start for a feature set. However the problem is that the API that we
have marries the concept of streaming data directly with the XHR
object. I.e. if we want to enable an API that accepts streaming data,
say something FileWriter-like, with Gecko's current design this API would
have to take an XHR object. I.e. there would have to be something like
FileWriter.write(myXHR).

This seems awkward and not very future-proof. Surely things other than
HTTP requests can generate streams of data. For example the TCPSocket
API seems like a good candidate for something that can generate a
stream. Likewise WebSocket could do the same for large message frames.

Other potential sources of data streams are obviously local file data,
and content generated using JavaScript.

So I think it makes a lot more sense to create the concept of Stream
separate from the XHR object. A good start for what to expose on the
Stream object is likely the three extensions you list above, in
addition to simply receiving the full stream contents as a Blob (and
maybe ArrayBuffer). Though I would expect that list to change pretty
quickly once we start looking at it.
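
To illustrate the decoupling argued for above, here is a rough sketch
of a stream object that knows nothing about XHR. Every name in it
(SketchStream, SketchWriter, push, read) is hypothetical and invented
for this example; none of it is Gecko's or any draft's actual API.

```typescript
// Hypothetical names throughout; none of these are real platform APIs.
type Chunk = Uint8Array;

class SketchStream {
  private listener: ((chunk: Chunk) => void) | null = null;
  private pending: Chunk[] = [];
  closed = false;

  // Producer side: any source (HTTP, socket, file, script) can call this.
  push(chunk: Chunk): void {
    if (this.closed) throw new Error("stream is closed");
    if (this.listener) this.listener(chunk);
    else this.pending.push(chunk);
  }

  close(): void {
    this.closed = true;
  }

  // Consumer side: each chunk is handed over once and then dropped.
  read(listener: (chunk: Chunk) => void): void {
    if (this.listener) throw new Error("stream already has a reader");
    this.listener = listener;
    this.pending.forEach((chunk) => listener(chunk));
    this.pending = [];
  }
}

// A FileWriter-like sink that depends only on the stream, not on XHR.
class SketchWriter {
  bytesWritten = 0;

  write(stream: SketchStream): void {
    stream.read((chunk) => {
      this.bytesWritten += chunk.length;
    });
  }
}

// Two unrelated producers feeding the same kind of sink.
const fromHttp = new SketchStream();
const fromSocket = new SketchStream();
const httpWriter = new SketchWriter();
const socketWriter = new SketchWriter();

httpWriter.write(fromHttp);
socketWriter.write(fromSocket);

fromHttp.push(new Uint8Array([1, 2, 3]));
fromSocket.push(new Uint8Array([4, 5]));
fromSocket.push(new Uint8Array([6]));
```

The point of the sketch is only that the writer takes a stream, so
the same write() call works whether the bytes came from an HTTP
response, a socket, or script.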

An important difference between a stream and a blob is that you can
read the contents of a Blob multiple times, while a stream is
optimized for lower resource use by throwing out data as soon as it
has been consumed. I think both are needed in the web platform.
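
A minimal sketch of that difference, using made-up classes
(SketchBlob and OneShotStream are illustrative names, not real
interfaces): the blob-like object can be read repeatedly, while the
stream-like object hands out each chunk once and keeps nothing.

```typescript
// Hypothetical stand-in for a Blob: the data stays available.
class SketchBlob {
  private data: number[];

  constructor(data: number[]) {
    this.data = data.slice();
  }

  // Reading returns a copy and does not consume anything.
  readAll(): number[] {
    return this.data.slice();
  }
}

// Hypothetical stand-in for a stream: data is dropped once read.
class OneShotStream {
  private chunks: number[][];

  constructor(chunks: number[][]) {
    this.chunks = chunks.slice();
  }

  // Returns the next chunk, or null when nothing is left.
  readChunk(): number[] | null {
    const next = this.chunks.shift();
    return next === undefined ? null : next;
  }
}

// Reads chunks until the stream reports that it is empty.
function drain(stream: OneShotStream): number[] {
  let out: number[] = [];
  let chunk = stream.readChunk();
  while (chunk !== null) {
    out = out.concat(chunk);
    chunk = stream.readChunk();
  }
  return out;
}

const blob = new SketchBlob([1, 2, 3]);
const blobFirst = blob.readAll();
const blobSecond = blob.readAll();

const oneShot = new OneShotStream([[1, 2], [3]]);
const streamFirst = drain(oneShot);
const streamSecond = drain(oneShot);
```

Here the second blob read returns the same bytes as the first, while
the second drain of the stream comes back empty.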

But this difference is important to consider with regard to
connecting a Stream to a <video> or <audio> element. Users generally
expect to be able to rewind a media element which would mean that if
you can connect a Stream to a media element, the element would need to
buffer the full stream.
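
As a rough illustration of why rewinding forces buffering, the
sketch below keeps every chunk it is given. RewindableBuffer and its
methods are hypothetical names for this example only, not a proposal
for how media elements would actually do it.

```typescript
// Hypothetical sketch: retains every chunk so playback can seek backwards.
class RewindableBuffer {
  private retained: number[] = [];
  private position = 0;

  // Called as chunks arrive from a one-shot source.
  append(chunk: number[]): void {
    this.retained = this.retained.concat(chunk);
  }

  // Returns up to `count` bytes from the current position and advances.
  play(count: number): number[] {
    const out = this.retained.slice(this.position, this.position + count);
    this.position += out.length;
    return out;
  }

  // Seeking back works only because nothing was discarded.
  rewind(): void {
    this.position = 0;
  }

  // Memory held grows with the total amount of data received.
  get bufferedBytes(): number {
    return this.retained.length;
  }
}

const media = new RewindableBuffer();
media.append([10, 20]);
media.append([30]);

const firstPass = media.play(3);
media.rewind();
const secondPass = media.play(3);
```

The cost is visible in bufferedBytes: it never shrinks, which is
exactly the resource use a plain stream is meant to avoid.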

But this isn't an issue that we need to tackle right now. I think
the first thing to do is to create a Stream primitive, figure out
what API would go directly on it, or what new interfaces need to be
created to allow getting the data out of it, and find a way to get
XHR to produce such a primitive.

> (Also, in retrospect, I think we should have made Blobs be able to
> increase in size over time and not have the synchronous size
> getter...)

This is something that I go back and forth on a lot, i.e. whether it
was a mistake to make Blob.size synchronous or not. It's certainly
adding a lot of implementation complexity, and I'm not sure how much
it benefits authors.

But I think this ship has sailed. And I also think that it's an
orthogonal question to Streams. Either way I think we need the Stream
primitive in order to model a data stream that can only be consumed
once.

> I've also heard requests for give me the last bit of data that was
> transferred (rather than data since last read), for real-time audio. I
> think we should probably leave that use case to WebRTC.

Agreed.

/ Jonas

Received on Thursday, 7 March 2013 09:38:01 UTC