Re: [XHR] support for streaming data from Charles Pritchard on 2011-08-09 (public-webapps@w3.org from July to September 2011)

From: Charles Pritchard <chuck@jumis.com>
Date: Mon, 08 Aug 2011 19:18:59 -0700
To: Jonas Sicking <jonas@sicking.cc>
CC: Webapps WG <public-webapps@w3.org>
Message-ID: <4E409913.3010603@jumis.com>
On 8/8/2011 5:59 PM, Jonas Sicking wrote:
> On Mon, Aug 8, 2011 at 5:48 PM, Charles Pritchard<chuck@jumis.com>  wrote:
>> On 8/8/2011 5:13 PM, Jonas Sicking wrote:
>>> Hi All,
>>>
>>> XHR Level 2 does wonders for making XMLHttpRequest better. However
>>> there is one problem that we have run into with "streaming" data.
>> ...
>> Agreed. I proposed something similar in January, with fixed buffer lengths:
>> http://lists.w3.org/Archives/Public/public-webapps/2011JanMar/0304.html
>>
>> Fixed buffers are somewhat common with more data-intensive network processing.
>> They may trigger quite a few more progress events, but they guarantee an
>> upper bound on memory usage / array length.
>> ...
>>> Same thing when .responseType is set to "streaming-arraybuffer". In
>>> this case .response is set to an ArrayBuffer containing the data
>>> received since the last "progress" event.
>>>
>>> There is one non-ideal thing with this solution. Once the last chunk
>>> of data has arrived, at least Firefox doesn't fire a "progress" event,
>>> but instead just a "load" event. This means that people will have to
>>> consume data both from the "progress" event and from the "load" event.
>>>
>>> Another solution would be to make sure to always fire a "progress" event
>>> for the last data before firing the "load" event. I personally like
>>> this approach more. There *might* even be reasons to do that to ensure
>>> that pages create progress bars that reliably reach 100% etc.
>>>
>> I agree to this, too. For a stream, load may be the same thing as stop, and
>> not have result data.
>>
>> Anne suggested using EventSource and other WebSockets-style semantics
>> instead of overloading XHR2.
>> http://lists.w3.org/Archives/Public/public-webapps/2011JanMar/0314.html
>> http://lists.w3.org/Archives/Public/public-webapps/2011JanMar/0375.html
>>
>> I'd be happy-enough just having a streaming binary buffer.
> Unless EventSource is dramatically changed, it won't solve the use
> cases here. One use case that I've brought up several times in the
> past is incremental loading of Word documents. A more current example
> would be loading a pdf document and rendering it incrementally as data
> is coming in.
>
> Neither of these cases is even close to how EventSource currently
> works, so I don't think it's a good fit.
>

Incremental rendering of formats that support it is certainly a good use case.

PDF itself is more complex than Word: supporting a PDF viewer requires HTTP
range requests (Content-Range) to handle offsets in large streams.
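
As a rough illustration of the offset handling mentioned above: a client asks for a byte window with the Range request header, and the server reports what it actually sent in Content-Range. The helper names below (rangeHeader, parseContentRange) are hypothetical and exist only for this sketch; it assumes single byte ranges and ignores multipart responses.

```typescript
// Hypothetical helpers for illustration; not part of XHR or any library.

/** Build a Range request header value for `length` bytes starting at `offset`. */
function rangeHeader(offset: number, length: number): string {
  if (offset < 0 || length <= 0) throw new RangeError("invalid byte range");
  // HTTP byte ranges are inclusive on both ends.
  return `bytes=${offset}-${offset + length - 1}`;
}

interface ContentRange {
  first: number;        // offset of the first byte in the response body
  last: number;         // offset of the last byte (inclusive)
  total: number | null; // full resource size, or null when the server sends "*"
}

/** Parse a Content-Range response header such as "bytes 0-1023/146515". */
function parseContentRange(value: string): ContentRange | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return {
    first: Number(m[1]),
    last: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}
```

A viewer could pass the rangeHeader result to setRequestHeader("Range", ...) and use the parsed Content-Range to place the returned bytes at the right offset in the document.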

For efficient PDF support, I'd want to work with something like this:
  e.response.length == size of data in bytes (array buffer may be larger)
  e.response.data == array buffer
  e.response.seek() - method to seek ahead N bytes before the next progress event
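
To make the shape above concrete, here is a minimal sketch that models it outside the browser. Everything in it is an assumption for illustration: SeekableStream, receive, and the callback are invented names, and seek is modelled by discarding bytes locally, whereas a real implementation might issue a new range request instead.

```typescript
// Hypothetical model of the proposed response object; not an existing API.
interface StreamResponse {
  length: number;    // number of valid bytes in `data`
  data: ArrayBuffer; // backing buffer; may be larger than `length`
}

class SeekableStream {
  private skip = 0;
  private readonly capacity: number;
  private readonly onProgress: (response: StreamResponse) => void;

  constructor(capacity: number, onProgress: (response: StreamResponse) => void) {
    this.capacity = capacity;
    this.onProgress = onProgress;
  }

  /** Ask to skip the next `n` bytes before the next progress callback. */
  seek(n: number): void {
    if (n < 0) throw new RangeError("seek distance must be non-negative");
    this.skip += n;
  }

  /** Feed bytes as they would arrive from the network. */
  receive(incoming: Uint8Array): void {
    let pos = 0;
    while (pos < incoming.length) {
      if (this.skip > 0) {
        // Drop bytes the consumer asked to seek past.
        const dropped = Math.min(this.skip, incoming.length - pos);
        this.skip -= dropped;
        pos += dropped;
        continue;
      }
      const n = Math.min(this.capacity, incoming.length - pos);
      const data = new ArrayBuffer(this.capacity); // fixed size, may exceed n
      new Uint8Array(data).set(incoming.subarray(pos, pos + n));
      pos += n;
      this.onProgress({ length: n, data });
    }
  }
}
```

Because the skip counter is checked on every loop iteration, a handler can call seek() from inside its progress callback and have it take effect before the next chunk is delivered.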

Here's a related thread on File API Streaming Blobs. This is a valid use case
other than crypto, and I've posted as such:
http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/0725.html

I believe that GPAC seeks through large SVG files via offsets and small
buffers, from what I understood at the SVG F2F.
http://gpac.wp.institut-telecom.fr/
The technique is similar to what PDF has in its spec. SVG does not have
byte offset hints, but GPAC expects data to be processed by an authoring
tool and otherwise works with transcoding, much as VLC (VideoLAN) does.

It seems that a buffer smaller than 16K could be advantageous on Linux:
http://blog.superpat.com/2010/06/01/zero-copy-in-linux-with-sendfile-and-splice/
That'd be helpful for local file viewing, if nothing else.

For some authors, it may be helpful to specify the length they're 
looking to have filled at any one time.
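
One way to picture an author-specified fill length is a small re-chunker that collects incoming data into a fixed-size buffer and only hands it over when full, with a final flush for the last partial chunk. This is a sketch under assumptions: FixedChunker and its methods are hypothetical names, not anything proposed for XHR.

```typescript
// Hypothetical re-chunker for illustration; not part of any proposed API.
class FixedChunker {
  private readonly buffer: Uint8Array;
  private filled = 0;
  private readonly emit: (chunk: Uint8Array) => void;

  constructor(size: number, emit: (chunk: Uint8Array) => void) {
    if (size <= 0) throw new RangeError("size must be positive");
    this.buffer = new Uint8Array(size); // upper bound on buffered bytes
    this.emit = emit;
  }

  /** Accept data of any length; emit a chunk each time the buffer fills. */
  push(incoming: Uint8Array): void {
    let pos = 0;
    while (pos < incoming.length) {
      const n = Math.min(this.buffer.length - this.filled, incoming.length - pos);
      this.buffer.set(incoming.subarray(pos, pos + n), this.filled);
      this.filled += n;
      pos += n;
      if (this.filled === this.buffer.length) this.flush();
    }
  }

  /** Emit whatever is buffered; call once at end of stream for the last partial chunk. */
  flush(): void {
    if (this.filled === 0) return;
    this.emit(this.buffer.slice(0, this.filled)); // copy so the buffer can be reused
    this.filled = 0;
  }
}
```

The explicit flush() at end of stream plays the role of the final "progress" event before "load" discussed earlier in the thread: the consumer sees every byte through one code path.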

-Charles
Received on Tuesday, 9 August 2011 02:19:36 UTC