Re: File API: reading a Blob

On Jul 14, 2014, at 3:47 AM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Thu, Jul 10, 2014 at 7:05 PM, Arun Ranganathan <arun@mozilla.com> wrote:
>> On Jul 3, 2014, at 10:50 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>>> That would mean you would get different results between using
>>> FileReaderSync and XMLHttpRequest. That does not seem ideal
>> 
>> The implementation train has already left the station on this. The motivation of an “ideal” match-up with XMLHttpRequest doesn’t seem strong enough to revisit this by filing browser bugs across implementations (but Cc’ing K. Huey also).
> 
> Well, surely if we support both, we'd like them to work in the same
> way so they can share the same underlying abstraction.


There are two questions:

1. How should FileReaderSync behave in order to solve the majority of use cases?
2. What is a useful underlying abstraction for spec authors that can be reused by present APIs like Fetch and by future APIs?

I don’t think it is necessary to mix the two questions for APIs that are already shipping. Do you think that FileReaderSync AND FileReader should support partial Blob data in read results? Or that stream-based reads should do this with a different API?
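To make the first question concrete, here is a rough TypeScript sketch of what “partial Blob data in read results” would mean for the async FileReader. The partial /result/ inside onprogress is hypothetical (today /result/ stays null until the read finishes); the onloadend shape is what already ships:

  // Hypothetical: partial Blob data exposed through the async FileReader.
  const someBlob = new Blob([new Uint8Array(4 * 1024 * 1024)]);
  const reader = new FileReader();

  reader.onprogress = (e) => {
    // Hypothetical behavior: result holds the bytes transmitted so far.
    // (In shipping implementations this is null until the read completes.)
    const partial = reader.result as ArrayBuffer | null;
    console.log(`progress: ${e.loaded}/${e.total}`, partial?.byteLength);
  };
  reader.onloadend = () => {
    // What ships today: the full result, only once the read has finished.
    console.log('done:', (reader.result as ArrayBuffer).byteLength);
  };
  reader.readAsArrayBuffer(someBlob);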


> 
>> We agreed some time ago to not have partial data.
> 
> Pointer? I also don't really see how that makes sense given how
> asynchronous read would perform.


Well, the bug that removed partial results is https://www.w3.org/Bugs/Public/show_bug.cgi?id=23158, and it dates to last year.

The problems include decoding strings according to the encoding determination when the Blob data is still incomplete:

http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0063.html

Another thread covered deltas in progress events:

http://lists.w3.org/Archives/Public/public-webapps/2013JanMar/0069.html

I don’t have pointers to IRC conversations, but:

1. Decoding was an issue with *readAsText*, since a partial read can end mid-character (see the sketch after this list). I suppose we could make that method alone be all or nothing.

2. Use cases. MOST reads are only useful once all the data is available, after which you can get partial views through Blob manipulation (e.g. *slice*). Allocating a new result object for each progress notification didn’t seem optimal. And partial data was punted to Streams, since that seemed like the longer-term direction. There was also the idea of a Promise-based File API that could be consumed by the FileSystem API.
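To illustrate point 1 with a small TypeScript sketch (TextEncoder/TextDecoder stand in here for the File API’s encoding determination): a chunk boundary can split a multi-byte UTF-8 sequence, so per-chunk decoding either mangles the character or has to buffer the dangling bytes.

  // "é" encodes as the two bytes C3 A9; slice the input mid-sequence.
  const bytes = new TextEncoder().encode('café'); // 63 61 66 C3 A9
  const firstChunk = bytes.slice(0, 4);           // ends inside "é"

  // A naive per-chunk decode emits U+FFFD for the split character:
  console.log(new TextDecoder().decode(firstChunk)); // "caf�"

  // A streaming decode must buffer the incomplete sequence instead:
  const decoder = new TextDecoder();
  let text = decoder.decode(firstChunk, { stream: true }); // "caf"
  text += decoder.decode(bytes.slice(4));                  // "café"
  console.log(text);                                       // "café"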

But it might be useful to have an abstract read that Fetch and other consumers, such as Streams, can also build on:


> Yeah, I now think that we want something even lower-level and build
> the task queuing primitive on top of that. (Basically by observing the
> stream that is being read and queuing tasks as data comes in, similar
> to Fetch. The synchronous case would just wait for the stream to
> complete.)



If I understand you correctly, you mean something that might be two-part (some hand-waving below, but…):

 To read a Blob object /blob/, run these steps:

 1. Let /s/ be a new buffer. 

 2. Return /s/, but continue running these steps asynchronously.

 3. While /blob/'s underlying data stream is not closed, run these
    substeps:

    1. Let /bytes/ be the result of reading a chunk from /blob/'s
       underlying data.

    2. If /bytes/ is not failure, push /bytes/ to /s/ and increase
       /s/'s transmitted by /bytes/'s length.

    3. Otherwise, signal some kind of error to /s/ and terminate
       these steps.

AND

To read a Blob object with tasks:

1. Run the read a Blob algorithm above.
2. When reading the first /bytes/, queue a task called process read.
3. When pushing /bytes/ to /s/, queue a task called process read data.
4. When all /bytes/ are pushed to /s/, queue a task called process read EOF.
5. If an error condition is signaled, queue a task called process error with a failure reason.
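As a very rough TypeScript sketch of both parts (all names are hypothetical, and today’s Blob.prototype.stream() stands in for /blob/’s underlying data stream):

  // The queued tasks are modeled as observer callbacks.
  type ReadObserver = {
    processRead(): void;                        // first bytes arrived
    processReadData(transmitted: number): void; // bytes pushed to /s/
    processReadEOF(): void;                     // all bytes pushed
    processError(reason: unknown): void;        // failure signaled
  };

  // Part 1: "read a Blob" — drain chunks, pushing them onto the buffer /s/.
  async function readBlob(blob: Blob, observer: ReadObserver): Promise<Uint8Array[]> {
    const s: Uint8Array[] = []; // the buffer /s/
    let transmitted = 0;
    let first = true;
    const reader = blob.stream().getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read(); // one "chunk"
        if (done) break;
        if (first) { observer.processRead(); first = false; }
        s.push(value);
        transmitted += value.byteLength;
        observer.processReadData(transmitted);
      }
      observer.processReadEOF();
    } catch (reason) {
      observer.processError(reason); // the error signaled to /s/
    }
    return s;
  }

  // Part 2: "read a Blob with tasks" — here the queued tasks simply become
  // the familiar FileReader-style progress events.
  function readBlobWithTasks(blob: Blob) {
    return readBlob(blob, {
      processRead: () => console.log('loadstart'),
      processReadData: (n) => console.log(`progress: ${n} bytes`),
      processReadEOF: () => console.log('load + loadend'),
      processError: (r) => console.log('error:', r),
    });
  }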

Is “chunk” implementation-defined? Right now the spec assumes a progress granularity of one byte or 50ms. “Chunk” seems a bit hand-wavy and hard to enforce, but… it might be the right approach.

— A*
