Re: XMLHttpRequest.responseBlob from Eric Uhrhane on 2010-04-28 (public-webapps@w3.org from April to June 2010)

From: Eric Uhrhane <ericu@google.com>
Date: Wed, 28 Apr 2010 14:30:19 -0700
To: Darin Fisher <darin@chromium.org>
Cc: Michael Nordman <michaeln@google.com>, Jonas Sicking <jonas@sicking.cc>, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <p2y44b058fe1004281430re49c7ed7s1b8321232b529d83@mail.gmail.com>
On Wed, Apr 28, 2010 at 12:45 PM, Darin Fisher <darin@chromium.org> wrote:
> On Wed, Apr 28, 2010 at 11:57 AM, Michael Nordman <michaeln@google.com>
> wrote:
>>
>>
>> On Wed, Apr 28, 2010 at 11:21 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>> Ugh, sent this originally to just Darin. Resending to the list.
>>>
>>> On Wed, Apr 28, 2010 at 10:11 AM, Darin Fisher <darin@chromium.org>
>>> wrote:
>>> > On Tue, Apr 27, 2010 at 2:04 PM, Jonas Sicking <jonas@sicking.cc>
>>> > wrote:
>>> >>
>>> >> On Tue, Apr 27, 2010 at 1:59 PM, Darin Fisher <darin@chromium.org>
>>> >> wrote:
>>> >> > On Tue, Apr 27, 2010 at 1:33 PM, Jonas Sicking <jonas@sicking.cc>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Tue, Apr 27, 2010 at 1:26 PM, Darin Fisher <darin@chromium.org>
>>> >> >> wrote:
>>> >> >> >> It would be nice to be able to allow streaming such that every
>>> >> >> >> time
>>> >> >> >> a
>>> >> >> >> progress event is fired only the newly downloaded data is
>>> >> >> >> available.
>>> >> >> >> The UA is then free to throw away that data once the event is
>>> >> >> >> done
>>> >> >> >> firing. This would be useful in the cases when the page is able
>>> >> >> >> to
>>> >> >> >> do
>>> >> >> >> incremental parsing of the resulting document.
>>> >> >> >>
>>> >> >> >> If we add a 'load mode' flag on XMLHttpRequest, which can't be
>>> >> >> >> modified after send() is called, then streaming to a Blob could
>>> >> >> >> simply
>>> >> >> >> be another enum value for such a flag.
>>> >> >> >>
>>> >> >> >> There is still the problem of how the actual blob works. I.e.
>>> >> >> >> does
>>> >> >> >> .responseBlob return a new blob every time more data is
>>> >> >> >> returned? Or
>>> >> >> >> should the same Blob be constantly modified? If so, what
>>> >> >> >> happens to any in-progress reads when the file is modified? Or
>>> >> >> >> do
>>> >> >> >> you
>>> >> >> >> just make the Blob available once the whole resource has been
>>> >> >> >> downloaded?
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > This is why I suggested using FileWriter.  FileWriter already has
>>> >> >> > to
>>> >> >> > deal with
>>> >> >> > most of the problems you mentioned above,
>>> >> >>
>>> >> >> Actually, as far as I can tell FileWriter is write-only so it
>>> >> >> doesn't
>>> >> >> deal with any of the problems above.
>>> >> >
>>> >> > When you use createWriter, you are creating a FileWriter to an
>>> >> > existing
>>> >> > File.
>>> >> > The user could attempt to create a FileReader to the very same File
>>> >> > while
>>> >> > a FileWriter is open to it.
>>> >> > It is true that for <input type=saveas> there is no way to get at
>>> >> > the
>>> >> > underlying
>>> >> > File object.  That is perhaps a good thing for the use case of
>>> >> > downloading
>>> >> > to
>>> >> > a location specified by the user.
>>> >>
>>> >> Ah. But as far as I can tell (and remember), it's still fairly
>>> >> undefined what happens when the OS file under a File/Blob object is
>>> >> mutated.
>>> >>
>>> >> / Jonas
>>> >
>>> > Agreed.  I don't see it as a big problem.  Do you?  The application
>>> > developer is
>>> > in control.  They get to specify the output file (via FileWriter) that
>>> > XHR
>>> > sends its
>>> > output to, and they get to know when XHR is done writing.  So, the
>>> > application
>>> > developer can avoid reading from the file until XHR is done writing.
>>>
>>> Well, it seems like a bigger deal here since the file is being
>>> constantly modified as we're downloading data into it, no? So for
>>> example if you grab a File object after the first progress event, what
>>> does that File object contain after the second? Does it contain the
>>> whole file, including the newly downloaded data? Or does it contain
>>> only the data after the first progress event? Or is the File object
>>> now invalid and can't be used?
>>
>> What Gears did about that was to provide a 'snapshot' of the
>> downloaded data each time responseBlob was called, with
>> the 'snapshot' being consistent with the progress events
>> having been seen by the caller. The 'snapshot' would remain
>> valid until discarded by the caller. Each snapshot just provided
>> a view onto the same data which maybe was in memory or
>> maybe had spilled over to disk unbeknownst to the caller.
>>
>>>
>>> I'm also still unsure that a FileWriter is what you want generally. If
>>> you're just downloading temporary data, but data that happens to be so
>>> large that you don't want to keep it in memory, you don't want to
>>> bother the user asking for a location for that temporary file. Nor do
>>> you want that file to be around once the user leaves the page.
>>
>>
>> I think the point about not requiring the caller to manage the 'file' is
>> important.
>>
>>>
>>> Sure, if the use case is actually downloading and saving a file for
>>> the user to use, rather than for the page to use, then a FileWriter
>>> seems like it would work. I.e. if you want something like
>>> "Content-Disposition: attachment", but where you can specify request
>>> headers. Is that the use case?
>>
>> Mods to xhr to access the response more opaquely is a fairly general
>> feature request. One specific use case is to download a resource via xhr
>> and then save the results in a sandboxed file system. So "for the page to
>> use".
>
> ^^^ That is the use case I'm primarily interested in.

I think there are a couple of important use cases here, and FileWriter
really only works for one of them.  It would work fine for a sandboxed
filesystem, as you say.  However, if you just want to get a chunk of
binary data from the server, and don't want to manage its lifetime [or
don't have permission to use the filesystem API, or are on a browser
that doesn't support it], this won't work.

If we just present a File or Blob to the user, they can get access to
the data without worrying about where it's stored, whether it's in
memory or on disk, and without having to clean it up or get any kind
of permission.  If they want to copy it into their sandboxed
filesystem, they can do that using the filesystem API.
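
To make the "just hand the page a Blob" idea concrete, here is a rough
sketch of the snapshot behavior mentioned earlier in the thread. It
assumes a hypothetical ResponseBuffer object standing in for whatever
the UA keeps behind responseBlob; none of these names are part of any
spec, and a real UA could keep the data in memory or on disk.

```javascript
// Hypothetical stand-in for the UA-side storage behind responseBlob.
class ResponseBuffer {
  constructor() {
    this.chunks = [];
  }
  // Called as more of the response arrives.
  receive(chunk) {
    this.chunks.push(chunk);
  }
  // Each call returns a new Blob covering the data received so far.
  // Blobs are immutable, so an earlier snapshot is not affected by
  // chunks that arrive afterwards.
  snapshot() {
    return new Blob(this.chunks);
  }
}
```

The caller never names a file or cleans anything up; a snapshot stays
usable for as long as the caller holds on to it.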

> I think it would be beneficial if downloading to disk was not rate-limited
> by routing chunks through JS.

+1, although the streaming API might be a nice addition.

> I don't care as much about downloading to a user specified location, but I
> think that's an interesting use case as well.  XHR gives the app more
> flexibility (custom headers, cross-origin, etc.), and FileWriter allows the
> app to "save as" URLs that do not have a C-D header that forces a download.
>
>>
>> The notion of having a streaming interface on xhr is interesting. That
>> with a
>> BlobBuilder capability could work. If a streaming xhr mode provided new
>> data in the form of 'blobs' where each blob was just the newly received
>> data,
>> the caller could use a BlobBuilder instance to concatenate the set of
>> received
>> data blobs. And then take blobBuilder.getBlob() and do what they will with
>> it.   xhr.ondatareceived = function (data) { builder.appendBlob(data); }
>
> ^^^ I like that proposal for streaming.
> -Darin
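
A minimal sketch of the streaming idea quoted above, assuming a
hypothetical ondatareceived callback that delivers only the newly
received data as a Blob. Neither that event nor the ChunkCollector
helper exists in any spec; the helper stands in for BlobBuilder and
concatenates with the Blob constructor.

```javascript
// Hypothetical helper standing in for BlobBuilder: collects chunks and
// concatenates them on demand with the Blob constructor.
class ChunkCollector {
  constructor() {
    this.parts = [];
  }
  // Mirrors the builder.appendBlob(data) call from the proposal.
  appendBlob(blob) {
    this.parts.push(blob);
  }
  // Returns one Blob holding every chunk in arrival order.
  getBlob(type = '') {
    return new Blob(this.parts, { type });
  }
}

// 'ondatareceived' is the event proposed in this thread, not an existing
// XMLHttpRequest member; 'xhr' is any object that invokes it per chunk.
function collectStream(xhr) {
  const builder = new ChunkCollector();
  xhr.ondatareceived = function (data) {
    builder.appendBlob(data);
  };
  return builder;
}
```

Once the request completes, builder.getBlob() gives the page the whole
response as a single Blob, and the UA is free to drop each chunk after
its event has fired.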
Received on Wednesday, 28 April 2010 21:31:05 UTC