Re: XMLHttpRequest.responseBlob from Eric Uhrhane on 2010-04-29 (public-webapps@w3.org from April to June 2010)

From: Eric Uhrhane <ericu@google.com>
Date: Thu, 29 Apr 2010 15:46:42 -0700
To: Darin Fisher <darin@chromium.org>
Cc: Michael Nordman <michaeln@google.com>, Jonas Sicking <jonas@sicking.cc>, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <w2o44b058fe1004291546l5bdb6d11pba22ef95669cdf02@mail.gmail.com>
On Thu, Apr 29, 2010 at 3:35 PM, Darin Fisher <darin@chromium.org> wrote:
> On Thu, Apr 29, 2010 at 3:24 PM, Eric Uhrhane <ericu@google.com> wrote:
>>
>> On Thu, Apr 29, 2010 at 3:04 PM, Darin Fisher <darin@chromium.org> wrote:
>> >
>> >
>> > On Wed, Apr 28, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:
>> >>
>> >> On Wed, Apr 28, 2010 at 12:45 PM, Darin Fisher <darin@chromium.org>
>> >> wrote:
>> >> > On Wed, Apr 28, 2010 at 11:57 AM, Michael Nordman
>> >> > <michaeln@google.com>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On Wed, Apr 28, 2010 at 11:21 AM, Jonas Sicking <jonas@sicking.cc>
>> >> >> wrote:
>> >> >>>
>> >> >>> Ugh, sent this originally to just Darin. Resending to the list.
>> >> >>>
>> >> >>> On Wed, Apr 28, 2010 at 10:11 AM, Darin Fisher <darin@chromium.org>
>> >> >>> wrote:
>> >> >>> > On Tue, Apr 27, 2010 at 2:04 PM, Jonas Sicking <jonas@sicking.cc>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> On Tue, Apr 27, 2010 at 1:59 PM, Darin Fisher
>> >> >>> >> <darin@chromium.org>
>> >> >>> >> wrote:
>> >> >>> >> > On Tue, Apr 27, 2010 at 1:33 PM, Jonas Sicking
>> >> >>> >> > <jonas@sicking.cc>
>> >> >>> >> > wrote:
>> >> >>> >> >>
>> >> >>> >> >> On Tue, Apr 27, 2010 at 1:26 PM, Darin Fisher
>> >> >>> >> >> <darin@chromium.org>
>> >> >>> >> >> wrote:
>> >> >>> >> >> >> It would be nice to be able to allow streaming such that every
>> >> >>> >> >> >> time a progress event is fired only the newly downloaded data is
>> >> >>> >> >> >> available. The UA is then free to throw away that data once the
>> >> >>> >> >> >> event is done firing. This would be useful in the cases when the
>> >> >>> >> >> >> page is able to do incremental parsing of the resulting document.
>> >> >>> >> >> >>
>> >> >>> >> >> >> If we add a 'load mode' flag on XMLHttpRequest, which can't be
>> >> >>> >> >> >> modified after send() is called, then streaming to a Blob could
>> >> >>> >> >> >> simply be another enum value for such a flag.
>> >> >>> >> >> >>
>> >> >>> >> >> >> There is still the problem of how the actual blob works. I.e.
>> >> >>> >> >> >> does .responseBlob return a new blob every time more data is
>> >> >>> >> >> >> returned? Or should the same Blob be constantly modified? If
>> >> >>> >> >> >> modified, what happens to any in-progress reads when the file is
>> >> >>> >> >> >> modified? Or do you just make the Blob available once the whole
>> >> >>> >> >> >> resource has been downloaded?
>> >> >>> >> >> >>
>> >> >>> >> >> >
>> >> >>> >> >> >
>> >> >>> >> >> > This is why I suggested using FileWriter.  FileWriter already has
>> >> >>> >> >> > to deal with most of the problems you mentioned above,
>> >> >>> >> >>
>> >> >>> >> >> Actually, as far as I can tell FileWriter is write-only so it
>> >> >>> >> >> doesn't deal with any of the problems above.
>> >> >>> >> >
>> >> >>> >> > When you use createWriter, you are creating a FileWriter to an
>> >> >>> >> > existing File.
>> >> >>> >> > The user could attempt to create a FileReader to the very same File
>> >> >>> >> > while a FileWriter is open to it.
>> >> >>> >> > It is true that for <input type=saveas> there is no way to get at the
>> >> >>> >> > underlying File object.  That is perhaps a good thing for the use case
>> >> >>> >> > of downloading to a location specified by the user.
>> >> >>> >>
>> >> >>> >> Ah. But as far as I can tell (and remember), it's still fairly
>> >> >>> >> undefined what happens when the OS file under a File/Blob object is
>> >> >>> >> mutated.
>> >> >>> >>
>> >> >>> >> / Jonas
>> >> >>> >
>> >> >>> > Agreed.  I don't see it as a big problem.  Do you?  The application
>> >> >>> > developer is in control.  They get to specify the output file (via
>> >> >>> > FileWriter) that XHR sends its output to, and they get to know when
>> >> >>> > XHR is done writing.  So, the application developer can avoid reading
>> >> >>> > from the file until XHR is done writing.
>> >> >>>
>> >> >>> Well, it seems like a bigger deal here since the file is being
>> >> >>> constantly modified as we're downloading data into it, no? So for
>> >> >>> example if you grab a File object after the first progress event, what
>> >> >>> does that File object contain after the second? Does it contain the
>> >> >>> whole file, including the newly downloaded data? Or does it contain
>> >> >>> only the data after the first progress event? Or is the File object
>> >> >>> now invalid and can't be used?
>> >> >>
>> >> >> What Gears did about that was to provide a 'snapshot' of the
>> >> >> downloaded data each time responseBlob was called, with
>> >> >> the 'snapshot' being consistent with the progress events
>> >> >> having been seen by the caller. The 'snapshot' would remain
>> >> >> valid until discarded by the caller. Each snapshot just provided
>> >> >> a view onto the same data which maybe was in memory or
>> >> >> maybe had spilled over to disk unbeknownst to the caller.
>> >> >>
>> >> >>>
>> >> >>> I'm also still unsure that a FileWriter is what you want generally. If
>> >> >>> you're just downloading temporary data, but data that happens to be so
>> >> >>> large that you don't want to keep it in memory, you don't want to
>> >> >>> bother the user asking for a location for that temporary file. Nor do
>> >> >>> you want that file to be around once the user leaves the page.
>> >> >>
>> >> >>
>> >> >> I think the point about not requiring the caller to manage the 'file'
>> >> >> is important.
>> >> >>
>> >> >>>
>> >> >>> Sure, if the use case is actually downloading and saving a file for
>> >> >>> the user to use, rather than for the page to use, then a FileWriter
>> >> >>> seems like it would work. I.e. if you want something like
>> >> >>> "Content-Disposition: attachment", but where you can specify request
>> >> >>> headers. Is that the use case?
>> >> >>
>> >> >> Mods to xhr to access the response more opaquely is a fairly general
>> >> >> feature request. One specific use case is to download a resource via
>> >> >> xhr and then save the results in a sandboxed file system. So "for the
>> >> >> page to use".
>> >> >
>> >> > ^^^ That is the use case I'm primarily interested in.
>> >>
>> >> I think there are a couple of important use cases here, and FileWriter
>> >> really only works for one of them.  It would work fine for a sandboxed
>> >> filesystem, as you say.  However, if you just want to get a chunk of
>> >> binary data from the server, and don't want to manage its lifetime [or
>> >> don't have permission to use the filesystem API, or are on a browser
>> >> that doesn't support it], this won't work.
>> >
>> > My thinking was that we would still have the responseBody getter that
>> > makes available a ByteArray object.
>> >
>> >>
>> >> If we just present a File or Blob to the user, they can get access to
>> >> the data without worrying about where it's stored, whether it's in
>> >> memory or on disk, and without having to clean it up or get any kind
>> >> of permission.  If they want to copy it into their sandboxed
>> >> filesystem, they can do that using the filesystem API.
>> >
>> > That copy step seems suboptimal for large files.  Can we eliminate it?
>>
>> In case 1, the developer just wants the data, and doesn't want to
>> manage it or use the sandboxed FileSystem [1].  It's stored
>> [temporarily] in some place controlled by the browser, that the app
>> can't freely browse.
>> In case 2, the developer wants to keep the data around in the
>> FileSystem indefinitely.  Once it's there, it can be opened at will.
>> We want to get it there without an extra copy.
>>
>> Giving a FileWriter [2] to XHR doesn't handle case 1, since while that
>> will store the data for you, it doesn't give you read access.  The
>> closest read-write primitive is FileEntry from FileSystem.  If you
>> grab a FileEntry from your sandbox and give it to your XHR, that would
>> work for case 2.  For case 1, we'd need something like mkTemp [3] that
>> would create a FileEntry pointing at the downloaded file.  That seems
>> a bit kludgy.
>>
>> If XHR has a File property that you can ask for, you could either
>> supply it a FileEntry before sending [in which case the File you got
>> back would be that FileEntry] or it could give you a new File that
>> points into the browser cache if you don't.  How does that sound?
>>
>>     Eric
>>
>> [1] http://dev.w3.org/2009/dap/file-system/file-dir-sys.html
>> [2] http://dev.w3.org/2009/dap/file-system/file-writer.html
>> [3] http://unixhelp.ed.ac.uk/CGI/man-cgi?mktemp
>
> It just seems to me that use case #1 can just be regarded as a specialization
> of use case #2.  Given the File API's requestTemporaryFilesystem, it would be
> easy for the application to request that the file be stored in temporary space.
> They don't have to manage their temporary space, right?
> The details of using the sandboxed filesystem for use case #1 could be hidden
> within a JS library thereby making it an easy solution to deploy.

Yeah, that would work.  It's a little more cumbersome, but as you say,
a library could clean that right up.

Whether that's going to make a mess or not depends on the specific
implementation of requestTemporaryFilesystem.  If it always gives back
the same filesystem, such that it can be used for caching [the likely
case], then that library is going to be dropping files into a
namespace the developer's using.  But a smart library will do that
neatly and in a way that is easy to clean up.
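
To make the "neatly and easy to clean up" point concrete, here is a minimal sketch of what such a library might do. Everything in it is hypothetical: sharedFs is an in-memory stand-in for whatever filesystem requestTemporaryFilesystem hands back, and TempStore and the /.xhr-tmp/ prefix are made-up names, not part of any spec.

```javascript
// In-memory stand-in for a shared temporary filesystem (hypothetical;
// this is not the File API). Keys are paths, values are file contents.
const sharedFs = new Map();

// Hypothetical helper library: it keeps all of its files under one
// private directory so they never collide with the app's own entries.
class TempStore {
  constructor(fs, id) {
    this.fs = fs;
    this.root = `/.xhr-tmp/${id}/`;
    this.count = 0;
  }

  // Store one download and return the path it was written to.
  save(data) {
    const path = `${this.root}${this.count++}`;
    this.fs.set(path, data);
    return path;
  }

  // Remove every file this library created, and nothing else.
  cleanup() {
    for (const path of [...this.fs.keys()]) {
      if (path.startsWith(this.root)) this.fs.delete(path);
    }
  }
}

// The app caches its own file in the same filesystem.
sharedFs.set('/cache/logo.png', 'app data');

const store = new TempStore(sharedFs, 'session-1');
const saved = store.save('downloaded bytes');
store.cleanup();
const remaining = [...sharedFs.keys()];
```

Keeping everything under a single prefix means cleanup is one prefix scan, and the developer's own cache entries in the same filesystem are left alone.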

>> >> > I think it would be beneficial if downloading to disk was not
>> >> > rate-limited by routing chunks through JS.
>> >>
>> >> +1, although the streaming API might be a nice addition.
>> >>
>> >> > I don't care as much about downloading to a user specified location,
>> >> > but I think that's an interesting use case as well.  XHR gives the app
>> >> > more flexibility (custom headers, cross-origin, etc.), and FileWriter
>> >> > allows the app to "save as" URLs that do not have a C-D header that
>> >> > forces a download.
>> >> >
>> >> >>
>> >> >> The notion of having a streaming interface on xhr is interesting. That
>> >> >> with a BlobBuilder capability could work. If a streaming xhr mode
>> >> >> provided new data in the form of 'blobs' where each blob was just the
>> >> >> newly received data, the caller could use a BlobBuilder instance to
>> >> >> concatenate the set of received data blobs. And then take
>> >> >> blobBuilder.getBlob() and do what they will with it.
>> >> >>
>> >> >>   xhr.ondatareceived = function (data) {
>> >> >>     builder.appendBlob(data);
>> >> >>   }
>> >> >
>> >> > ^^^ I like that proposal for streaming.
>> >> > -Darin
>> >
>> >
>
>
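
The streaming idea quoted above (each event delivers only the newly received data, and the caller concatenates) can be sketched roughly as follows. This is an illustration under assumptions, not a proposed API: ChunkCollector is a hypothetical stand-in for BlobBuilder, plain Uint8Array chunks stand in for the per-event blobs, and ondatareceived is called by hand rather than by a real XHR.

```javascript
// Hypothetical collector playing the role of BlobBuilder: it keeps
// each newly received chunk and joins them on request.
class ChunkCollector {
  constructor() {
    this.chunks = [];
    this.size = 0;
  }

  // Record one chunk of newly received bytes.
  append(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
  }

  // Concatenate all chunks received so far into one byte array.
  getBytes() {
    const out = new Uint8Array(this.size);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}

const builder = new ChunkCollector();

// Stand-in for xhr.ondatareceived: invoked once per batch of new data.
const ondatareceived = (data) => builder.append(data);

// Simulate two progress events, each carrying only the new bytes.
ondatareceived(Uint8Array.of(1, 2, 3));
ondatareceived(Uint8Array.of(4, 5));
const body = builder.getBytes();
```

Since each event carries only the new data, the UA can discard a chunk once the handler returns. Whether anything is retained is up to the caller.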
Received on Thursday, 29 April 2010 22:47:30 UTC