Re: File API: Blob and underlying file changes. from Eric Uhrhane on 2010-01-21 (public-webapps@w3.org from January to March 2010)

From: Eric Uhrhane <ericu@google.com>
Date: Thu, 21 Jan 2010 11:22:44 -0800
To: Jian Li <jianli@chromium.org>
Cc: Dmitry Titov <dimich@chromium.org>, Jonas Sicking <jonas@sicking.cc>, Darin Fisher <darin@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <44b058fe1001211122y3a67b6b7n5444eca1e2a8c91b@mail.gmail.com>
On Thu, Jan 21, 2010 at 11:15 AM, Jian Li <jianli@chromium.org> wrote:
> Treating blobs as snapshots sounds like a reasonable approach and it will
> make the life of the chunked upload and other scenarios easier. Now the
> problem is: how do we get the blob (snapshot) out of the file?
> 1) We can still keep the current relationship between File and Blob. When we
> slice a file by calling File.slice, a new blob that captures the current
> file size and modification time is returned. The following Blob operations,
> like slice, will simply inherit the cached size and modification time. When
> we access the underlying file data in XHR.send() or FileReader, the
> modification time will be verified and an exception could be thrown.

This would require File.slice to do synchronous file IO, whereas
Blob.slice doesn't do that.

> 2) We can remove the inheritance of Blob from File and introduce
> File.getAsBlob() as dimich suggested. This seems to be more elegant.
> However, it requires changing the File API spec a lot.
>
> On Wed, Jan 20, 2010 at 3:44 PM, Eric Uhrhane <ericu@google.com> wrote:
>>
>> On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov <dimich@chromium.org> wrote:
>> > On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:
>> >>
>> >> I think it could.  Here's a third option:
>> >> Make all blobs, file-based or not, just as async as the blobs in
>> >> option 2.  They never do sync IO, but could potentially fail future
>> >> read operations if their metadata is out of date [e.g. reading beyond
>> >> EOF].  However, expose the modification time on File via an async
>> >> method and allow the user to pass it in to a read call to enforce
>> >> "fail if changed since this time".  This keeps all file accesses
>> >> async, but still allows for chunked uploads without mixing files
>> >> accidentally.  If we allow users to refresh the modification time
>> >> asynchronously, it also allows for adding a file to a form, changing
>> >> the file on disk, and then uploading the new file.  The user would
>> >> look up the mod time when starting the upload, rather than when the
>> >> file's selected.
>> >
>> > It would be great to avoid sync file I/O on calls like Blob.size. They
>> > would simply return cached value. Actual mismatch would be detected
>> > during actual read operation.
>> > However then I'm not sure how to keep File derived from Blob, since:
>> > 1) Currently, in FF and WebKit File.fileSize is a sync I/O that returns
>> > current file size. The current spec says File is derived from Blob and
>> > Blob has Blob.size property that is likely going to co-exist with
>> > File.fileSize for a while, for compat reasons. It's weird for file.size
>> > and file.fileSize to return different things.
>>
>> True, but we'd probably want to deprecate file.fileSize anyway and
>> then get rid of it, since it's synchronous.
>>
>> > 2) Currently, xhr.send(file) does not fail and sends the version of the
>> > file that exists somewhere around the time the xhr.send(file) call was
>> > issued. Since File is also a Blob, xhr.send(blob) would behave the same,
>> > which means if we want to preserve this behavior the Blob can not fail
>> > an async read operation if the file has changed.
>> > There is a contradiction here. One way to resolve it would be to break
>> > "File is Blob" and to be able to "capture the File as Blob" by having
>> > file.getAsBlob(). The latter would make a snapshot of the state of the
>> > file, to be able to fail subsequent async read operations if the file
>> > has been changed.
>> > I've asked a few people around in a non-scientific poll and it seems
>> > developers expect Blob to be a 'snapshot', reflecting the state of the
>> > file (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob
>> > creation. Since it's obviously bad to actually copy data, it seems
>> > acceptable to capture enough information (like mod time) so the read
>> > operations later can fail if underlying storage has been changed. It
>> > feels really strange if reading the Blob can yield some data from one
>> > version of a file (or Canvas) mixed with some data from newer version,
>> > without any indication that this is happening.
>> > All that means there is an option 3:
>> > 3. Treat all Blobs as 'snapshots' that refer to the range of underlying
>> > data at the moment of creation of the Blob. Blobs produced further by
>> > Blob.slice() operation inherit the captured state w/o actually
>> > verifying it against 'live' underlying objects like files. All Blobs
>> > can be 'read' (or 'sent') via operations that can fail if the
>> > underlying content has changed. Optionally, expose snapshotTime
>> > property and perhaps "read if not changed since" parameter to read
>> > operations. Do not derive File from Blob, rather have File.getAsBlob()
>> > that produces a Blob which is a snapshot of the file at the moment of
>> > call. The advantage here is that it removes need for sync operations
>> > from Blob and provides mechanism to ensure the changing underlying
>> > storage is detectable. The disadvantage is a bit more complexity and
>> > bigger change to File spec.
>>
>> That sounds good to me.  If we're treating blobs as snapshots, I
>> retract my suggestion of the read-if-not-changed-since parameter.  All
>> reads after the data has changed should fail.  If you want to do a
>> chunked upload, don't snapshot your file into a blob until you're
>> ready to start.  Once you've done that, just slice off parts of the
>> blob, not the file.
>
>
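
To make the snapshot idea from the quoted thread concrete, here is a rough
sketch of how I read it. None of this is spec text: FakeFile, SnapshotBlob
and chunkedUpload are made-up names, the "file" is an in-memory string with
a logical clock standing in for the modification time, and the
(start, length) arguments to slice are just an assumption for the sketch.
The blob caches size and mod time when it is created, slices inherit that
state without any IO, and every read fails once the file has changed.

```typescript
// Hypothetical in-memory stand-in for a file on disk; not part of any spec.
class FakeFile {
  private data: string;
  private modTime: number = 0; // logical clock, bumped on every write

  constructor(data: string) {
    this.data = data;
  }

  // Simulates the file changing on disk.
  write(data: string): void {
    this.data = data;
    this.modTime += 1;
  }

  // Stand-in for an async ranged read that also reports the current mod time.
  readRange(start: number, end: number): Promise<{ text: string; modTime: number }> {
    return Promise.resolve({ text: this.data.slice(start, end), modTime: this.modTime });
  }

  // Hypothetical equivalent of the proposed File.getAsBlob(): captures the
  // size and mod time at the moment of the call.
  getAsBlob(): Promise<SnapshotBlob> {
    return Promise.resolve(new SnapshotBlob(this, 0, this.data.length, this.modTime));
  }
}

class SnapshotBlob {
  private readonly file: FakeFile;
  private readonly start: number;
  private readonly end: number;
  readonly snapshotTime: number;

  constructor(file: FakeFile, start: number, end: number, snapshotTime: number) {
    this.file = file;
    this.start = start;
    this.end = end;
    this.snapshotTime = snapshotTime;
  }

  // Cached value; no IO happens here.
  get size(): number {
    return this.end - this.start;
  }

  // Slices inherit the captured state without checking the live file.
  slice(start: number, length: number): SnapshotBlob {
    const from = Math.min(this.start + Math.max(start, 0), this.end);
    const to = Math.min(from + Math.max(length, 0), this.end);
    return new SnapshotBlob(this.file, from, to, this.snapshotTime);
  }

  // Async read that fails if the file changed after the snapshot was taken.
  async read(): Promise<string> {
    const { text, modTime } = await this.file.readRange(this.start, this.end);
    if (modTime !== this.snapshotTime) {
      throw new Error("underlying file changed since snapshot");
    }
    return text;
  }
}

// Chunked "upload": snapshot once when ready to start, then slice the blob,
// not the file. Returns the chunks instead of sending them anywhere.
async function chunkedUpload(file: FakeFile, chunkSize: number): Promise<string[]> {
  const blob = await file.getAsBlob();
  const chunks: string[] = [];
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    chunks.push(await blob.slice(offset, chunkSize).read());
  }
  return chunks;
}
```

With this shape, a write to the file between two chunk reads makes the next
read reject, so a chunked upload can't silently mix two versions of the
file.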
Received on Thursday, 21 January 2010 19:23:35 UTC