Re: File API: Blob and underlying file changes. from Jian Li on 2010-01-21 (public-webapps@w3.org from January to March 2010)

From: Jian Li <jianli@chromium.org>
Date: Thu, 21 Jan 2010 11:15:32 -0800
To: Eric Uhrhane <ericu@google.com>
Cc: Dmitry Titov <dimich@chromium.org>, Jonas Sicking <jonas@sicking.cc>, Darin Fisher <darin@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <a95818c31001211115p2e700db6k8d511d11799f2bd2@mail.gmail.com>
Treating blobs as snapshots sounds like a reasonable approach, and it will
make chunked upload and other scenarios easier. Now the
problem is: how do we get the blob (snapshot) out of the file?

1) We can keep the current relationship between File and Blob. When we
slice a file by calling File.slice, it returns a new blob that captures the
current file size and modification time. Subsequent Blob operations, such
as slice, simply inherit the cached size and modification time. When we
access the underlying file data in XHR.send() or FileReader, the
modification time is verified, and an exception can be thrown if the file
has changed.

2) We can stop deriving File from Blob and introduce File.getAsBlob(), as
dimich suggested. This seems more elegant. However, it requires changing
the File API spec a lot.
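
To make the two options concrete, here is a rough TypeScript sketch of the
snapshot idea. Everything in it is an assumption for illustration:
SketchFile, SketchBlob, modTime, overwrite() and the (start, end) form of
slice are hypothetical names, not spec text, and read() is synchronous only
to keep the sketch short. It follows option 2; under option 1 the same
capture would happen inside File.slice instead of getAsBlob().

```typescript
// Hypothetical model for illustration; none of these names come from the spec.
class SketchBlob {
  readonly size: number;
  private readonly source: SketchFile;
  private readonly offset: number;
  private readonly capturedModTime: number;

  constructor(source: SketchFile, offset: number, size: number, capturedModTime: number) {
    this.source = source;
    this.offset = offset;
    this.size = size;
    this.capturedModTime = capturedModTime;
  }

  // Slicing never touches the file: the child inherits the cached state.
  slice(start: number, end: number): SketchBlob {
    const s = Math.min(Math.max(start, 0), this.size);
    const e = Math.min(Math.max(end, s), this.size);
    return new SketchBlob(this.source, this.offset + s, e - s, this.capturedModTime);
  }

  // Stands in for the data access made by XHR.send() or FileReader.
  read(): string {
    if (this.source.modTime !== this.capturedModTime) {
      throw new Error("file changed after the snapshot was taken");
    }
    return this.source.contents.slice(this.offset, this.offset + this.size);
  }
}

class SketchFile {
  contents: string;
  modTime: number;

  constructor(contents: string) {
    this.contents = contents;
    this.modTime = 0;
  }

  // Simulates another process rewriting the file on disk.
  overwrite(contents: string): void {
    this.contents = contents;
    this.modTime += 1;
  }

  // Option 2: File is not a Blob; a snapshot is requested explicitly.
  getAsBlob(): SketchBlob {
    return new SketchBlob(this, 0, this.contents.length, this.modTime);
  }
}
```

With these definitions, a blob taken from a file holding "hello world" and
sliced to (0, 5) reads back "hello" until overwrite() is called. After that,
the blob's size is still the cached value, but read() throws.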


On Wed, Jan 20, 2010 at 3:44 PM, Eric Uhrhane <ericu@google.com> wrote:

> On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov <dimich@chromium.org> wrote:
> > On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:
> >>
> >> I think it could.  Here's a third option:
> >> Make all blobs, file-based or not, just as async as the blobs in
> >> option 2.  They never do sync IO, but could potentially fail future
> >> read operations if their metadata is out of date [e.g. reading beyond
> >> EOF].  However, expose the modification time on File via an async
> >> method and allow the user to pass it in to a read call to enforce
> >> "fail if changed since this time".  This keeps all file accesses
> >> async, but still allows for chunked uploads without mixing files
> >> accidentally.  If we allow users to refresh the modification time
> >> asynchronously, it also allows for adding a file to a form, changing
> >> the file on disk, and then uploading the new file.  The user would
> >> look up the mod time when starting the upload, rather than when the
> >> file's selected.
> >
> > It would be great to avoid sync file I/O on calls like Blob.size. They
> > would simply return cached value. Actual mismatch would be detected
> > during actual read operation.
> > However then I'm not sure how to keep File derived from Blob, since:
> > 1) Currently, in FF and WebKit File.fileSize is a sync I/O that returns
> > current file size. The current spec says File is derived from Blob and
> > Blob has Blob.size property that is likely going to co-exist with
> > File.fileSize for a while, for compat reasons. It's weird for file.size
> > and file.fileSize to return different things.
>
> True, but we'd probably want to deprecate file.fileSize anyway and
> then get rid of it, since it's synchronous.
>
> > 2) Currently, xhr.send(file) does not fail and sends the version of the
> > file that exists somewhere around xhr.send(file) call was issued. Since
> > File is also a Blob, xhr.send(blob) would behave the same which means if
> > we want to preserve this behavior the Blob can not fail async read
> > operation if file has changed.
> > There is a contradiction here. One way to resolve it would be to break
> > "File is Blob" and to be able to "capture the File as Blob" by having
> > file.getAsBlob(). The latter would make a snapshot of the state of the
> > file, to be able to fail subsequent async read operations if the file
> > has been changed.
> > I've asked a few people around in a non-scientific poll and it seems
> > developers expect Blob to be a 'snapshot', reflecting the state of the
> > file (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob
> > creation. Since it's obviously bad to actually copy data, it seems
> > acceptable to capture enough information (like mod time) so the read
> > operations later can fail if underlying storage has been changed. It
> > feels really strange if reading the Blob can yield some data from one
> > version of a file (or Canvas) mixed with some data from newer version,
> > without any indication that this is happening.
> > All that means there is an option 3:
> > 3. Treat all Blobs as 'snapshots' that refer to the range of underlying
> > data at the moment of creation of the Blob. Blobs produced further by
> > Blob.slice() operation inherit the captured state w/o actually verifying
> > it against 'live' underlying objects like files. All Blobs can be 'read'
> > (or 'sent') via operations that can fail if the underlying content has
> > changed. Optionally, expose snapshotTime property and perhaps "read if
> > not changed since" parameter to read operations. Do not derive File from
> > Blob, rather have File.getAsBlob() that produces a Blob which is a
> > snapshot of the file at the moment of call. The advantage here is that
> > it removes need for sync operations from Blob and provides mechanism to
> > ensure the changing underlying storage is detectable. The disadvantage
> > is a bit more complexity and bigger change to File spec.
>
> That sounds good to me.  If we're treating blobs as snapshots, I
> retract my suggestion of the read-if-not-changed-since parameter.  All
> reads after the data has changed should fail.  If you want to do a
> chunked upload, don't snapshot your file into a blob until you're
> ready to start.  Once you've done that, just slice off parts of the
> blob, not the file.
>
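
The chunked-upload advice quoted above (snapshot once, then slice the blob
rather than the file) might look roughly like this in TypeScript. The
SliceableSnapshot interface and sliceIntoChunks function are hypothetical
names, and the (start, end) form of slice is an assumption, not something
taken from the spec.

```typescript
// Minimal shape assumed for a snapshot blob; hypothetical, not the spec's Blob.
interface SliceableSnapshot {
  readonly size: number;
  slice(start: number, end: number): SliceableSnapshot;
}

// Cuts one snapshot into consecutive chunks of at most chunkSize units.
// Every chunk derives from the same snapshot, so chunks cannot silently
// mix data from two versions of the underlying file.
function sliceIntoChunks(snapshot: SliceableSnapshot, chunkSize: number): SliceableSnapshot[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: SliceableSnapshot[] = [];
  for (let start = 0; start < snapshot.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, snapshot.size);
    chunks.push(snapshot.slice(start, end));
  }
  return chunks;
}
```

The caller would take the snapshot only when the upload is about to start
and then send the chunks in order. If the file changes midway, the next
chunk read fails and the whole upload can be restarted from a fresh
snapshot.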
Received on Thursday, 21 January 2010 19:16:07 UTC