Re: File API: Blob and underlying file changes. from Michael Nordman on 2010-01-21 (public-webapps@w3.org from January to March 2010)

From: Michael Nordman <michaeln@google.com>
Date: Thu, 21 Jan 2010 13:15:47 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Eric Uhrhane <ericu@google.com>, Jian Li <jianli@chromium.org>, Dmitry Titov <dimich@chromium.org>, Darin Fisher <darin@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <fa2eab051001211315p5f4cf5a0y68205014f9b8aeef@mail.gmail.com>
On Thu, Jan 21, 2010 at 12:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> One thing to remember here is that if we require snapshotting, that
> will mean paying potentially very high costs every time the
> snapshotting operation is used. Potentially copying hundreds of
> megabytes of data (think video).
>
>
I was thinking of different semantics. If the underlying bits change
sometime after a 'snapshot' is taken, the 'snapshot' becomes invalid and you
cannot access the underlying bits. If an application wants guaranteed access
to the 'snapshot', it would have to explicitly save a copy somewhere
(sandboxed file system / coin a new transient 'Blob' via a new blob.copy()
method) and refer to the copy.

So no costly copies are made w/o explicit direction to do so from the app.
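
A minimal sketch of the semantics I mean, in Python rather than any browser API. The names FileSnapshot, StaleSnapshotError, read() and copy() are hypothetical, and comparing file size plus modification time is only an assumed stand-in for "the underlying bits changed".

```python
import os
import tempfile


class StaleSnapshotError(Exception):
    """Raised when the file changed after the snapshot was taken."""


class FileSnapshot:
    """Hypothetical snapshot: records metadata only, copies no data."""

    def __init__(self, path):
        st = os.stat(path)
        self.path = path
        # Assumption: (size, mtime) is a good enough change detector.
        self._stamp = (st.st_size, st.st_mtime_ns)

    def _check(self):
        st = os.stat(self.path)
        if (st.st_size, st.st_mtime_ns) != self._stamp:
            raise StaleSnapshotError(self.path)

    def read(self, start=0, end=None):
        # Refuse to hand out bytes once the file no longer matches the snapshot.
        self._check()
        with open(self.path, "rb") as f:
            f.seek(start)
            return f.read() if end is None else f.read(end - start)

    def copy(self):
        """Explicit, app-requested copy: the only place data is duplicated."""
        data = self.read()
        fd, copy_path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return FileSnapshot(copy_path)
```

The check and the read are not atomic in this sketch, so a real implementation would need to close that window; the point is only that taking the snapshot is cheap and the copy is opt-in.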

> But if we don't require snapshotting, things will only break if the
> user takes the action to modify a file after giving the page access to
> it.
>
> Also, in general snapshotting is something that UAs can experiment
> with without requiring changes to the spec. Even though File.slice is
> a synchronous function, the UA can implement snapshotting without
> using synchronous IO. The UA could simply do an asynchronous file copy
> in the background. If any read operations are performed on the slice
> those could simply be stalled until the copy is finished since reads
> are always asynchronous.
>
> / Jonas
>
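
The background-copy idea above could look roughly like this, sketched with Python's asyncio instead of a real UA. CopiedSlice and the two helper functions are hypothetical names, and the worker-thread copy is an assumption about how a UA might do it, not something the spec describes.

```python
import asyncio
import os
import shutil
import tempfile


def _copy_to_temp(path):
    # Runs in a worker thread: the only place the source file is read.
    fd, copy_path = tempfile.mkstemp()
    os.close(fd)
    shutil.copyfile(path, copy_path)
    return copy_path


def _read_range(path, start, end):
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


class CopiedSlice:
    """Hypothetical slice backed by a private copy made in the background."""

    def __init__(self, path, start, end):
        # Synchronous constructor, no blocking I/O: it only schedules the copy.
        # Must be called while an asyncio event loop is running.
        self._start, self._end = start, end
        self._loop = asyncio.get_running_loop()
        self._copy = self._loop.run_in_executor(None, _copy_to_temp, path)

    async def read(self):
        copy_path = await self._copy  # stalls until the background copy is done
        return await self._loop.run_in_executor(
            None, _read_range, copy_path, self._start, self._end
        )
```

Note that the copy captures whatever the file contains when the worker gets to it, so a change made between slice creation and the start of the copy would still be picked up.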
> On Thu, Jan 21, 2010 at 11:22 AM, Eric Uhrhane <ericu@google.com> wrote:
> > On Thu, Jan 21, 2010 at 11:15 AM, Jian Li <jianli@chromium.org> wrote:
> >> Treating blobs as snapshots sounds like a reasonable approach and it will
> >> make the life of the chunked upload and other scenarios easier. Now the
> >> problem is: how do we get the blob (snapshot) out of the file?
> >> 1) We can still keep the current relationship between File and Blob. When we
> >> slice a file by calling File.slice, a new blob that captures the current
> >> file size and modification time is returned. The following Blob operations,
> >> like slice, will simply inherit the cached size and modification time. When
> >> we access the underlying file data in XHR.send() or FileReader, the
> >> modification time will be verified and an exception could be thrown.
> >
> > This would require File.slice to do synchronous file IO, whereas
> > Blob.slice doesn't do that.
> >
> >> 2) We can remove the inheritance of Blob from File and introduce
> >> File.getAsBlob() as dimich suggested. This seems to be more elegant.
> >> However, it requires changing the File API spec a lot.
> >>
> >> On Wed, Jan 20, 2010 at 3:44 PM, Eric Uhrhane <ericu@google.com> wrote:
> >>>
> >>> On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov <dimich@chromium.org> wrote:
> >>> > On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:
> >>> >>
> >>> >> I think it could.  Here's a third option:
> >>> >> Make all blobs, file-based or not, just as async as the blobs in
> >>> >> option 2.  They never do sync IO, but could potentially fail future
> >>> >> read operations if their metadata is out of date [e.g. reading beyond
> >>> >> EOF].  However, expose the modification time on File via an async
> >>> >> method and allow the user to pass it in to a read call to enforce
> >>> >> "fail if changed since this time".  This keeps all file accesses
> >>> >> async, but still allows for chunked uploads without mixing files
> >>> >> accidentally.  If we allow users to refresh the modification time
> >>> >> asynchronously, it also allows for adding a file to a form, changing
> >>> >> the file on disk, and then uploading the new file.  The user would
> >>> >> look up the mod time when starting the upload, rather than when the
> >>> >> file's selected.
> >>> >
> >>> > It would be great to avoid sync file I/O on calls like Blob.size. They would
> >>> > simply return cached value. Actual mismatch would be detected during actual
> >>> > read operation.
> >>> > However then I'm not sure how to keep File derived from Blob, since:
> >>> > 1) Currently, in FF and WebKit File.fileSize is a sync I/O that returns
> >>> > current file size. The current spec says File is derived from Blob and Blob
> >>> > has Blob.size property that is likely going to co-exist with File.fileSize
> >>> > for a while, for compat reasons. It's weird for file.size and file.fileSize
> >>> > to return different things.
> >>>
> >>> True, but we'd probably want to deprecate file.fileSize anyway and
> >>> then get rid of it, since it's synchronous.
> >>>
> >>> > 2) Currently, xhr.send(file) does not fail and sends the version of the file
> >>> > that exists around the time the xhr.send(file) call was issued. Since File is
> >>> > also a Blob, xhr.send(blob) would behave the same, which means if we want to
> >>> > preserve this behavior the Blob cannot fail an async read operation if the
> >>> > file has changed.
> >>> > There is a contradiction here. One way to resolve it would be to break "File
> >>> > is Blob" and to be able to "capture the File as Blob" by having
> >>> > file.getAsBlob(). The latter would make a snapshot of the state of the file,
> >>> > to be able to fail subsequent async read operations if the file has been
> >>> > changed.
> >>> > I've asked a few people around in a non-scientific poll and it seems
> >>> > developers expect Blob to be a 'snapshot', reflecting the state of the file
> >>> > (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation.
> >>> > Since it's obviously bad to actually copy data, it seems acceptable to
> >>> > capture enough information (like mod time) so the read operations later can
> >>> > fail if underlying storage has been changed. It feels really strange if
> >>> > reading the Blob can yield some data from one version of a file (or Canvas)
> >>> > mixed with some data from newer version, without any indication that this is
> >>> > happening.
> >>> > All that means there is an option 3:
> >>> > 3. Treat all Blobs as 'snapshots' that refer to the range of underlying data
> >>> > at the moment of creation of the Blob. Blobs produced further by
> >>> > Blob.slice() operation inherit the captured state w/o actually verifying it
> >>> > against 'live' underlying objects like files. All Blobs can be 'read' (or
> >>> > 'sent') via operations that can fail if the underlying content has changed.
> >>> > Optionally, expose snapshotTime property and perhaps "read if not changed
> >>> > since" parameter to read operations. Do not derive File from Blob, rather
> >>> > have File.getAsBlob() that produces a Blob which is a snapshot of the file
> >>> > at the moment of call. The advantage here is that it removes need for sync
> >>> > operations from Blob and provides mechanism to ensure the changing
> >>> > underlying storage is detectable. The disadvantage is a bit more complexity
> >>> > and bigger change to File spec.
> >>>
> >>> That sounds good to me.  If we're treating blobs as snapshots, I
> >>> retract my suggestion of the read-if-not-changed-since parameter.  All
> >>> reads after the data has changed should fail.  If you want to do a
> >>> chunked upload, don't snapshot your file into a blob until you're
> >>> ready to start.  Once you've done that, just slice off parts of the
> >>> blob, not the file.
> >>
> >>
> >
>
>
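
For the chunked-upload case at the bottom of the quoted thread (snapshot the file once, then slice the blob rather than the file), here is a rough Python model. Blob.from_file() is a hypothetical stand-in for the proposed File.getAsBlob(), slice(start, length) follows my reading of the current draft, and StaleBlobError and chunks() are made-up names for illustration.

```python
import os
import tempfile


class StaleBlobError(Exception):
    """Raised when the file no longer matches the captured snapshot."""


class Blob:
    """Hypothetical snapshot blob over a byte range of a file."""

    def __init__(self, path, start, end, stamp):
        self.path, self.start, self.end, self.stamp = path, start, end, stamp

    @classmethod
    def from_file(cls, path):
        # The one place where file metadata is captured.
        st = os.stat(path)
        return cls(path, 0, st.st_size, (st.st_size, st.st_mtime_ns))

    @property
    def size(self):
        return self.end - self.start  # cached range, no I/O

    def slice(self, start, length):
        # Inherit the captured stamp; no check against the live file here.
        lo = min(self.start + start, self.end)
        hi = min(lo + length, self.end)
        return Blob(self.path, lo, hi, self.stamp)

    def read(self):
        # Verification happens only when the data is actually accessed.
        st = os.stat(self.path)
        if (st.st_size, st.st_mtime_ns) != self.stamp:
            raise StaleBlobError(self.path)
        with open(self.path, "rb") as f:
            f.seek(self.start)
            return f.read(self.size)


def chunks(blob, chunk_size):
    """Slice the blob, not the file, so every chunk shares one snapshot."""
    return [blob.slice(off, chunk_size) for off in range(0, blob.size, chunk_size)]
```

Because every chunk carries the same captured stamp, an upload cannot silently mix bytes from two versions of the file: the first read after a change fails instead.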
Received on Thursday, 21 January 2010 21:16:21 UTC