Re: File API: Blob and underlying file changes. from Dmitry Titov on 2010-01-20 (public-webapps@w3.org from January to March 2010)

From: Dmitry Titov <dimich@chromium.org>
Date: Wed, 20 Jan 2010 15:23:59 -0800
To: Eric Uhrhane <ericu@google.com>
Cc: Jonas Sicking <jonas@sicking.cc>, Darin Fisher <darin@chromium.org>, Jian Li <jianli@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <28040fc61001201523m7958cf2cx8a89cabf273e825e@mail.gmail.com>

On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:

> I think it could.  Here's a third option:
>
> Make all blobs, file-based or not, just as async as the blobs in
> option 2.  They never do sync IO, but could potentially fail future
> read operations if their metadata is out of date [e.g. reading beyond
> EOF].  However, expose the modification time on File via an async
> method and allow the user to pass it in to a read call to enforce
> "fail if changed since this time".  This keeps all file accesses
> async, but still allows for chunked uploads without mixing files
> accidentally.  If we allow users to refresh the modification time
> asynchronously, it also allows for adding a file to a form, changing
> the file on disk, and then uploading the new file.  The user would
> look up the mod time when starting the upload, rather than when the
> file's selected.

It would be great to avoid sync file I/O on calls like Blob.size. They would
simply return a cached value. Any mismatch would be detected during the
actual read operation.

However, then I'm not sure how to keep File derived from Blob, since:

1) Currently, in FF and WebKit, File.fileSize does sync I/O and returns the
current file size. The current spec says File is derived from Blob, and Blob
has a Blob.size property that is likely going to co-exist with File.fileSize
for a while, for compat reasons. It's weird for file.size and file.fileSize
to return different things.

2) Currently, xhr.send(file) does not fail and sends the version of the file
that exists around the time the xhr.send(file) call is issued. Since File is
also a Blob, xhr.send(blob) would behave the same, which means that if we
want to preserve this behavior, the Blob cannot fail an async read operation
when the file has changed.

There is a contradiction here. One way to resolve it would be to break "File
is Blob" and instead "capture the File as Blob" via file.getAsBlob(). That
call would take a snapshot of the state of the file, so that subsequent
async read operations can fail if the file has changed.
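
To make the getAsBlob() idea concrete, here is a minimal sketch. It is only an
illustration under assumptions: DiskFile and FileSnapshot are hypothetical
names, the "file" is an in-memory string rather than real disk I/O, and the
Promise-based read() is just one possible shape for an async read.

```typescript
// Hypothetical in-memory stand-in for a file on disk (not a browser API).
class DiskFile {
  data: string;
  modTime: number;

  constructor(data: string, modTime: number) {
    this.data = data;
    this.modTime = modTime;
  }

  // Simulates another process rewriting the file.
  overwrite(data: string, modTime: number): void {
    this.data = data;
    this.modTime = modTime;
  }

  // Captures size and modification time at call time; copies no data.
  getAsBlob(): FileSnapshot {
    return new FileSnapshot(this, this.data.length, this.modTime);
  }
}

// Hypothetical snapshot object standing in for the proposed Blob.
class FileSnapshot {
  private readonly source: DiskFile;
  readonly size: number; // cached value, so reading it needs no I/O
  readonly snapshotTime: number;

  constructor(source: DiskFile, size: number, snapshotTime: number) {
    this.source = source;
    this.size = size;
    this.snapshotTime = snapshotTime;
  }

  // Async read that rejects if the file was modified after the snapshot.
  async read(): Promise<string> {
    if (this.source.modTime !== this.snapshotTime) {
      throw new Error("underlying file changed since snapshot");
    }
    return this.source.data.slice(0, this.size);
  }
}
```

In this sketch, size keeps returning the captured value even after the file
is rewritten, and the change only surfaces when read() is attempted. Calling
getAsBlob() again produces a fresh snapshot of the new contents.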

I've asked a few people in a non-scientific poll and it seems
developers expect Blob to be a 'snapshot', reflecting the state of the file
(or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation.
Since it's obviously bad to actually copy data, it seems acceptable to
capture enough information (like mod time) so the read operations later can
fail if the underlying storage has been changed. It feels really strange if
reading the Blob can yield some data from one version of a file (or Canvas)
mixed with some data from a newer version, without any indication that this
is happening.

All that means there is an option 3:

3. Treat all Blobs as 'snapshots' that refer to the range of underlying data
at the moment of creation of the Blob. Blobs produced further by the
Blob.slice() operation inherit the captured state without actually verifying
it against 'live' underlying objects like files. All Blobs can be 'read' (or
'sent') via operations that can fail if the underlying content has changed.
Optionally, expose a snapshotTime property and perhaps a "read if not changed
since" parameter to read operations. Do not derive File from Blob; rather,
have File.getAsBlob() that produces a Blob which is a snapshot of the file
at the moment of the call. The advantage here is that it removes the need for
sync operations from Blob and provides a mechanism to ensure that changes to
the underlying storage are detectable. The disadvantage is a bit more
complexity and a bigger change to the File spec.
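
A rough sketch of how slice() inheritance and an optional "read if not
changed since" parameter might fit together in option 3. Everything here is an
assumption made for illustration: BackingStore, SnapBlob, capture() and the
ifNotChangedSince parameter are hypothetical names, slice() is assumed to take
(start, length), and the storage is modeled as an in-memory string.

```typescript
// Hypothetical model of 'live' underlying storage (a file, or a canvas).
class BackingStore {
  bytes: string;
  modTime: number;

  constructor(bytes: string, modTime: number) {
    this.bytes = bytes;
    this.modTime = modTime;
  }
}

// Hypothetical snapshot-style Blob; not the spec's Blob interface.
class SnapBlob {
  private readonly store: BackingStore;
  private readonly offset: number;
  readonly size: number;
  readonly snapshotTime: number;

  constructor(store: BackingStore, offset: number, size: number, snapshotTime: number) {
    this.store = store;
    this.offset = offset;
    this.size = size;
    this.snapshotTime = snapshotTime;
  }

  // Snapshots the whole store as it is at the moment of the call.
  static capture(store: BackingStore): SnapBlob {
    return new SnapBlob(store, 0, store.bytes.length, store.modTime);
  }

  // The slice inherits snapshotTime and never consults the live store.
  slice(start: number, length: number): SnapBlob {
    const begin = Math.min(Math.max(start, 0), this.size);
    const len = Math.min(Math.max(length, 0), this.size - begin);
    return new SnapBlob(this.store, this.offset + begin, len, this.snapshotTime);
  }

  // Rejects if the store was modified after the given time, which
  // defaults to the time captured when the snapshot was taken.
  async read(ifNotChangedSince: number = this.snapshotTime): Promise<string> {
    if (this.store.modTime > ifNotChangedSince) {
      throw new Error("underlying storage changed");
    }
    return this.store.bytes.slice(this.offset, this.offset + this.size);
  }
}
```

Note that a slice taken after the storage has changed still carries the
original snapshotTime, so its read fails by default instead of silently mixing
old and new data. Passing a later time to read() is the caller's explicit
opt-in to accept the newer contents.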

Received on Wednesday, 20 January 2010 23:24:29 UTC