Subject: Re: File API: Blob and underlying file changes.

From: Eric Uhrhane <ericu@google.com>
Date: Wed, 20 Jan 2010 15:44:59 -0800
To: Dmitry Titov <dimich@chromium.org>
Cc: Jonas Sicking <jonas@sicking.cc>, Darin Fisher <darin@chromium.org>, Jian Li <jianli@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <44b058fe1001201544v637f200dod7e22e9b98041770@mail.gmail.com>

On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov <dimich@chromium.org> wrote:
> On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane <ericu@google.com> wrote:
>>
>> I think it could.  Here's a third option:
>> Make all blobs, file-based or not, just as async as the blobs in
>> option 2.  They never do sync IO, but could potentially fail future
>> read operations if their metadata is out of date [e.g. reading beyond
>> EOF].  However, expose the modification time on File via an async
>> method and allow the user to pass it in to a read call to enforce
>> "fail if changed since this time".  This keeps all file accesses
>> async, but still allows for chunked uploads without mixing files
>> accidentally.  If we allow users to refresh the modification time
>> asynchronously, it also allows for adding a file to a form, changing
>> the file on disk, and then uploading the new file.  The user would
>> look up the mod time when starting the upload, rather than when the
>> file's selected.
>
> It would be great to avoid sync file I/O on calls like Blob.size. They would
> simply return a cached value. Any mismatch would be detected during the
> actual read operation.
> However then I'm not sure how to keep File derived from Blob, since:
> 1) Currently, in FF and WebKit, File.fileSize does sync I/O and returns the
> current file size. The current spec says File is derived from Blob, and Blob
> has a Blob.size property that is likely going to co-exist with File.fileSize
> for a while, for compat reasons. It's weird for file.size and file.fileSize
> to return different things.

True, but we'd probably want to deprecate file.fileSize anyway and
then get rid of it, since it's synchronous.

> 2) Currently, xhr.send(file) does not fail and sends the version of the file
> that exists around the time the xhr.send(file) call was issued. Since File is
> also a Blob, xhr.send(blob) would behave the same, which means that if we want
> to preserve this behavior, the Blob cannot fail an async read operation if the
> file has changed.
> There is a contradiction here. One way to resolve it would be to break "File
> is Blob" and instead "capture the File as Blob" by having
> file.getAsBlob(). The latter would make a snapshot of the state of the file,
> so that subsequent async read operations can fail if the file has been
> changed.
> I've asked a few people around in a non-scientific poll and it seems
> developers expect Blob to be a 'snapshot', reflecting the state of the file
> (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation.
> Since it's obviously bad to actually copy data, it seems acceptable to
> capture enough information (like mod time) so that later read operations can
> fail if the underlying storage has been changed. It feels really strange if
> reading the Blob can yield some data from one version of a file (or Canvas)
> mixed with some data from a newer version, without any indication that this
> is happening.
> All that means there is an option 3:
> 3. Treat all Blobs as 'snapshots' that refer to the range of underlying data
> at the moment of creation of the Blob. Blobs produced further by
> Blob.slice() operation inherit the captured state w/o actually verifying it
> against 'live' underlying objects like files. All Blobs can be 'read' (or
> 'sent') via operations that can fail if the underlying content has changed.
> Optionally, expose a snapshotTime property and perhaps a "read if not changed
> since" parameter to read operations. Do not derive File from Blob; rather,
> have File.getAsBlob() that produces a Blob which is a snapshot of the file
> at the moment of the call. The advantage here is that it removes the need for
> sync operations from Blob and provides a mechanism to ensure that changes to
> the underlying storage are detectable. The disadvantage is a bit more
> complexity and a bigger change to the File spec.

That sounds good to me.  If we're treating blobs as snapshots, I
retract my suggestion of the read-if-not-changed-since parameter.  All
reads after the data has changed should fail.  If you want to do a
chunked upload, don't snapshot your file into a blob until you're
ready to start.  Once you've done that, just slice off parts of the
blob, not the file.
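
To make the snapshot semantics concrete, here is a rough in-memory sketch of
how option 3 might behave. Everything in it is an assumption for illustration
only: FakeFile, SnapshotBlob, overwrite(), read() and readInChunks() are
made-up names, a version counter stands in for the modification time, and
slice() takes (offset, length). None of this is proposed spec text.

```typescript
// Hypothetical in-memory stand-in for a file on disk; not a real browser API.
class FakeFile {
  private version = 0; // stands in for the file's modification time

  constructor(private content: string) {}

  // Simulates another program rewriting the file on disk.
  overwrite(content: string): void {
    this.content = content;
    this.version += 1;
  }

  currentVersion(): number {
    return this.version;
  }

  rawRead(start: number, end: number): string {
    return this.content.slice(start, end);
  }

  // Captures only metadata (version and byte range); no data is copied.
  getAsBlob(): SnapshotBlob {
    return new SnapshotBlob(this, this.version, 0, this.content.length);
  }
}

// Hypothetical snapshot Blob: a range plus the version seen at creation.
class SnapshotBlob {
  constructor(
    private readonly file: FakeFile,
    private readonly snapshotVersion: number,
    private readonly start: number,
    private readonly end: number,
  ) {}

  // Answered from the captured range, so no I/O is needed.
  get size(): number {
    return this.end - this.start;
  }

  // Slices inherit the captured state without checking the live file.
  slice(offset: number, length: number): SnapshotBlob {
    const s = Math.min(this.start + offset, this.end);
    const e = Math.min(s + length, this.end);
    return new SnapshotBlob(this.file, this.snapshotVersion, s, e);
  }

  // The mismatch is detected only here, at read time.
  async read(): Promise<string> {
    if (this.file.currentVersion() !== this.snapshotVersion) {
      throw new Error("underlying file changed since snapshot");
    }
    return this.file.rawRead(this.start, this.end);
  }
}

// Chunked read: snapshot once, then slice the blob, never the file.
async function readInChunks(file: FakeFile, chunkSize: number): Promise<string[]> {
  const blob = file.getAsBlob();
  const chunks: string[] = [];
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    chunks.push(await blob.slice(offset, chunkSize).read());
  }
  return chunks;
}
```

In this sketch, a chunk read after overwrite() rejects rather than returning
data from the newer version, so chunks from two versions are never mixed. A
caller who wants the new contents just calls getAsBlob() again.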

Received on Wednesday, 20 January 2010 23:45:48 UTC