Re: File API: Blob and underlying file changes. from Eric Uhrhane on 2010-01-20 (public-webapps@w3.org from January to March 2010)

From: Eric Uhrhane <ericu@google.com>
Date: Wed, 20 Jan 2010 14:30:19 -0800
To: Dmitry Titov <dimich@chromium.org>
Cc: Jonas Sicking <jonas@sicking.cc>, Darin Fisher <darin@chromium.org>, Jian Li <jianli@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <44b058fe1001201430l2495c62dnd6f55d64763a93f7@mail.gmail.com>

On Wed, Jan 20, 2010 at 1:45 PM, Dmitry Titov <dimich@chromium.org> wrote:
> So it seems there are 2 ideas on how to handle underlying file changes in
> the case of File and Blob objects, nicely captured by Arun above:
> 1. Keep all Blobs 'mutating', following the underlying file change. In
> particular, it means that Blob.size and similar properties may change from
> query to query, reflecting the current file state. In case the Blob was
> sliced and corresponding portion of the file does not exist anymore, it
> would be clamped, potentially to 0, as currently specified. Read operations
> would simply read the clamped portion. That would provide the same behavior
> for all Blobs, regardless of whether they are Files or obtained via slice(). It
> also has the slight disadvantage that every access to Blob.size or
> Blob.slice() will incur synchronous file I/O. Note that current
> File.fileSize is already implemented like that in FF and WebKit and uses
> sync file I/O.
> 2. Treat Blobs that are Files and Blobs that are produced by slice() as
> different blobs, semantically. While former ones would 'mutate' with the
> file on the disk (to keep compat with form submission), the latter would
> simply 'inherit' the file information and never do sync IO. Instead, they
> would fail later during async read operations. This has the disadvantage of Blobs
> behaving differently in some cases, making it hard for web developers to
> produce correct code. The synchronous file IO would be reduced but not
> completely eliminated, because the Blobs that are Files would continue to
> 'sync' with the underlying file stats during sync JS calls. One benefit is
> that it allows detection of file content change, via checks of modification
> time captured when the first slice() operation is performed and verified
> during async read operations, which provides a way to implement reliable
> file operations in the face of changing files, if the developer wants to spend
> the effort to do so.
>
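To make the difference between the two options concrete, here is a minimal TypeScript sketch. It is a toy model, not the File API: `DiskFile`, `LiveSlice` and `SnapshotSlice` are hypothetical names, and an in-memory byte array with a numeric `modTime` stands in for the file on disk. `LiveSlice` follows option 1 (re-check the file and clamp on every access, which is where a browser would need sync I/O); `SnapshotSlice` follows option 2 (capture size and modification time when slice() is called and fail the async read if the file changed).

```typescript
// Toy model only: DiskFile, LiveSlice and SnapshotSlice are hypothetical names,
// not File API types. An in-memory byte array stands in for the file on disk.
interface DiskFile {
  bytes: Uint8Array;
  modTime: number; // stand-in for the OS modification time
}

// Option 1: the slice re-checks the file on every access and clamps its
// range to the current length (in a browser, this is where sync I/O happens).
class LiveSlice {
  private readonly file: DiskFile;
  private readonly start: number;
  private readonly end: number;

  constructor(file: DiskFile, start: number, end: number) {
    this.file = file;
    this.start = start;
    this.end = end;
  }

  get size(): number {
    const len = this.file.bytes.length;
    return Math.max(0, Math.min(this.end, len) - Math.min(this.start, len));
  }

  read(): Uint8Array {
    const len = this.file.bytes.length;
    return this.file.bytes.slice(Math.min(this.start, len), Math.min(this.end, len));
  }
}

// Option 2: the slice captures size and modification time when it is created
// and does not look at the file again until an async read, which fails if
// the file has changed in the meantime.
class SnapshotSlice {
  readonly size: number;
  private readonly file: DiskFile;
  private readonly start: number;
  private readonly capturedModTime: number;

  constructor(file: DiskFile, start: number, end: number) {
    const len = file.bytes.length;
    this.file = file;
    this.start = Math.min(start, len);
    this.size = Math.max(0, Math.min(end, len) - this.start);
    this.capturedModTime = file.modTime;
  }

  async read(): Promise<Uint8Array> {
    if (this.file.modTime !== this.capturedModTime) {
      throw new Error("file changed since slice() was called");
    }
    return this.file.bytes.slice(this.start, this.start + this.size);
  }
}
```

For example, if a 10-byte file is sliced at [4, 8) and then truncated to 6 bytes with a new modification time, `LiveSlice.size` drops to 2, while `SnapshotSlice.size` stays 4 and its `read()` rejects.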
> It seems folks on the thread do not like the duality of Blobs (hard to
> program and debug), and there is also a desire to avoid synchronous file IO.
> It seems the spec today leans more toward #1. The only problem with it is
> that it's hard to implement some scenarios, like a big file upload in chunks
> - in case the file changes, the result of upload may actually be a mix of
> new and old file contents and there is no way to check... Perhaps we can
> expose File.modificationTime? It still does not get rid of sync I/O...

I think it could.  Here's a third option:

Make all blobs, file-based or not, just as async as the blobs in
option 2.  They never do sync IO, but could potentially fail future
read operations if their metadata is out of date [e.g. reading beyond
EOF].  However, expose the modification time on File via an async
method and allow the user to pass it in to a read call to enforce
"fail if changed since this time".  This keeps all file accesses
async, but still allows for chunked uploads without mixing files
accidentally.  If we allow users to refresh the modification time
asynchronously, it also allows for adding a file to a form, changing
the file on disk, and then uploading the new file.  The user would
look up the mod time when starting the upload, rather than when the
file's selected.
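A rough sketch of this third option, again in TypeScript against a hypothetical in-memory model. None of `AsyncBlob`, `getModificationTime()`, the `ifUnmodifiedSince` argument to `read()`, or `uploadInChunks` exist in the File API; they only illustrate the idea. In this sketch the size is metadata captured when the blob is created (for a File, presumably at selection time, which is an assumption), slice() is pure arithmetic, and all file access goes through async calls. The chunked uploader looks up the modification time once when the upload starts and passes it to every chunk read.

```typescript
// Hypothetical in-memory stand-in for a file on disk.
interface DiskFile {
  bytes: Uint8Array;
  modTime: number;
}

// Hypothetical "option 3" blob: size is captured metadata, and nothing
// touches the file until one of the async methods is called.
class AsyncBlob {
  readonly size: number;
  private readonly file: DiskFile;
  private readonly start: number;

  constructor(file: DiskFile, start: number, size: number) {
    this.file = file;
    this.start = start;
    this.size = size;
  }

  // Pure arithmetic on captured metadata: no file I/O.
  slice(start: number, end: number): AsyncBlob {
    const s = Math.min(start, this.size);
    const e = Math.min(Math.max(end, s), this.size);
    return new AsyncBlob(this.file, this.start + s, e - s);
  }

  // Hypothetical async accessor for the modification time.
  async getModificationTime(): Promise<number> {
    return this.file.modTime;
  }

  // Hypothetical read that can enforce "fail if changed since this time";
  // it also fails if the captured metadata is stale (range beyond EOF).
  async read(ifUnmodifiedSince?: number): Promise<Uint8Array> {
    if (ifUnmodifiedSince !== undefined && this.file.modTime > ifUnmodifiedSince) {
      throw new Error("file modified since the given time");
    }
    if (this.start + this.size > this.file.bytes.length) {
      throw new Error("stale metadata: read beyond EOF");
    }
    return this.file.bytes.slice(this.start, this.start + this.size);
  }
}

// Look up the mod time once, when the upload starts, and pass it to every
// chunk read so old and new file contents are never mixed.
async function uploadInChunks(
  file: AsyncBlob,
  chunkSize: number,
  send: (chunk: Uint8Array) => Promise<void>,
): Promise<void> {
  const startedAt = await file.getModificationTime();
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    const chunk = await file.slice(offset, offset + chunkSize).read(startedAt);
    await send(chunk);
  }
}
```

If the file is saved over mid-upload, the next chunk read rejects instead of returning new contents, so the receiver never gets a mix of old and new data. Because the mod time is looked up when the upload starts rather than when the file is picked, a file changed on disk after selection is simply uploaded in its new state.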

    Eric

> Dmitry
> On Fri, Jan 15, 2010 at 12:10 PM, Dmitry Titov <dimich@chromium.org> wrote:
>>
>> On Fri, Jan 15, 2010 at 11:50 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>> This doesn't address the problem that authors are unlikely to even
>>> attempt to deal with this situation, given how rare it is. And even
>>> less likely to deal with it successfully, given how hard the situation
>>> is to reproduce while testing.
>>
>> I don't know how rare the case is. It might become less rare if there is
>> an uploader of big movie files and it's easy to overwrite the big movie file
>> by hitting the 'save' button in a movie editor while it is still uploading...
>> Perhaps such uploader can use other means to detect the file change
>> though...
>> It would be nice to spell out some behavior though, or we can end up with
>> some incompatible implementations. Speaking of Blob.slice(), what is the
>> recommended behavior of the resulting Blobs when the underlying file changes?
>>
>>
>>>
>>> / Jonas
>>
>
>

Received on Wednesday, 20 January 2010 22:31:09 UTC