Re: File API: Blob and underlying file changes. from Darin Fisher on 2010-01-15 (public-webapps@w3.org from January to March 2010)

From: Darin Fisher <darin@chromium.org>
Date: Thu, 14 Jan 2010 23:58:57 -0800
To: Jian Li <jianli@chromium.org>
Cc: Dmitry Titov <dimich@chromium.org>, Jonas Sicking <jonas@sicking.cc>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <bd8f24d21001142358r2919cbebpb395d6e456aece6f@mail.gmail.com>

I don't think we should worry about underlying file changes.

If the app wants to cut a file into parts and copy them separately, then
perhaps the app should first copy the file into a private area.  (I'm
presuming that one day, we'll have the concept of a chroot'd private file
storage area for a web app.)
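
The "copy first, then cut" idea can be sketched as follows. This is only an illustration: no private file storage area exists in this thread's API, so an in-memory copy stands in for it. The helper names `chunkRanges` and `snapshotAndSlice` are hypothetical, and the code assumes an environment where `Blob.prototype.arrayBuffer()` is available and `slice()` takes `(start, end)` rather than the `(start, length)` form used in the draft discussed below.

```javascript
// Hypothetical helper: half-open [start, end) byte ranges covering `size`
// bytes in pieces of at most `chunkSize` bytes, with no gaps or overlaps.
function chunkRanges(size, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Hypothetical helper: copy the file's bytes once, then cut every piece
// from that private copy so later changes on disk cannot affect the pieces.
// Assumption: slice(start, end) semantics and Blob.prototype.arrayBuffer().
async function snapshotAndSlice(file, chunkSize) {
  const snapshot = new Blob([await file.arrayBuffer()]);
  return chunkRanges(snapshot.size, chunkSize)
    .map(([start, end]) => snapshot.slice(start, end));
}
```

The trade-off is that the whole file is held in memory, which may be impractical for the large uploads this thread has in mind; a real private storage area would avoid that.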

I think we should avoid solutions that involve file locking since it is bad
for the user (loss of control) if their files are locked by the browser on
behalf of a web app.

It might be reasonable, however, to lock a file while sending it.

-Darin


On Thu, Jan 14, 2010 at 2:41 PM, Jian Li <jianli@chromium.org> wrote:

> It seems that we feel that when a File object is sent via either Form or
> XHR, the latest underlying version should be used. When we get a slice via
> Blob.slice, we assume that the underlying file data is stable from then on.
>
> So for the uploader scenario, we need to cut a big file into multiple pieces.
> With the current File API spec, we will have to do something like the
> following to make sure that all pieces are cut from a stable file.
>     var file = myInputElement.files[0];
>     var blob = file.slice(0, file.size);
>     var piece1 = blob.slice(0, 1000);     // slice(start, length): bytes 0-999
>     var piece2 = blob.slice(1000, 1000);  // bytes 1000-1999
>     ...
>
> The above seems a bit ugly. If we want to make it clean, what Dmitry
> proposed above seems to be reasonable. But it would require a non-trivial
> spec change.
>
>
> On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov <dimich@chromium.org> wrote:
>
>> Atomic read is obviously a nice thing - it would be hard to program
>> against an API that behaves as unpredictably as a single read operation that
>> reads half of old content and half of new content.
>>
>> On the same note, it would likely be very hard to program against Blob
>> objects if they could change underneath unpredictably. Imagine that we need
>> to build an uploader that cuts a big file in multiple pieces and sends those
>> pieces to the servers so they will be stitched together later. If during
>> this operation the underlying file changes and this changes all the pieces
>> that Blobs refer to (due to clamping and just silent change of content), all
>> the slicing/stitching assumptions are invalid and it's hard to even notice
>> since blobs are simply 'clamped' silently. Some degree of mess is possible
>> then.
>>
>> Another use case could be a JPEG image processor that uses slice() to cut
>> the headers from the image file and then uses info from the headers to cut
>> further JFIF fields from the file (reading EXIF and populating local
>> database of images for example). Changing the file in the middle of that is
>> bad.
>>
>> It seems the typical use cases that will need Blob.slice() functionality
>> form 'units of work' where Blob.slice() is used with the likely assumption
>> that the underlying data is stable and does not change silently. Such a
>> 'unit of work' should fail as a whole if the underlying file changes. One
>> way to achieve that is to reliably fail operations on 'derived' Blobs and
>> perhaps even have an 'isValid' property on them. 'Derived' Blobs are those
>> obtained via slice(), as opposed to 'original' Blobs that are also Files.
>>
>> One disadvantage of this approach is that it implies that the same Blob
>> has 2 possible behaviors - when it is obtained via Blob.slice() (or other
>> methods) vs. when it is a File.
>>
>> It all could be a bit cleaner if File did not derive from Blob, but
>> instead had a getAsBlob() method - then it would be possible to say that
>> Blobs are always immutable but may become 'invalid' over time if the
>> underlying data changes. The FileReader can then be just a BlobReader and
>> have cleaner semantics.
>>
>> If that were the case, then xhr.send(file) would capture the state of the
>> file at the moment of sending, while xhr.send(blob) would fail with an
>> exception if the blob is 'invalid' at the moment of the send() operation.
>> This would keep compatibility with current behavior and avoid the duality
>> of Blob behavior. Quite a change to the spec though...
>>
>> Dmitry
>>
>> On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>>> On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince <cprince@google.com>
>>> wrote:
>>> >> For the record, I'd like to make the read "atomic", such that you can
>>> >> never get half a file before a change, and half after. But it likely
>>> >> depends on what OSs can enforce here.
>>> >
>>> > I think *enforcing* atomicity is difficult across all OSes.
>>> >
>>> > But implementations can get nearly the same effect by checking the
>>> > file's last modification time at the start + end of the API call.  If
>>> > it has changed, the read operation can throw an exception.
>>>
>>> I'm talking about during the actual read. I.e. not related to the
>>> lifetime of the File object, just related to the time between the
>>> first 'progress' event, and the 'loadend' event. If the file changes
>>> during this time there is no way to fake atomicity since the partial
>>> file has already been returned.
>>>
>>> / Jonas
>>>
>>
>>
>
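
The modification-time check suggested by Chris Prince in the quoted thread might look roughly like the sketch below. The function `readIfUnchanged` and its two callback parameters are hypothetical; the callbacks stand in for whatever the implementation uses to query the file's last-modified time and to read its bytes, so nothing here is an actual File API call.

```javascript
// Hypothetical sketch: sample the modification time before and after the
// read, and reject the result if the two samples differ.
//   getModifiedTime: () => number   (assumed: returns the file's mtime)
//   readAll:         () => any      (assumed: reads the file's contents)
function readIfUnchanged(getModifiedTime, readAll) {
  const before = getModifiedTime();
  const data = readAll();
  const after = getModifiedTime();
  if (before !== after) {
    throw new Error('File changed during read');
  }
  return data;
}
```

As Jonas points out, this only detects a change after the fact: with an incremental read, partial data may already have been delivered through 'progress' events by the time the final check fails. It also cannot catch a modification that leaves the timestamp unchanged.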
Received on Friday, 15 January 2010 07:59:28 UTC