Re: File API: Blob and underlying file changes. from Jonas Sicking on 2010-01-15 (public-webapps@w3.org from January to March 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 15 Jan 2010 02:32:52 -0800
To: Darin Fisher <darin@chromium.org>
Cc: Jian Li <jianli@chromium.org>, Dmitry Titov <dimich@chromium.org>, Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <63df84f1001150232k55d7f6b6uf079781f782075bf@mail.gmail.com>
On Thu, Jan 14, 2010 at 11:58 PM, Darin Fisher <darin@chromium.org> wrote:
> I don't think we should worry about underlying file changes.
> If the app wants to cut a file into parts and copy them separately, then
> perhaps the app should first copy the file into a private area.  (I'm
> presuming that one day, we'll have the concept of a chroot'd private file
> storage area for a web app.)
> I think we should avoid solutions that involve file locking since it is bad
> for the user (loss of control) if their files are locked by the browser on
> behalf of a web app.
> It might be reasonable, however, to lock a file while sending it.

I largely agree. Though I think it'd be reasonable to lock the file
while reading it too.

/ Jonas

> On Thu, Jan 14, 2010 at 2:41 PM, Jian Li <jianli@chromium.org> wrote:
>>
>> It seems that we feel that when a File object is sent via either Form or
>> XHR, the latest underlying version should be used. When we get a slice via
>> Blob.slice, we assume that the underlying file data is stable since then.
>> So for the uploader scenario, we need to cut a big file into multiple pieces.
>> With the current File API spec, we will have to do something like the
>> following to make sure that all pieces are cut from a stable file.
>>     var file = myInputElement.files[0];
>>     var blob = file.slice(0, file.size);
>>     var piece1 = blob.slice(0, 1000);
>>     var piece2 = blob.slice(1000, 1000);
>>     ...
>> The above seems a bit ugly. If we want to make it clean, what Dmitry
>> proposed above seems to be reasonable. But it would require a non-trivial
>> spec change.
>>
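
[Editorial sketch, not part of the original message.] The piece offsets in
the uploader example above are easy to get wrong by one byte, so a small
helper that computes contiguous ranges may help. The name chunkRanges and
the shape of its return value are assumptions made for illustration only.
The draft discussed in this thread appears to use slice(start, length);
later revisions of the File API use slice(start, end), so each range
carries both values.

```javascript
// Hypothetical helper: split [0, totalSize) into contiguous,
// non-overlapping byte ranges of at most chunkSize bytes each.
function chunkRanges(totalSize, chunkSize) {
  if (!Number.isInteger(totalSize) || totalSize < 0) {
    throw new RangeError("totalSize must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    // The last range may be shorter than chunkSize.
    const length = Math.min(chunkSize, totalSize - start);
    ranges.push({ start, length, end: start + length });
  }
  return ranges;
}

// Possible use, assuming `blob` is the stable snapshot from the example:
//   draft (start, length) signature: blob.slice(r.start, r.length)
//   later (start, end) signature:    blob.slice(r.start, r.end)
```

Each range starts exactly where the previous one ends, so no byte is
skipped or sent twice.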
>> On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov <dimich@chromium.org>
>> wrote:
>>>
>>> Atomic read is obviously a nice thing - it would be hard to program
>>> against an API that behaves as unpredictably as a single read operation that
>>> reads half of old content and half of new content.
>>> On the same note, it would likely be very hard to program against Blob
>>> objects if they could change underneath unpredictably. Imagine that we need
>>> to build an uploader that cuts a big file in multiple pieces and sends those
>>> pieces to the servers so they will be stitched together later. If during
>>> this operation the underlying file changes and this changes all the pieces
>>> that Blobs refer to (due to clamping and just silent change of content), all
>>> the slicing/stitching assumptions are invalid and it's hard to even notice
>>> since blobs are simply 'clamped' silently. Some degree of mess is possible
>>> then.
>>> Another use case could be a JPEG image processor that uses slice() to cut
>>> the headers from the image file and then uses info from the headers to cut
>>> further JFIF fields from the file (reading EXIF and populating local
>>> database of images for example). Changing the file in the middle of that is
>>> bad.
>>> It seems the typical use cases that will need Blob.slice() functionality
>>> form 'units of work' where Blob.slice() is used with likely assumption that
>>> underlying data is stable and does not change silently. Such a 'unit of
>>> work'  should fail as a whole if underlying file changes. One way to achieve
>>> that is to reliably fail operations with 'derived' Blobs and even perhaps
>>> have an 'isValid' property on them. 'Derived' Blobs are those obtained via
>>> slice(), as opposed to 'original' Blobs that are also Files.
>>> One disadvantage of this approach is that it implies that the same Blob
>>> has 2 possible behaviors - when it is obtained via Blob.slice() (or other
>>> methods) vs. when it is a File.
>>> It all could be a bit cleaner if File did not derive from Blob, but
>>> instead had getAsBlob() method - then it would be possible to say that Blobs
>>> are always immutable but may become 'invalid' over time if underlying data
>>> changes. The FileReader can then be just a BlobReader and have cleaner
>>> semantics.
>>> If that was the case, then xhr.send(file) would capture the state of file
>>> at the moment of sending, while xhr.send(blob) would fail with exception if
>>> the blob is 'invalid' at the moment of send() operation. This would keep
>>> compatibility with current behavior and avoid duplication of Blob behavior.
>>> Quite a change to the spec though...
>>> Dmitry
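
[Editorial sketch, not part of the original message.] One way to picture
the 'derived Blob' idea above is a wrapper that remembers a modification
stamp when it is created and refuses to read once the stamp no longer
matches. Everything here is hypothetical: the DerivedBlob class, the
getAsBlob factory (named after the method proposed above), and the
in-memory `file` object with `data` and `modified` fields standing in for
a real file. It uses the (start, length) slice signature from this thread.

```javascript
// Hypothetical model of a 'derived' Blob that becomes invalid when the
// underlying file changes. `file` is a plain object { data, modified }.
class DerivedBlob {
  constructor(file, start, length, stamp) {
    this.file = file;
    this.start = start;
    this.length = length;
    this.stamp = stamp; // modification stamp captured at snapshot time
  }

  // Stand-in for the proposed File.getAsBlob(): snapshot the whole file.
  static getAsBlob(file) {
    return new DerivedBlob(file, 0, file.data.length, file.modified);
  }

  // True while the file still has the stamp seen at snapshot time.
  get isValid() {
    return this.file.modified === this.stamp;
  }

  // slice(start, length), clamped to this blob; the stamp is inherited
  // so every piece of one 'unit of work' fails together.
  slice(start, length) {
    const s = Math.min(Math.max(start, 0), this.length);
    const len = Math.min(Math.max(length, 0), this.length - s);
    return new DerivedBlob(this.file, this.start + s, len, this.stamp);
  }

  // Fail loudly instead of silently returning changed or clamped data.
  read() {
    if (!this.isValid) {
      throw new Error("underlying file changed since the Blob was created");
    }
    return this.file.data.slice(this.start, this.start + this.length);
  }
}
```

Because slices share the stamp of the Blob they came from, a change to the
file invalidates the header slice and every slice derived from it at once.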
>>> On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>
>>>> On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince <cprince@google.com>
>>>> wrote:
>>>> >> For the record, I'd like to make the read "atomic", such that you can
>>>> >> never get half a file before a change, and half after. But it likely
>>>> >> depends on what OSs can enforce here.
>>>> >
>>>> > I think *enforcing* atomicity is difficult across all OSes.
>>>> >
>>>> > But implementations can get nearly the same effect by checking the
>>>> > file's last modification time at the start + end of the API call.  If
>>>> > it has changed, the read operation can throw an exception.
>>>>
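
[Editorial sketch, not part of the original message.] The modification-time
check suggested above could look roughly like this. The function name and
its two callback parameters are assumptions for illustration; a real
implementation would query the operating system rather than take
callbacks. The check only helps when no data is handed to the caller
before the second timestamp comparison.

```javascript
// Hypothetical sketch: read a file and reject the result if its
// modification time differs between the start and end of the read.
// getModifiedTime: () => number   (assumed callback)
// readAll:         () => any      (assumed callback, returns the content)
function readIfUnchanged(getModifiedTime, readAll) {
  const before = getModifiedTime();
  const data = readAll();
  const after = getModifiedTime();
  if (before !== after) {
    throw new Error("file was modified during the read");
  }
  return data;
}
```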
>>>> I'm talking about during the actual read. I.e. not related to the
>>>> lifetime of the File object, just related to the time between the
>>>> first 'progress' event, and the 'loadend' event. If the file changes
>>>> during this time there is no way to fake atomicity since the partial
>>>> file has already been returned.
>>>>
>>>> / Jonas
>>>
>>
>
>
Received on Friday, 15 January 2010 10:33:46 UTC