Re: File API: Blob and underlying file changes.

Nobody proposed locking the file - sorry for being unclear if it sounded like
that. Basically, it's all about timestamps.

As Chris proposed earlier, a read operation can grab the timestamp of the
file before and after reading its content and throw an exception if the
timestamps do not match. This is a pretty good approximation of an 'atomic'
read - although it cannot guarantee success, it can at least provide reliable
detection of failure.

Same thing with the Blob - slice() may capture the timestamp of the content
it's based on, and the Blob can throw an exception later if the modification
timestamp of the underlying data has changed since the time of the Blob's
creation.
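
To illustrate, here is roughly what that could look like from script under
the proposed semantics. This is only a sketch: the input element and URL are
illustrative, slice(start, length) follows the usage elsewhere in this
thread, and the exception thrown from send() is the proposed behavior, not
the current draft.

    var file = myInput.files[0];          // myInput: some <input type="file">
    var blob = file.slice(0, file.size);  // slice() captures the file's mtime
    // ... later, possibly after the user modified the file on disk ...
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/upload");
    try {
      xhr.send(blob);   // proposed: throws if the underlying file changed
    } catch (e) {       // after slice() captured its timestamp
      // reliable signal that the snapshot is stale: re-slice or give up
    }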

Both actual OS locking and requiring files to be copied to a safe location
before slice() seem problematic, for different reasons. A good example is a
YouTube uploader that needs to slice and send a 1 GB file while having a way
to reliably detect a change to the underlying file, terminate the current
upload and potentially request another one. Copying is hard because of the
file size, and locking, even if the OS provides it, may get in the way of
the user's workflow.
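
A rough sketch of such an uploader under the same proposed semantics (the
chunk size, URL and restart policy are illustrative, and slice(start, length)
again follows the usage in this thread):

    var CHUNK = 1024 * 1024;  // 1 MB pieces (illustrative)

    function uploadInChunks(file, onDone, onStale) {
      var snapshot = file.slice(0, file.size);  // capture the timestamp once
      var offset = 0;

      function sendNext() {
        if (offset >= file.size) { onDone(); return; }
        var piece = snapshot.slice(offset, CHUNK);  // clamped at the blob's end
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/upload?offset=" + offset);
        xhr.onload = function () { offset += CHUNK; sendNext(); };
        try {
          xhr.send(piece);  // proposed: throws if the file changed underneath
        } catch (e) {
          onStale(e);       // terminate this upload; the caller may start over
        }
      }
      sendNext();
    }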

Dmitry

On Thu, Jan 14, 2010 at 11:58 PM, Darin Fisher <darin@chromium.org> wrote:

> I don't think we should worry about underlying file changes.
>
> If the app wants to cut a file into parts and copy them separately, then
> perhaps the app should first copy the file into a private area.  (I'm
> presuming that one day, we'll have the concept of a chroot'd private file
> storage area for a web app.)
>
> I think we should avoid solutions that involve file locking since it is bad
> for the user (loss of control) if their files are locked by the browser on
> behalf of a web app.
>
> It might be reasonable, however, to lock a file while sending it.
>
> -Darin
>
>
> On Thu, Jan 14, 2010 at 2:41 PM, Jian Li <jianli@chromium.org> wrote:
>
>> It seems that we feel that when a File object is sent via either a form or
>> XHR, the latest underlying version should be used. When we get a slice via
>> Blob.slice, we assume that the underlying file data is stable from then on.
>>
>> So for the uploader scenario, we need to cut a big file into multiple
>> pieces. With the current File API spec, we will have to do something like
>> the following to make sure that all pieces are cut from a stable file.
>>     var file = myInputElement.files[0];
>>     var blob = file.slice(0, file.size);
>>     var piece1 = blob.slice(0, 1000);
>>     var piece2 = blob.slice(1000, 1000);
>>     ...
>>
>> The above seems a bit ugly. If we want to make it clean, what Dmitry
>> proposed above seems reasonable, but it would require a non-trivial spec
>> change.
>>
>>
>> On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov <dimich@chromium.org> wrote:
>>
>>> An atomic read is obviously a nice thing - it would be hard to program
>>> against an API that behaves as unpredictably as a single read operation
>>> that returns half of the old content and half of the new content.
>>>
>>> By the same token, it would likely be very hard to program against Blob
>>> objects if they could change underneath unpredictably. Imagine that we need
>>> to build an uploader that cuts a big file into multiple pieces and sends
>>> those pieces to the server so they can be stitched together later. If the
>>> underlying file changes during this operation, and that changes all the
>>> pieces the Blobs refer to (due to clamping and a silent change of content),
>>> all the slicing/stitching assumptions become invalid - and it's hard to
>>> even notice, since the blobs are simply 'clamped' silently. Some degree of
>>> mess is possible then.
>>>
>>> Another use case could be a JPEG image processor that uses slice() to cut
>>> the headers from the image file and then uses info from the headers to cut
>>> further JFIF fields from the file (reading EXIF and populating a local
>>> database of images, for example). Changing the file in the middle of that
>>> is bad.
>>>
>>> It seems the typical use cases that need Blob.slice() functionality form
>>> 'units of work', where Blob.slice() is used with the assumption that the
>>> underlying data is stable and does not change silently. Such a 'unit of
>>> work' should fail as a whole if the underlying file changes. One way to
>>> achieve that is to reliably fail operations on 'derived' Blobs, and perhaps
>>> even have an 'isValid' property on them. 'Derived' Blobs are those obtained
>>> via slice(), as opposed to 'original' Blobs that are also Files.
>>>
>>> One disadvantage of this approach is that it gives the same Blob two
>>> possible behaviors - one when it is obtained via Blob.slice() (or other
>>> methods) and another when it is a File.
>>>
>>> It all could be a bit cleaner if File did not derive from Blob but instead
>>> had a getAsBlob() method - then it would be possible to say that Blobs are
>>> always immutable but may become 'invalid' over time if the underlying data
>>> changes. FileReader could then be just a BlobReader and have cleaner
>>> semantics.
>>>
>>> If that were the case, then xhr.send(file) would capture the state of the
>>> file at the moment of sending, while xhr.send(blob) would fail with an
>>> exception if the blob is 'invalid' at the moment of the send() operation.
>>> This would keep compatibility with current behavior and avoid the dual
>>> behavior of Blob. Quite a change to the spec, though...
>>>
>>> Dmitry
>>>
>>> On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>>> On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince <cprince@google.com>
>>>> wrote:
>>>> >> For the record, I'd like to make the read "atomic", such that you can
>>>> >> never get half a file before a change, and half after. But it likely
>>>> >> depends on what OSs can enforce here.
>>>> >
>>>> > I think *enforcing* atomicity is difficult across all OSes.
>>>> >
>>>> > But implementations can get nearly the same effect by checking the
>>>> > file's last modification time at the start + end of the API call.  If
>>>> > it has changed, the read operation can throw an exception.
>>>>
>>>> I'm talking about during the actual read, i.e. not related to the
>>>> lifetime of the File object, just to the time between the first
>>>> 'progress' event and the 'loadend' event. If the file changes during
>>>> this time there is no way to fake atomicity since the partial file has
>>>> already been returned.
>>>>
>>>> / Jonas
>>>>
>>>
>>>
>>
>

Received on Friday, 15 January 2010 18:20:02 UTC