Re: File API: Blob and underlying file changes.

On Fri, Jan 15, 2010 at 10:19 AM, Dmitry Titov <dimich@chromium.org> wrote:

> Nobody proposed locking the file. Sorry if I was unclear and it sounded
> that way. Basically it's all about timestamps.
>
> As Chris proposed earlier, a read operation can grab the timestamp of the
> file before and after reading its content and throw an exception if the
> timestamps do not match. This is a pretty good approximation of an 'atomic'
> read - although it cannot guarantee success, it can at least provide
> reliable detection of failure.
>

but doesn't that imply some degree of unpredictability for web developers?
 must they always handle that exception even though it is an extremely rare
occurrence?  also, what about normal form submission, in which the file
reading happens asynchronously to form.submit()?
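
to make the concern concrete, here is a rough sketch of the before/after
timestamp check from the quoted text, and of the handling every caller would
need.  everything here is a hypothetical stand-in (the `lastModified` field,
the `read()` method, `FileChangedError`, `checkedRead`, `tryRead`); none of
it is taken from the File API draft.

```javascript
// Hypothetical error type for "the file changed while we were reading it".
class FileChangedError extends Error {}

// Reads the file, comparing the modification timestamp before and after.
// `file.lastModified` and `file.read()` are illustrative names only.
function checkedRead(file) {
  const before = file.lastModified;
  const data = file.read();
  const after = file.lastModified;
  if (before !== after) {
    throw new FileChangedError("file changed during read");
  }
  return data;
}

// What every caller ends up writing, even though the failure is rare.
function tryRead(file) {
  try {
    return { ok: true, data: checkedRead(file) };
  } catch (e) {
    if (e instanceof FileChangedError) {
      return { ok: false, data: null };
    }
    throw e; // unrelated errors are not swallowed
  }
}

// Two fake files: one stable, one modified in the middle of the read.
const stable = { lastModified: 1, read() { return "abc"; } };
const racy = {
  lastModified: 1,
  read() { this.lastModified = 2; return "ab?"; },
};
```

with these fakes, tryRead(stable) succeeds and tryRead(racy) reports the
change instead of returning mixed content.  note that the check itself is
synchronous here, which is exactly what a real implementation could not do on
the UI thread.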



>
> Same thing with the Blob - slice() may capture the timestamp of the
> content it's based on. The Blob can throw an exception later if the
> modification timestamp of the underlying data has changed since the Blob's
> creation.
>

also note that we MUST NOT design APIs that involve synchronous file access.
 no "stat" calls allowed on the main UI thread please!  (remember the
network filesystem case.)

in other words, assuming detection of file changes happens asynchronously,
we'll have trouble producing exceptions as you describe.
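
as a rough illustration of the difference: if the timestamp can only be
obtained asynchronously, a mismatch has to be reported through an error
callback or a rejected promise rather than a synchronous throw.  the sketch
below assumes hypothetical names throughout (`statAsync`, `makeSnapshot`, and
the `mtime` and `contents` fields), and uses a timer to stand in for a stat
done off the main thread.

```javascript
// Hypothetical non-blocking stat. The timestamp is only observed when the
// timer fires, never on the caller's stack.
function statAsync(file) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(file.mtime), 0);
  });
}

// Starts capturing the timestamp at creation time without waiting for it.
// A later mismatch surfaces as a rejected promise from read().
function makeSnapshot(file) {
  const captured = statAsync(file); // kicked off here, awaited in read()
  return {
    async read() {
      const expected = await captured;
      const data = file.contents;
      const current = await statAsync(file);
      if (current !== expected) {
        throw new Error("underlying file changed"); // becomes a rejection
      }
      return data;
    },
  };
}
```

makeSnapshot() itself can never throw for a changed file, since it does not
know the timestamp yet when it returns.  the capture is also racy: a change
that lands before the first stat completes goes unnoticed.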



>
> Both actual OS locking and requiring copying files to a safe location
> before slice() seem to be problematic, for different reasons. A good example
> is a YouTube uploader that needs to slice and send a 1 GB file, while having
> a way to reliably detect a change of the underlying file, terminate the
> current upload and potentially request another one. Copying is hard because
> of the size, and locking, even if provided by the OS, may get in the way of
> the user's workflow.
>
> Dmitry
>
> On Thu, Jan 14, 2010 at 11:58 PM, Darin Fisher <darin@chromium.org> wrote:
>
>> I don't think we should worry about underlying file changes.
>>
>> If the app wants to cut a file into parts and copy them separately, then
>> perhaps the app should first copy the file into a private area.  (I'm
>> presuming that one day, we'll have the concept of a chroot'd private file
>> storage area for a web app.)
>>
>> I think we should avoid solutions that involve file locking since it is
>> bad for the user (loss of control) if their files are locked by the browser
>> on behalf of a web app.
>>
>> It might be reasonable, however, to lock a file while sending it.
>>
>> -Darin
>>
>>
>> On Thu, Jan 14, 2010 at 2:41 PM, Jian Li <jianli@chromium.org> wrote:
>>
>>> It seems that we feel that when a File object is sent via either Form or
>>> XHR, the latest underlying version should be used. When we get a slice via
>>> Blob.slice, we assume that the underlying file data is stable since then.
>>>
>>> So for the uploader scenario, we need to cut a big file into multiple
>>> pieces. With the current File API spec, we will have to do something like
>>> the following to make sure that all pieces are cut from a stable file.
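>>>     var file = myInputElement.files[0];
>>>     // Take one slice of the whole file first, then cut pieces from it.
>>>     var blob = file.slice(0, file.size);
>>>     // Assuming slice(start, length): pieces must be contiguous, so piece2
>>>     // starts at byte 1000, right after piece1's last byte (999).
>>>     var piece1 = blob.slice(0, 1000);
>>>     var piece2 = blob.slice(1000, 1000);
>>>     ...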
>>>
>>> The above seems a bit ugly. If we want to make it clean, what Dmitry
>>> proposed above seems to be reasonable. But it would require a non-trivial
>>> spec change.
>>>
>>>
>>> On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov <dimich@chromium.org> wrote:
>>>
>>>> Atomic read is obviously a nice thing - it would be hard to program
>>>> against an API that behaves as unpredictably as a single read operation
>>>> that reads half of the old content and half of the new content.
>>>>
>>>> On the same note, it would likely be very hard to program against Blob
>>>> objects if they could change underneath unpredictably. Imagine that we need
>>>> to build an uploader that cuts a big file into multiple pieces and sends
>>>> those pieces to the servers so they can be stitched together later. If the
>>>> underlying file changes during this operation, and that changes all the
>>>> pieces the Blobs refer to (due to clamping and silent change of content),
>>>> all the slicing/stitching assumptions are invalid, and it's hard to even
>>>> notice since blobs are simply 'clamped' silently. Some degree of mess is
>>>> possible then.
>>>>
>>>> Another use case could be a JPEG image processor that uses slice() to
>>>> cut the headers from the image file and then uses info from the headers to
>>>> cut further JFIF fields from the file (reading EXIF and populating a local
>>>> database of images, for example). Changing the file in the middle of that
>>>> is bad.
>>>>
>>>> It seems the typical use cases that will need Blob.slice() functionality
>>>> form 'units of work' where Blob.slice() is used with the likely assumption
>>>> that the underlying data is stable and does not change silently. Such a
>>>> 'unit of work' should fail as a whole if the underlying file changes. One
>>>> way to achieve that is to reliably fail operations with 'derived' Blobs and
>>>> perhaps even have an 'isValid' property on them. 'Derived' Blobs are those
>>>> obtained via slice(), as opposed to 'original' Blobs that are also Files.
>>>>
>>>> One disadvantage of this approach is that it implies that the same Blob
>>>> has 2 possible behaviors - one when it is obtained via Blob.slice() (or
>>>> other methods) and another when it is a File.
>>>>
>>>> It all could be a bit cleaner if File did not derive from Blob, but
>>>> instead had a getAsBlob() method - then it would be possible to say that
>>>> Blobs are always immutable but may become 'invalid' over time if the
>>>> underlying data changes. The FileReader can then be just a BlobReader and
>>>> have cleaner semantics.
>>>>
>>>> If that were the case, then xhr.send(file) would capture the state of
>>>> the file at the moment of sending, while xhr.send(blob) would fail with
>>>> an exception if the blob is 'invalid' at the moment of the send()
>>>> operation. This would keep compatibility with current behavior and avoid
>>>> duplication of Blob behavior. Quite a change to the spec though...
>>>>
>>>> Dmitry
>>>>
>>>> On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>
>>>>> On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince <cprince@google.com>
>>>>> wrote:
>>>>> >> For the record, I'd like to make the read "atomic", such that you can
>>>>> >> never get half a file before a change, and half after. But it likely
>>>>> >> depends on what OSs can enforce here.
>>>>> >
>>>>> > I think *enforcing* atomicity is difficult across all OSes.
>>>>> >
>>>>> > But implementations can get nearly the same effect by checking the
>>>>> > file's last modification time at the start + end of the API call.  If
>>>>> > it has changed, the read operation can throw an exception.
>>>>>
>>>>> I'm talking about during the actual read. I.e. not related to the
>>>>> lifetime of the File object, just related to the time between the
>>>>> first 'progress' event, and the 'loadend' event. If the file changes
>>>>> during this time there is no way to fake atomicity since the partial
>>>>> file has already been returned.
>>>>>
>>>>> / Jonas
>>>>>
>>>>
>>>>
>>>
>>
>
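
for the uploader scenario quoted above, the pieces have to be contiguous and
the last one usually comes up short.  a small sketch of computing the ranges,
assuming the (start, length) form of slice() that the quoted snippet appears
to use; `pieceRanges` is a hypothetical helper, not part of any API.

```javascript
// Returns contiguous { start, length } ranges that cover `size` bytes.
// Each range could be handed to a slice(start, length) style call.
function pieceRanges(size, pieceSize) {
  const ranges = [];
  for (let start = 0; start < size; start += pieceSize) {
    // The final piece is clamped to whatever remains of the file.
    ranges.push({ start, length: Math.min(pieceSize, size - start) });
  }
  return ranges;
}
```

for a 2500-byte file and 1000-byte pieces this gives starts of 0, 1000 and
2000, with lengths 1000, 1000 and 500, so no byte is skipped or sent twice.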

Received on Friday, 15 January 2010 18:28:15 UTC