Re: File API: Blob and underlying file changes. from Dmitry Titov on 2010-01-13 (public-webapps@w3.org from January to March 2010)

From: Dmitry Titov <dimich@chromium.org>
Date: Wed, 13 Jan 2010 11:28:04 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Chris Prince <cprince@google.com>, arun@mozilla.com, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <28040fc61001131128y1c19eb13n8e91931567099bd8@mail.gmail.com>

Atomic read is obviously a nice thing - it would be hard to program against
an API as unpredictable as a single read operation that returns half old
content and half new content.

On the same note, it would likely be very hard to program against Blob
objects if they could change underneath unpredictably. Imagine that we need
to build an uploader that cuts a big file into multiple pieces and sends
those pieces to the servers, to be stitched together later. If the
underlying file changes during this operation, every piece the Blobs refer
to changes with it (through clamping and silent change of content). All the
slicing/stitching assumptions become invalid, and it's hard to even notice
because the Blobs are simply 'clamped' silently. The stitched result can
then be corrupt without any error being reported.
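
A minimal sketch of the uploader scenario, to make the failure mode
concrete. It assumes a hypothetical file-like object whose 'size' and
'lastModified' reflect the current state of the file on disk; the names
SliceableFile, planChunks and sliceAll are illustrative and not part of the
File API draft. The point is only that the slicing aborts as a whole instead
of silently producing clamped pieces.

```typescript
// Hypothetical minimal file-like shape (an assumption, not the spec's File).
interface SliceableFile {
  readonly size: number;
  readonly lastModified: number;
  slice(start: number, end: number): unknown;
}

interface ChunkRange {
  start: number;
  end: number; // exclusive
}

// Compute the [start, end) byte ranges that cover totalSize bytes.
function planChunks(totalSize: number, chunkSize: number): ChunkRange[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize, totalSize) });
  }
  return ranges;
}

// Slice every chunk, failing the whole operation if the file appears to
// have changed since the first look at it.
function sliceAll(file: SliceableFile, chunkSize: number): unknown[] {
  const snapshot = { size: file.size, lastModified: file.lastModified };
  const pieces: unknown[] = [];
  for (const range of planChunks(snapshot.size, chunkSize)) {
    if (file.size !== snapshot.size || file.lastModified !== snapshot.lastModified) {
      throw new Error("underlying file changed while slicing");
    }
    pieces.push(file.slice(range.start, range.end));
  }
  return pieces;
}
```

A size/modification-time comparison is only a heuristic; it cannot catch a
change that leaves both values untouched.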

Another use case could be a JPEG image processor that uses slice() to cut
the headers from the image file and then uses info from the headers to cut
further JFIF fields from the file (reading EXIF and populating a local
database of images, for example). If the file changes in the middle of
that, the offsets taken from the headers no longer match the content.
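
To illustrate why the header-driven slicing depends on stable content, here
is a rough sketch of walking JPEG marker segments in bytes that have already
been read. It relies on the standard JPEG layout (SOI marker 0xFFD8, then
segments of 0xFF, a marker byte, and a two-byte big-endian length that
includes the length field itself). The names listJpegSegments and
JpegSegment are illustrative, and the sketch ignores fill bytes and stops at
the start-of-scan marker.

```typescript
interface JpegSegment {
  marker: number; // second marker byte, e.g. 0xe1 for APP1 (where EXIF lives)
  start: number;  // offset of the first payload byte
  end: number;    // offset just past the last payload byte
}

// List the header segments found before the start-of-scan (SOS) marker.
function listJpegSegments(bytes: Uint8Array): JpegSegment[] {
  if (bytes.length < 2 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("not a JPEG: missing SOI marker");
  }
  const segments: JpegSegment[] = [];
  let offset = 2;
  while (offset + 4 <= bytes.length) {
    if (bytes[offset] !== 0xff) {
      throw new Error("expected a marker at offset " + offset);
    }
    const marker = bytes[offset + 1];
    if (marker === 0xda) break; // SOS: compressed image data follows
    const length = (bytes[offset + 2] << 8) | bytes[offset + 3];
    if (length < 2) {
      throw new Error("invalid segment length at offset " + offset);
    }
    const end = offset + 2 + length;
    if (end > bytes.length) {
      // The offsets no longer fit the data, e.g. because the file shrank.
      throw new Error("segment runs past the available bytes");
    }
    segments.push({ marker, start: offset + 4, end });
    offset = end;
  }
  return segments;
}
```

Each returned range is what a caller would pass to slice(); if the file is
rewritten between reading the headers and slicing, those ranges point at
unrelated bytes.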

It seems the typical use cases that need Blob.slice() functionality form
'units of work' where Blob.slice() is used with the likely assumption that
the underlying data is stable and does not change silently. Such a 'unit of
work' should fail as a whole if the underlying file changes. One way to
achieve that is to reliably fail operations on 'derived' Blobs and perhaps
even have an 'isValid' property on them. 'Derived' Blobs are those obtained
via slice(), as opposed to 'original' Blobs that are also Files.
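
A small sketch of what a 'derived' Blob with an 'isValid' property could
look like. Everything here is hypothetical: MutableSource stands in for a
file on disk (every write bumps a version counter), and DerivedSlice models
a Blob obtained via slice(). Neither name comes from the File API draft.

```typescript
// Hypothetical stand-in for a file on disk; each write bumps `version`.
class MutableSource {
  private data: Uint8Array;
  version = 0;

  constructor(initial: Uint8Array) {
    this.data = initial.slice();
  }

  write(next: Uint8Array): void {
    this.data = next.slice();
    this.version += 1;
  }

  bytes(start: number, end: number): Uint8Array {
    return this.data.slice(start, end);
  }
}

// Hypothetical 'derived' Blob: remembers the version it was sliced from.
class DerivedSlice {
  private readonly source: MutableSource;
  private readonly start: number;
  private readonly end: number;
  private readonly capturedVersion: number;

  constructor(source: MutableSource, start: number, end: number) {
    this.source = source;
    this.start = start;
    this.end = end;
    this.capturedVersion = source.version;
  }

  // True while the underlying data is unchanged since slice time.
  get isValid(): boolean {
    return this.source.version === this.capturedVersion;
  }

  // Fails reliably instead of returning silently changed content.
  read(): Uint8Array {
    if (!this.isValid) {
      throw new Error("underlying data changed since this slice was taken");
    }
    return this.source.bytes(this.start, this.end);
  }
}
```

With this shape, a 'unit of work' built from several slices can check
isValid once at the end, or simply let the first failing read() abort it.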

One disadvantage of this approach is that it implies that the same Blob
type has 2 possible behaviors - one when it is obtained via Blob.slice() (or
other methods) and another when it is a File.

It could all be a bit cleaner if File did not derive from Blob, but instead
had a getAsBlob() method - then it would be possible to say that Blobs are
always immutable but may become 'invalid' over time if the underlying data
changes. The FileReader could then be just a BlobReader and have cleaner
semantics.

If that were the case, then xhr.send(file) would capture the state of the
file at the moment of sending, while xhr.send(blob) would fail with an
exception if the blob is 'invalid' at the moment of the send() operation.
This would keep compatibility with current behavior and avoid the duality
of Blob behavior. Quite a change to the spec though...

Dmitry

On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince <cprince@google.com> wrote:
> >> For the record, I'd like to make the read "atomic", such that you can
> >> never get half a file before a change, and half after. But it likely
> >> depends on what OSs can enforce here.
> >
> > I think *enforcing* atomicity is difficult across all OSes.
> >
> > But implementations can get nearly the same effect by checking the
> > file's last modification time at the start + end of the API call.  If
> > it has changed, the read operation can throw an exception.
>
> I'm talking about during the actual read. I.e. not related to the
> lifetime of the File object, just related to the time between the
> first 'progress' event, and the 'loadend' event. If the file changes
> during this time there is no way to fake atomicity since the partial
> file has already been returned.
>
> / Jonas
>

Received on Wednesday, 13 January 2010 19:28:37 UTC