Re: File API: Blob and underlying file changes. from Dmitry Titov on 2010-02-01 (public-webapps@w3.org from January to March 2010)

From: Dmitry Titov <dimich@chromium.org>
Date: Mon, 1 Feb 2010 12:27:01 -0800
To: public-webapps@w3.org
Message-ID: <28040fc61002011227o76559aebp161a0bc73834d3cf@mail.gmail.com>
Going a bit back to current spec and changing underlying files - here is an
update on our thinking (and current implementation plan). We played with
File/Blob ideas a little more and talked with some of our app developers.
Regarding the problem of a changing file, most folks feel the Blob is best
thought of as a 'snapshot of a byte range' with a delayed promise to deliver
the actual bytes in that range from the underlying data storage. It is a
'delayed promise' because all the actual 'reading' methods are async.
Basically, in terms of implementation, the Blob is not a 'container of
bytes' but rather a 'reference' to the byte range.

As such, the async read operations later may fail, for many reasons - the
file can be deleted, renamed, modified, etc. It seems developers sometimes
want to be oblivious to those problems, but in other scenarios they want to
process them. Basically, it's an app-specific choice. It appears that the
following implementation goes along with the current edition of the spec but
also provides the ability to detect the file change:

1. File derives from Blob, so there is a File.size that performs synchronous
file I/O. Not ideal, but easy to use and compatible with current forms
upload.
2. File.slice() also does synchronous IO and captures the current size and
modification time of the underlying file - and caches them in the resulting
Blob.
3. Subsequent Blob.slice() and Blob.size calls do not do any file IO, but
merely operate on cached values. So the only Blob methods that do sync IO
are those on the File object. Subsequent slicing operates on the file
information captured from File and propagates it to derived Blobs.
4. In xhr.send() and FileReader, if the UA discovers that the underlying
file has changed, it behaves just like when other file errors are discovered
- firing an 'error' progress event and setting the FileReader.error
attribute, for example. We might need another FileError code for that if
existing ones do not feel adequate.
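
To make the four points above concrete, here is a minimal in-memory sketch.
It is only a model, not UA code: FakeFile, SnapshotBlob, sliceFile and the
"FILE_CHANGED" code are all made-up names, slice() takes (offset, length)
for simplicity, and the callbacks fire immediately where a real UA would
deliver them asynchronously.

```typescript
// Hypothetical in-memory stand-in for a file on disk; not a browser API.
interface FakeFile {
  bytes: Uint8Array;
  mtime: number; // bumped by whoever modifies the file
}

// A Blob as a reference to a byte range, plus the file state cached at capture time.
class SnapshotBlob {
  readonly size: number; // cached; never re-read from the file
  private readonly source: FakeFile;
  private readonly start: number;
  private readonly capturedMtime: number;

  constructor(source: FakeFile, start: number, size: number, capturedMtime: number) {
    this.source = source;
    this.start = start;
    this.size = size;
    this.capturedMtime = capturedMtime;
  }

  // Point 3: slicing a Blob uses cached values only and propagates them.
  slice(offset: number, length: number): SnapshotBlob {
    const begin = Math.min(Math.max(offset, 0), this.size);
    const len = Math.min(Math.max(length, 0), this.size - begin);
    return new SnapshotBlob(this.source, this.start + begin, len, this.capturedMtime);
  }

  // Point 4: the read happens later and fails if the file has changed since capture.
  // This model calls back immediately; a real UA would do so asynchronously.
  read(onLoad: (data: Uint8Array) => void, onError: (code: string) => void): void {
    if (this.source.mtime !== this.capturedMtime) {
      onError("FILE_CHANGED"); // placeholder name; no such FileError code is specified
      return;
    }
    onLoad(this.source.bytes.slice(this.start, this.start + this.size));
  }
}

// Points 1 and 2: slicing the File is the only step that inspects the file itself.
function sliceFile(file: FakeFile, offset: number, length: number): SnapshotBlob {
  const whole = new SnapshotBlob(file, 0, file.bytes.length, file.mtime);
  return whole.slice(offset, length);
}
```

In this model a Blob's size never changes after capture, even if the file
is rewritten afterwards; only read() notices the change.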

This way, the folks who don't care about changing files could simply ignore
the error results - because they likely do not worry about other errors
either (such as NOT_FOUND_ERR). At the same time, folks who worry about such
things could simply process the errors already specified. It also doesn't
add new exceptions to the picture, so no special code is needed in simple
cases.
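
A rough sketch of the two caller styles described above. The Read type,
readOrSkip, readOrRestart and the "FILE_CHANGED" code are hypothetical
names used only for illustration; the callback shape is loosely modelled
on FileReader's load/error events and is not taken from the spec.

```typescript
// Hypothetical callback-style read, loosely shaped like FileReader's load/error events.
type Read = (onLoad: (data: Uint8Array) => void, onError: (code: string) => void) => void;

// Caller that does not care why a read failed: every error is silently dropped,
// a changed file included, just as NOT_FOUND_ERR would be.
function readOrSkip(read: Read, send: (data: Uint8Array) => void): void {
  read(send, () => { /* ignored on purpose */ });
}

// Caller that does care: inspect the error code and react to a changed file specifically.
function readOrRestart(
  read: Read,
  send: (data: Uint8Array) => void,
  restart: () => void,
  fail: (code: string) => void,
): void {
  read(send, (code) => {
    if (code === "FILE_CHANGED") {
      restart(); // placeholder code name; an assumption, not a specified FileError
    } else {
      fail(code);
    }
  });
}
```

Both callers use the same error channel; the second one just looks at the
code, so no extra exception handling appears in the simple case.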

One obvious difficulty here is the synchronous file IO on File.size and
File.slice(). Trying to eliminate it requires some complexity in the API that
is not obviously better. It either leads to some strange APIs like a
getSize() with a callback that delivers the size, and/or breaks the behavior
of the currently implemented File (and most developers' expectations). In any
case, an attempt to completely avoid sync IO and preserve correctness seems
to call for a far more involved API. Considering that most uploaders which
slice the file and send it in pieces will likely do it in a worker thread,
sync IO in these places is perhaps a lesser evil than a complicated (or
dual) API...
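
To illustrate the trade-off, here is the same small task (counting the
chunks of an upload) written against both API shapes. SyncSizedFile,
AsyncSizedFile and the chunkCount helpers are invented for this sketch;
getSize() is the hypothetical callback API mentioned above, not anything
specified.

```typescript
// Shape of the current File: size is a plain (synchronous) attribute.
interface SyncSizedFile {
  readonly size: number;
}

// Hypothetical alternative: the size is delivered through a callback.
interface AsyncSizedFile {
  getSize(callback: (size: number) => void): void;
}

// With a synchronous size the caller gets a plain return value.
function chunkCountSync(file: SyncSizedFile, chunkSize: number): number {
  return Math.ceil(file.size / chunkSize);
}

// With getSize() the caller's own API has to become callback-based too.
function chunkCountAsync(
  file: AsyncSizedFile,
  chunkSize: number,
  done: (count: number) => void,
): void {
  file.getSize((size) => done(Math.ceil(size / chunkSize)));
}
```

The callback form is not hard to write, but it spreads to every caller
that needs the size, which is the kind of API complexity in question.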

Thanks,
Dmitry

On Wed, Jan 27, 2010 at 4:40 AM, Juan Lanus <juan.lanus@gmail.com> wrote:

> On Wed, Jan 27, 2010 at 01:16, Robert O'Callahan <robert@ocallahan.org> wrote:
> > On Wed, Jan 27, 2010 at 5:38 AM, Juan Lanus <juan.lanus@gmail.com> wrote:
> >>
> >> Quite right Bob. But still the lock is the way to go. At least as of
> >> today.
> >>
> >> HTML5 might be mainstream for the next 10 years, starting rather soon.
> >>
> >> In the meanwhile OSs will also evolve, in a way that we can't tell
> >> now. But if there are common issues, like this one, somebody will come
> >> up with a smart solution maybe soon.
> >> For example feeding an image of the file as of the instant it was
> >> opened (like relational databases do to provide stable queries) by
> >> keeping a temporary map to the original disk segments that comprised
> >> the file before it was changed.
> >> For example Apple is encouraging advisory locks
> >> http://developer.apple.com/mac/library/technotes/tn/tn2037.html#OSSolutions
> >> asking developers to design in an environment-aware mood.
> >
> > In my experience, almost no code uses advisory locking unless it is being
> > explicitly designed for some kind of concurrent usage, i.e., Apple's advice
> > is not being followed. If that's not going to suddenly change --- and I see
> > no evidence it will --- then asking the UA to apply a mandatory lock is
> > asking the UA to do something impossible, which is generally not a good
> > idea.
> > Rob
>
> Right, not talking about locks any more because it would be telling
> HOW the UA should do it, and what is best for the UA developers is to
> be told WHAT to do.
> Not writing a tutorial but a specification. Let the developer find out
> how to do it, this year, and with the tools that will be available by
> 2020.
>
> Now, out of the locks subject, what I want to be sure of is that the
> specification does not specify the "mutating blob", the origin of this
> thread.
> --
> Juan
>
>
> > "He was pierced for our transgressions, he was crushed for our iniquities;
> > the punishment that brought us peace was upon him, and by his wounds we are
> > healed. We all, like sheep, have gone astray, each of us has turned to his
> > own way; and the LORD has laid on him the iniquity of us all." [Isaiah
> > 53:5-6]
> Indeed.
>
>
Received on Monday, 1 February 2010 20:27:31 UTC