Re: Colliding FileWriters from Jonas Sicking on 2012-02-28 (public-webapps@w3.org from January to March 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 28 Feb 2012 01:40:03 +0100
To: Eric U <ericu@google.com>
Cc: Webapps WG <public-webapps@w3.org>, Jian Li <jianli@chromium.org>
Message-ID: <CA+c2ei928TVAsCo5h7fU70K9Djq-SXfWpjx_dn9YFj-NJF2NgA@mail.gmail.com>

On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>> One working subset would be:
>>>
>>> * Keep createFileWriter async.
>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>> its length member is trustworthy.  If not, it can go stale.
>>> * Add an append method [needed only for non-exclusive writes, but
>>> useful for logs, and a safe default].
>>
>> This sounds great to me if we make it exclusive by default and remove
>> the .length member for non-exclusive writers. Or make it return
>> null/undefined.
>
> I like exclusive-by-default.  Of course, that means that by default
> you have to remember to call close() or depend on GC, but that's
> probably OK.  I'm less sure about .length being unusable on
> non-exclusive writers, but it's growing on me.  Since by default
> writers would be exclusive, length would generally work just the same
> as it does now.  However, if it returns null/undefined in the
> nonexclusive case, users might accidentally do math on it
> (if (length > 0) => false), and get confused.  Perhaps it should throw?
>
> Also, what's the behavior when there's already an exclusive lock, and
> you call createFileWriter?  Should it just not call you until the
> lock's free?  Do we need a trylock that fails fast, calling
> errorCallback?  I think the former's probably more useful than the
> latter, and you can always use a timer to give up if it takes too
> long, but there's no way to cancel a request, and you might get a call
> far later, when you've forgotten that you requested it.
>
>> However this brings up another problem, which is how to support
>> clients that want to mix read and write operations. Currently this is
>> supported, but as far as I can tell it's pretty awkward. Every time
>> you want to read you have to nest two asynchronous function calls.
>> First one to get a File reference, and then one to do the actual read
>> using a FileReader object. You can reuse the File reference, but only
>> if you are doing multiple reads in a row with no writing in between.
>
> I thought about this for a while, and realized that I had no good
> suggestion because I couldn't picture the use cases.  Do you have some
> handy that would help me think about it?

Mixing reading and writing can be something as simple as increasing a
counter somewhere in the file. First you need to read the counter
value, then add one to it, then write the new value. But there's also
more complex operations such as reordering a set of blocks to
"defragment" the contents of a file. Yet another example would be
modifying a .zip file to add a new file. When you do this you'll want
to first read out the location of the current zip directory, then
overwrite it with the new file and then the new directory.

We sat down and did some thinking about these two issues, i.e. the
locking and the mixed read/write issue. The result is good news and
bad news. The good news is that we've come up with something that
seems like it should work; the bad news is that it's a totally
different design from the current FileReader and FileWriter designs.

To do the locking without requiring calls to .close() or relying on GC,
we use a setup similar to IndexedDB transactions, i.e. you get an
object which represents a locked file. As long as you keep using that
lock to read from and write to the file, the lock stays held. However,
as soon as you return to the event loop from the last progress
notification of the last read/write operation, the lock is
automatically released.

This is exactly how IndexedDB transactions (and I believe WebSQL
transactions) work. To reduce the risk of races even further, the
IndexedDB spec only lets you interact with a transaction in two
situations: from the point when the transaction is created until we
return to the event loop, and from within progress event handlers of
other read/write operations. We can do the same thing with file locks.
That way it works out naturally that the lock is released when the
last event finishes firing, if there are no further pending read/write
operations, since there would be no more opportunity to use the lock.
In other words, you can't post a setTimeout and use the lock. This
would be a bad idea anyway, since you'd run the risk that the lock was
released before the timeout fires.
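
As a rough sketch of that lifetime rule, here is a toy model (not the
proposed API, and not how any browser implements it). ToyLock, tasks
and runEventLoop are made-up names, and a plain array stands in for
the event loop so the example runs deterministically:

```typescript
// Toy model only. ToyLock, tasks and runEventLoop are hypothetical names,
// not part of the proposed API.
const tasks: Array<() => void> = [];   // stand-in for the browser event loop

function runEventLoop(): void {
  let task = tasks.shift();
  while (task) {
    task();
    task = tasks.shift();
  }
}

class ToyLock {
  active = true;     // may new requests be placed right now?
  released = false;
  pending = 0;       // requests placed but not yet completed

  constructor() {
    // The creation window closes once we return to the event loop.
    tasks.push(() => this.closeWindow());
  }

  request(work: () => number, onsuccess: (result: number) => void): void {
    if (!this.active) {
      throw new Error("lock is not active");
    }
    this.pending++;
    tasks.push(() => {
      const result = work();
      this.pending--;
      this.active = true;          // usable again, but only inside the handler
      try {
        onsuccess(result);
      } finally {
        this.closeWindow();
      }
    });
  }

  closeWindow(): void {
    this.active = false;
    if (this.pending === 0) {
      this.released = true;        // nothing pending, nothing can be added
    }
  }
}

const log: string[] = [];
const lock = new ToyLock();

lock.request(() => 41, (n) => {
  log.push("read " + n);
  // Allowed: we are inside a handler for a request on this lock.
  lock.request(() => n + 1, (m) => { log.push("wrote " + m); });
});

// Plays the role of a setTimeout callback: it runs outside any handler.
tasks.push(() => {
  try {
    lock.request(() => 0, () => {});
  } catch (e) {
    log.push("rejected");
  }
});

runEventLoop();
console.log(log, lock.released);
```

In this model the request placed from the success handler goes
through, the one placed from the timer-like task is rejected, and the
lock ends up released once the last handler has returned.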


The resulting API looks something like this. I'm using the interface
name FileHandle to distinguish from the current FileEntry API:

interface FileHandle {
  LockedFile open([optional] DOMString mode); // defaults to "readonly"
  FileRequest getFile(); // .result is set to resulting File object
};

interface LockedFile {
  readonly attribute FileHandle fileHandle;
  readonly attribute DOMString mode;

  attribute long long location;

  FileRequest readAsArrayBuffer(long size);
  FileRequest readAsText(long size, [optional] DOMString encoding);
  FileRequest write(DOMString or ArrayBuffer or Blob value);
  FileRequest append(DOMString or ArrayBuffer or Blob value);

  void abort(); // Immediately releases lock
};

interface FileRequest : EventTarget
{
  readonly attribute DOMString readyState; // "pending" or "done"

  readonly attribute any result;
  readonly attribute DOMError error;

  readonly attribute LockedFile lockedFile;

  attribute nsIDOMEventListener onsuccess;
  attribute nsIDOMEventListener onerror;

  attribute nsIDOMEventListener onprogress;
};
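
To make the counter use case concrete, here is a minimal sketch of how
calling code might look. It runs against an in-memory mock, so the
Mock* classes, queue and drain are invented for illustration; only
open, location, readAsText, write, result and onsuccess follow the
interfaces above. The "readwrite" mode string and the fixed-width
four-digit counter are assumptions too, and the mock does not model
the locking itself:

```typescript
// In-memory mock, for illustration only. The Mock* classes, queue and drain
// are hypothetical; only open, location, readAsText, write, result and
// onsuccess mirror names from the interfaces above.
const queue: Array<() => void> = [];   // stand-in for the event loop

function drain(): void {
  let job = queue.shift();
  while (job) {
    job();
    job = queue.shift();
  }
}

class MockRequest {
  result: string | undefined = undefined;
  onsuccess: (() => void) | null = null;
}

class MockLockedFile {
  // Assumption: location is not advanced by reads or writes in this mock;
  // the example below always sets it explicitly.
  location = 0;
  store: { text: string };
  mode: string;

  constructor(store: { text: string }, mode: string) {
    this.store = store;
    this.mode = mode;
  }

  readAsText(size: number): MockRequest {
    const start = this.location;
    return this.enqueue(() => this.store.text.slice(start, start + size));
  }

  write(value: string): MockRequest {
    if (this.mode !== "readwrite") {   // "readwrite" is an assumed mode name
      throw new Error("file was opened read-only");
    }
    const start = this.location;
    return this.enqueue(() => {
      const old = this.store.text;
      this.store.text =
        old.slice(0, start) + value + old.slice(start + value.length);
      return undefined;
    });
  }

  enqueue(op: () => string | undefined): MockRequest {
    const req = new MockRequest();
    queue.push(() => {
      req.result = op();
      if (req.onsuccess) req.onsuccess();
    });
    return req;
  }
}

class MockFileHandle {
  store = { text: "0007" };   // file holding a 4-digit counter

  open(mode: string = "readonly"): MockLockedFile {
    return new MockLockedFile(this.store, mode);
  }
}

const handle = new MockFileHandle();
const lockedFile = handle.open("readwrite");

lockedFile.location = 0;
const read = lockedFile.readAsText(4);
read.onsuccess = () => {
  const next = ("0000" + (Number(read.result) + 1)).slice(-4);
  lockedFile.location = 0;
  lockedFile.write(next);     // issued from the success handler, same lock
};

drain();
console.log(handle.store.text);   // prints "0008"
```

The point is that the read and the write share one LockedFile, and the
write is issued from the read's success handler, so there is no
nesting of a separate File lookup and FileReader.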

One downside of this is that it means that if you're doing a bunch of
separate read/write operations in separate locks, each lock is held
until we've had a chance to fire the final success event for the
operation. So if you queue up a ton of small write operations, you can
end up mostly waiting for the main thread to finish posting
events.

To address this we can introduce the following methods on FileHandle:

  FileRequest fastReadAsArrayBuffer(long size);
  FileRequest fastReadAsText(long size, [optional] DOMString encoding);
  FileRequest fastWrite(DOMString or ArrayBuffer or Blob value);
  FileRequest fastAppend(DOMString or ArrayBuffer or Blob value);

These would behave exactly the same way as the methods on LockedFile,
but would not use a lock and so would permit the next file operation
immediately with no need to wait for the progress events to finish
firing.

> That assumes that write access always grants read access, which need
> not necessarily be true.  The FileSaver API was designed assuming the
> opposite.  Of course, nobody's implemented that yet, but that's
> another thing we need to talk about, and another thread.

We're planning on implementing FileSaver, but it would be as an API
separate from FileHandle which means that this wouldn't be a problem.

In general I think it's ok to grant "add but not read" access, but
having "write but not read" runs the risk of code inadvertently
destroying data due to not expecting read operations to fail.

/ Jonas
Received on Tuesday, 28 February 2012 00:41:01 UTC