Re: Colliding FileWriters from Eric U on 2012-02-29 (public-webapps@w3.org from January to March 2012)

From: Eric U <ericu@google.com>
Date: Wed, 29 Feb 2012 08:44:36 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>, Jian Li <jianli@chromium.org>
Message-ID: <CAHvSExfNaXYVTaaw-0Tp+jWzamkZL9k2YRLQUKqOkcjXQNE=bA@mail.gmail.com>
On Mon, Feb 27, 2012 at 4:40 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>>> One working subset would be:
>>>>
>>>> * Keep createFileWriter async.
>>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>>> its length member is trustworthy.  If not, it can go stale.
>>>> * Add an append method [needed only for non-exclusive writes, but
>>>> useful for logs, and a safe default].
>>>
>>> This sounds great to me if we make it exclusive by default and remove
>>> the .length member for non-exclusive writers. Or make it return
>>> null/undefined.
>>
>> I like exclusive-by-default.  Of course, that means that by default
>> you have to remember to call close() or depend on GC, but that's
>> probably OK.  I'm less sure about .length being unusable on
>> non-exclusive writers, but it's growing on me.  Since by default
>> writers would be exclusive, length would generally work just the same
>> as it does now.  However, if it returns null/undefined in the
>> nonexclusive case, users might accidentally do math on it (if (length
>>> 0) => false), and get confused.  Perhaps it should throw?
>>
>> Also, what's the behavior when there's already an exclusive lock, and
>> you call createFileWriter?  Should it just not call you until the
>> lock's free?  Do we need a trylock that fails fast, calling
>> errorCallback?  I think the former's probably more useful than the
>> latter, and you can always use a timer to give up if it takes too
>> long, but there's no way to cancel a request, and you might get a call
>> far later, when you've forgotten that you requested it.
>>
>>> However this brings up another problem, which is how to support
>>> clients that want to mix read and write operations. Currently this is
>>> supported, but as far as I can tell it's pretty awkward. Every time
>>> you want to read you have to nest two asynchronous function calls.
>>> First one to get a File reference, and then one to do the actual read
>>> using a FileReader object. You can reuse the File reference, but only
>>> if you are doing multiple reads in a row with no writing in between.
>>
>> I thought about this for a while, and realized that I had no good
>> suggestion because I couldn't picture the use cases.  Do you have some
>> handy that would help me think about it?
>
> Mixing reading and writing can be something as simple as increasing a
> counter somewhere in the file. First you need to read the counter
> value, then add one to it, then write the new value. But there's also
> more complex operations such as reordering a set of blocks to
> "defragment" the contents of a file. Yet another example would be
> modifying a .zip file to add a new file. When you do this you'll want
> to first read out the location of the current zip directory, then
> overwrite it with the new file and then the new directory.

That helps, thanks.  So we'll need to be able to do efficient
(read[-modify-write]*), and we'll need to hold the lock for the reads
as well as the writes.  The lock should prevent any other writes
[exclusive or not], but need not prevent unlocked reads.

> We sat down and did some thinking about these two issues. I.e. the
> locking and the read-write-mixed issue. The solution is good news and
> bad news. The good news is that we've come up with something that
> seems like it should work, the bad news is that it's a totally
> different design from the current FileReader and FileWriter designs.

Hmm...it's interesting, but I don't think we necessarily have to scrap
FR and FW to use it.

Here's a modified version that uses the existing interfaces:

interface LockedReaderWriter : FileReader, FileWriter {
	[all the FileReader and FileWriter members]

	readonly attribute File writeResult;
}

As with your proposal, as long as any read or write method has
outstanding events, the lock is held.  The difference here is that
after any write method completes, and until another one begins or the
lock is dropped, writeResult holds the state of the File as of the
completion of the write.  The rest of the time it's null.  That way
you're always as up-to-date as you can easily be, but no more so [it
doesn't show partial writes during progress events].  To read, you use
the standard FileReader interface, slicing writeResult as needed to
get the appropriate offset.

A potential feature of this design is that you could use it to read a
Blob that didn't come from writeResult, letting you pull in other data
while still holding the lock.  I'm not sure if we need that, but it's
there if we want it.

> To do the locking without requiring calls to .close() or relying on GC
> we use a similar setup to IndexedDB transactions. I.e. you get an
> object which represents a locked file. As long as you use that lock to
> read from and write to the file the lock keeps being held. However as
> soon as you return to the event loop from the last progress
> notification from the last read/write operation, the lock is
> automatically released.

I love that your design is [I believe] deadlock-free, as the
write/read operations always make progress regardless of what other
locks you might be waiting to acquire.

> This is exactly how IndexedDB transactions (and I believe WebSQL
> transactions) work. To even further reduce the risk of races, the
> IndexedDB spec forbids you from interacting with a transaction other
> than from the point when a transaction is created until we return to
> the event loop, as well as from progress event handlers from other
> read/write operations. We can do the same thing with file locks. That
> way it works out naturally that the lock is released when the last
> event finishes firing if there are no further pending read/write
> operations, since there would be no more opportunity to use the lock.
> In other words, you can't post a setTimeout and use the lock. This
> would be a bad idea anyway since you'd run the risk that the lock was
> released before the timeout fires.
>
>
> The resulting API looks something like this. I'm using the interface
> name FileHandle to distinguish from the current FileEntry API:
>
> interface FileHandle {
>  LockedFile open([optional] DOMString mode); // defaults to "readonly"
>  FileRequest getFile(); // .result is set to resulting File object
> };
>
> interface LockedFile {
>  readonly attribute FileHandle fileHandle;
>  readonly attribute DOMString mode;
>
>  attribute long long location;
>
>  FileRequest readAsArrayBuffer(long size);
>  FileRequest readAsText(long size, [optional] DOMString encoding);
>  FileRequest write(DOMString or ArrayBuffer or Blob value);
>  FileRequest append(DOMString or ArrayBuffer or Blob value);
>
>  void abort(); // Immediately releases lock
> };
>
> interface FileRequest : EventTarget
> {
>  readonly attribute DOMString readyState; // "pending" or "done"
>
>  readonly attribute any result;
>  readonly attribute DOMError error;
>
>  readonly attribute LockedFile lockedFile;
>
>  attribute nsIDOMEventListener onsuccess;
>  attribute nsIDOMEventListener onerror;
>
>  attribute nsIDOMEventListener onprogress;
> }
>
> One downside of this is that it means that if you're doing a bunch of
> separate read/write operations in separate locks, each lock is held
> until we've had a chance to fire the final success event for the
> operation. So if you queue up a ton of small write operations you can
> end up mostly sitting waiting for the main thread to finish posting
> events.
>
> To address this we can introduce the following methods on FileHandle
>
>  FileRequest fastReadAsArrayBuffer(long size);
>  FileRequest fastReadAsText(long size, [optional] DOMString encoding);
>  FileRequest fastWrite(DOMString or ArrayBuffer or Blob value);
>  FileRequest fastAppend(DOMString or ArrayBuffer or Blob value);
>
> These would behave exactly the same way as the methods on LockedFile,
> but would not use a lock and so would permit the next file operation
> immediately with no need to wait for the progress events to finish
> firing.
>
>> That assumes that write access always grants read access, which need
>> not necessarily be true.  The FileSaver API was designed assuming the
>> opposite.  Of course, nobody's implemented that yet, but that's
>> another thing we need to talk about, and another thread.
>
> We're planning on implementing FileSaver, but it would be as an API
> separate from FileHandle which means that this wouldn't be a problem.
>
> In general I think it's ok to grant "add but not read" access, but
> having "write but not read" runs the risk of code inadvertently
> destroying data due to not expecting read operations to fail.

Append-but-not-read and overwrite-completely-but-not-read both make
sense.  General write-but-not-read probably doesn't.  I'm not sure
what situation you're describing w.r.t. data destruction.

Regarding the queue-lock problem:  If you're done with the lock, and
just want to do some unlocked reads, just create a standard FileReader
and fire it off.  I'm a little confused about fastWrite, though--it
sounds like you're suggesting we have a write method that writes to a
locked file even if someone else holds the lock.  Should we call it
writeBulletToFoot?

	Eric
Received on Wednesday, 29 February 2012 16:45:25 UTC