Re: Colliding FileWriters from Eric U on 2012-03-19 (public-webapps@w3.org from January to March 2012)

From: Eric U <ericu@google.com>
Date: Mon, 19 Mar 2012 15:10:46 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <CAHvSExcfqBiK8jBDUsab4no+unVGuqJCcRAY9XYWf3FDP9dqkg@mail.gmail.com>
On Wed, Feb 29, 2012 at 8:44 AM, Eric U <ericu@google.com> wrote:
> On Mon, Feb 27, 2012 at 4:40 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>>>> One working subset would be:
>>>>>
>>>>> * Keep createFileWriter async.
>>>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>>>> its length member is trustworthy.  If not, it can go stale.
>>>>> * Add an append method [needed only for non-exclusive writes, but
>>>>> useful for logs, and a safe default].
>>>>
>>>> This sounds great to me if we make it exclusive by default and remove
>>>> the .length member for non-exclusive writers. Or make it return
>>>> null/undefined.
>>>
>>> I like exclusive-by-default.  Of course, that means that by default
>>> you have to remember to call close() or depend on GC, but that's
>>> probably OK.  I'm less sure about .length being unusable on
>>> non-exclusive writers, but it's growing on me.  Since by default
>>> writers would be exclusive, length would generally work just the same
>>> as it does now.  However, if it returns null/undefined in the
>>> nonexclusive case, users might accidentally do math on it (if
>>> (length > 0) => false), and get confused.  Perhaps it should throw?
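
A minimal sketch of the null-versus-throw concern raised above. Only the member name `length` comes from the thread; the writer shapes and function names are hypothetical, and the casts exist only to mimic what untyped JavaScript would do at runtime.

```typescript
// Hypothetical shape: a non-exclusive writer whose length may be null.
interface NullLengthWriter {
  readonly length: number | null;
}

// The casts erase null from the type, mimicking untyped JavaScript callers.
function looksNonEmpty(writer: NullLengthWriter): boolean {
  return (writer.length as number) > 0; // null coerces to 0 in the comparison
}

function looksNonNegative(writer: NullLengthWriter): boolean {
  return (writer.length as number) >= 0; // null coerces to 0 here as well
}

// Hypothetical alternative: reading length throws unless the writer is exclusive.
class ThrowingLengthWriter {
  constructor(private exclusive: boolean, private size: number) {}

  get length(): number {
    if (!this.exclusive) {
      throw new Error("length is unavailable on a non-exclusive writer");
    }
    return this.size;
  }
}

function readLengthFails(writer: ThrowingLengthWriter): boolean {
  try {
    void writer.length;
    return false;
  } catch (e) {
    return true;
  }
}

const silentGreater = looksNonEmpty({ length: null });
const silentGreaterOrEqual = looksNonNegative({ length: null });
const loudFailure = readLengthFails(new ThrowingLengthWriter(false, 0));
const exclusiveLength = new ThrowingLengthWriter(true, 42).length;
```

With null, the two comparisons quietly disagree about whether the file is empty. The throwing variant fails at the point of misuse instead.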
>>>
>>> Also, what's the behavior when there's already an exclusive lock, and
>>> you call createFileWriter?  Should it just not call you until the
>>> lock's free?  Do we need a trylock that fails fast, calling
>>> errorCallback?  I think the former's probably more useful than the
>>> latter, and you can always use a timer to give up if it takes too
>>> long, but there's no way to cancel a request, and you might get a call
>>> far later, when you've forgotten that you requested it.
>>>
>>>> However this brings up another problem, which is how to support
>>>> clients that want to mix read and write operations. Currently this is
>>>> supported, but as far as I can tell it's pretty awkward. Every time
>>>> you want to read you have to nest two asynchronous function calls.
>>>> First one to get a File reference, and then one to do the actual read
>>>> using a FileReader object. You can reuse the File reference, but only
>>>> if you are doing multiple reads in a row with no writing in between.
>>>
>>> I thought about this for a while, and realized that I had no good
>>> suggestion because I couldn't picture the use cases.  Do you have some
>>> handy that would help me think about it?
>>
>> Mixing reading and writing can be something as simple as increasing a
>> counter somewhere in the file. First you need to read the counter
>> value, then add one to it, then write the new value. But there's also
>> more complex operations such as reordering a set of blocks to
>> "defragment" the contents of a file. Yet another example would be
>> modifying a .zip file to add a new file. When you do this you'll want
>> to first read out the location of the current zip directory, then
>> overwrite it with the new file and then the new directory.
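
The counter case can be sketched roughly as follows. This uses an in-memory byte array as a stand-in for the file; the offset, the 32-bit little-endian encoding, and the function name are all assumptions made for illustration, not part of any proposed API.

```typescript
// Hypothetical in-memory stand-in for a file's bytes.
const fileBytes = new Uint8Array(16);
const COUNTER_OFFSET = 4; // assumed location of a 32-bit counter in the file

function incrementCounter(bytes: Uint8Array, offset: number): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const current = view.getUint32(offset, true); // read (little-endian)
  const next = (current + 1) >>> 0;             // modify, wrapping at 2^32
  view.setUint32(offset, next, true);           // write
  return next;
}

const first = incrementCounter(fileBytes, COUNTER_OFFSET);
const second = incrementCounter(fileBytes, COUNTER_OFFSET);
```

If another writer could change the bytes between the read and the write, one of the increments would be lost, so whatever lock is used has to cover the read as well as the write.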
>
> That helps, thanks.  So we'll need to be able to do efficient
> (read[-modify-write]*), and we'll need to hold the lock for the reads
> as well as the writes.  The lock should prevent any other writes
> [exclusive or not], but need not prevent unlocked reads.
>
>> We sat down and did some thinking about these two issues. I.e. the
>> locking and the read-write-mixed issue. The solution is good news and
>> bad news. The good news is that we've come up with something that
>> seems like it should work, the bad news is that it's a totally
>> different design from the current FileReader and FileWriter designs.
>
> Hmm...it's interesting, but I don't think we necessarily have to scrap
> FR and FW to use it.
>
> Here's a modified version that uses the existing interfaces:
>
> interface LockedReaderWriter : FileReader, FileWriter {
>        [all the FileReader and FileWriter members]
>
>        readonly attribute File writeResult;
> }

This came up in an offline discussion recently regarding a
currently-unserved use case: using a web app to edit a file outside
the browser sandbox.  You can certainly drag the file into or out of
the browser, but it's nothing like the experience you get with a
native app, where if you select a file for editing you can read+write
it many times, at its true location, without additional permission
checks.  If we added something like a "refresh" to regain expired
locks with this object, and some way for the user to grant permissions
to a file for the session, it could take care of that use case.

What do you think?

> As with your proposal, as long as any read or write method has
> outstanding events, the lock is held.  The difference here is that
> after any write method completes, and until another one begins or the
> lock is dropped, writeResult holds the state of the File as of the
> completion of the write.  The rest of the time it's null.  That way
> you're always as up-to-date as you can easily be, but no more so [it
> doesn't show partial writes during progress events].  To read, you use
> the standard FileReader interface, slicing writeResult as needed to
> get the appropriate offset.
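
A rough model of the writeResult behavior described above: null before any write and while a write is in flight, and a snapshot of the file once the write completes. Only the name `writeResult` comes from the proposal. The class, its methods, and the use of a string in place of a File are hypothetical simplifications, and offsets are assumed not to exceed the current length.

```typescript
// Hypothetical model; a string stands in for the File snapshot.
class LockedReaderWriterModel {
  private contents = "";
  private writing = false;
  private snapshot: string | null = null;

  // null while a write is in flight, before any write, or after release.
  get writeResult(): string | null {
    return this.writing ? null : this.snapshot;
  }

  beginWrite(): void {
    this.writing = true;
  }

  // Commits the write; the snapshot reflects only completed writes.
  completeWrite(offset: number, data: string): void {
    this.contents =
      this.contents.slice(0, offset) +
      data +
      this.contents.slice(offset + data.length);
    this.snapshot = this.contents;
    this.writing = false;
  }

  // Dropping the lock clears the snapshot.
  release(): void {
    this.writing = false;
    this.snapshot = null;
  }
}

const rw = new LockedReaderWriterModel();
const beforeAnyWrite = rw.writeResult;
rw.beginWrite();
const duringWrite = rw.writeResult;
rw.completeWrite(0, "hello world");
const afterWrite = rw.writeResult;
// Reading at an offset means slicing the snapshot.
const sliced = afterWrite === null ? null : afterWrite.slice(6, 11);
rw.beginWrite();
rw.completeWrite(0, "HELLO");
const afterSecondWrite = rw.writeResult;
rw.release();
const afterRelease = rw.writeResult;
```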
>
> A potential feature of this design is that you could use it to read a
> Blob that didn't come from writeResult, letting you pull in other data
> while still holding the lock.  I'm not sure if we need that, but it's
> there if we want it.
>
>> To do the locking without requiring calls to .close() or relying on GC
>> we use a similar setup to IndexedDB transactions. I.e. you get an
>> object which represents a locked file. As long as you use that lock to
>> read from and write to the file the lock keeps being held. However as
>> soon as you return to the event loop from the last progress
>> notification from the last read/write operation, the lock is
>> automatically released.
>
> I love that your design is [I believe] deadlock-free, as the
> write/read operations always make progress regardless of what other
> locks you might be waiting to acquire.
>
>> This is exactly how IndexedDB transactions (and I believe WebSQL
>> transactions) work. To even further reduce the risk of races, the
>> IndexedDB spec forbids you from interacting with a transaction other
>> than from the point when a transaction is created until we return to
>> the event loop, as well as from progress event handlers from other
>> read/write operations. We can do the same thing with file locks. That
>> way it works out naturally that the lock is released when the last
>> event finishes firing if there are no further pending read/write
>> operations, since there would be no more opportunity to use the lock.
>> In other words, you can't post a setTimeout and use the lock. This
>> would be a bad idea anyway since you'd run the risk that the lock was
>> released before the timeout fires.
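
The release rule described above can be simulated with a hand-rolled task queue. This is only a sketch under stated assumptions: the class, its methods, and the explicit endOfTurn() call are hypothetical stand-ins for what a browser would do implicitly when control returns to the event loop.

```typescript
type Task = () => void;

// Hypothetical simulation of a lock that is usable only in the turn that
// created it and inside its own completion callbacks.
class AutoReleaseLock {
  released = false;
  private pending = 0;
  private active = true;

  constructor(private loop: Task[]) {}

  request(onDone: () => void): void {
    if (this.released || !this.active) {
      throw new Error("lock is not usable here");
    }
    this.pending++;
    this.loop.push(() => {
      this.pending--;
      this.active = true;
      onDone();
      this.endOfTurn();
    });
  }

  // Stand-in for "control returns to the event loop".
  endOfTurn(): void {
    this.active = false;
    if (this.pending === 0) {
      this.released = true;
    }
  }
}

function drain(loop: Task[]): void {
  let task = loop.shift();
  while (task !== undefined) {
    task();
    task = loop.shift();
  }
}

const loop: Task[] = [];
const log: string[] = [];
const lock = new AutoReleaseLock(loop);

lock.request(() => {
  log.push("read done");
  lock.request(() => log.push("write done")); // chained from a callback: allowed
});
lock.endOfTurn(); // one request is pending, so the lock stays held
const heldAfterCreation = !lock.released;

// A timer-style task that is not one of the lock's own callbacks.
loop.push(() => {
  try {
    lock.request(() => log.push("should not happen"));
  } catch (e) {
    log.push("late use rejected");
  }
});

drain(loop);
const releasedAtEnd = lock.released;
```

In this model the lock is released after the last callback finishes with nothing pending, and the timer-style task cannot use it even though it runs while the lock is still held.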
>>
>>
>> The resulting API looks something like this. I'm using the interface
>> name FileHandle to distinguish from the current FileEntry API:
>>
>> interface FileHandle {
>>  LockedFile open([optional] DOMString mode); // defaults to "readonly"
>>  FileRequest getFile(); // .result is set to resulting File object
>> };
>>
>> interface LockedFile {
>>  readonly attribute FileHandle fileHandle;
>>  readonly attribute DOMString mode;
>>
>>  attribute long long location;
>>
>>  FileRequest readAsArrayBuffer(long size);
>>  FileRequest readAsText(long size, [optional] DOMString encoding);
>>  FileRequest write(DOMString or ArrayBuffer or Blob value);
>>  FileRequest append(DOMString or ArrayBuffer or Blob value);
>>
>>  void abort(); // Immediately releases lock
>> };
>>
>> interface FileRequest : EventTarget
>> {
>>  readonly attribute DOMString readyState; // "pending" or "done"
>>
>>  readonly attribute any result;
>>  readonly attribute DOMError error;
>>
>>  readonly attribute LockedFile lockedFile;
>>
>>  attribute nsIDOMEventListener onsuccess;
>>  attribute nsIDOMEventListener onerror;
>>
>>  attribute nsIDOMEventListener onprogress;
>> }
>>
>> One downside of this is that it means that if you're doing a bunch of
>> separate read/write operations in separate locks, each lock is held
>> until we've had a chance to fire the final success event for the
>> operation. So if you queue up a ton of small write operations you can
>> end up mostly sitting waiting for the main thread to finish posting
>> events.

Ah, I see--you mean if you're waiting for the final event of a write
on one lock to start the next read on another lock, you end up waiting
a while.  Not sure there's anything to do about that--you either want
to wait for the write to finish, or you don't.  If you don't, just go
ahead and start the next read, given that it's in a different lock.

>> To address this we can introduce the following methods on FileHandle
>>
>>  FileRequest fastReadAsArrayBuffer(long size);
>>  FileRequest fastReadAsText(long size, [optional] DOMString encoding);
>>  FileRequest fastWrite(DOMString or ArrayBuffer or Blob value);
>>  FileRequest fastAppend(DOMString or ArrayBuffer or Blob value);
>>
>> These would behave exactly the same way as the methods on LockedFile,
>> but would not use a lock and so would permit the next file operation
>> immediately with no need to wait for the progress events to finish
>> firing.
>>
>>> That assumes that write access always grants read access, which need
>>> not necessarily be true.  The FileSaver API was designed assuming the
>>> opposite.  Of course, nobody's implemented that yet, but that's
>>> another thing we need to talk about, and another thread.
>>
>> We're planning on implementing FileSaver, but it would be as an API
>> separate from FileHandle which means that this wouldn't be a problem.
>>
>> In general I think it's ok to grant "add but not read" access, but
>> having "write but not read" runs the risk of code inadvertently
>> destroying data due to not expecting read operations to fail.
>
> Append-but-not-read and overwrite-completely-but-not-read both make
> sense.  General write-but-not-read probably doesn't.  I'm not sure
> what situation you're describing w.r.t. data destruction.
>
> Regarding the queue-lock problem:  If you're done with the lock, and
> just want to do some unlocked reads, just create a standard FileReader
> and fire it off.  I'm a little confused about fastWrite, though--it
> sounds like you're suggesting we have a write method that writes to a
> locked file even if someone else holds the lock.  Should we call it
> writeBulletToFoot?
>
>        Eric
Received on Monday, 19 March 2012 22:11:29 UTC