Re: Colliding FileWriters

On Wed, Feb 29, 2012 at 8:44 AM, Eric U <ericu@google.com> wrote:
> On Mon, Feb 27, 2012 at 4:40 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>>>> One working subset would be:
>>>>>
>>>>> * Keep createFileWriter async.
>>>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>>>> its length member is trustworthy.  If not, it can go stale.
>>>>> * Add an append method [needed only for non-exclusive writes, but
>>>>> useful for logs, and a safe default].
>>>>
>>>> This sounds great to me if we make it exclusive by default and remove
>>>> the .length member for non-exclusive writers. Or make it return
>>>> null/undefined.
>>>
>>> I like exclusive-by-default.  Of course, that means that by default
>>> you have to remember to call close() or depend on GC, but that's
>>> probably OK.  I'm less sure about .length being unusable on
>>> non-exclusive writers, but it's growing on me.  Since by default
>>> writers would be exclusive, length would generally work just the same
>>> as it does now.  However, if it returns null/undefined in the
>>> nonexclusive case, users might accidentally do math on it (if (length
>>>> 0) => false), and get confused.  Perhaps it should throw?
>>>
>>> Also, what's the behavior when there's already an exclusive lock, and
>>> you call createFileWriter?  Should it just not call you until the
>>> lock's free?  Do we need a trylock that fails fast, calling
>>> errorCallback?  I think the former's probably more useful than the
>>> latter, and you can always use a timer to give up if it takes too
>>> long, but there's no way to cancel a request, and you might get a call
>>> far later, when you've forgotten that you requested it.
>>>
>>>> However this brings up another problem, which is how to support
>>>> clients that want to mix read and write operations. Currently this is
>>>> supported, but as far as I can tell it's pretty awkward. Every time
>>>> you want to read you have to nest two asynchronous function calls.
>>>> First one to get a File reference, and then one to do the actual read
>>>> using a FileReader object. You can reuse the File reference, but only
>>>> if you are doing multiple reads in a row with no writing in between.
>>>
>>> I thought about this for a while, and realized that I had no good
>>> suggestion because I couldn't picture the use cases.  Do you have some
>>> handy that would help me think about it?
>>
>> Mixing reading and writing can be something as simple as increasing a
>> counter somewhere in the file. First you need to read the counter
>> value, then add one to it, then write the new value. But there's also
>> more complex operations such as reordering a set of blocks to
>> "defragment" the contents of a file. Yet another example would be
>> modifying a .zip file to add a new file. When you do this you'll want
>> to first read out the location of the current zip directory, then
>> overwrite it with the new file and then the new directory.
>
> That helps, thanks.  So we'll need to be able to do efficient
> (read[-modify-write]*), and we'll need to hold the lock for the reads
> as well as the writes.  The lock should prevent any other writes
> [exclusive or not], but need not prevent unlocked reads.

I think we'd want to prevent unlocked reads too, otherwise the read
might read the file in an inconsistent state. See more further down.

>> We sat down and did some thinking about these two issues. I.e. the
>> locking and the read-write-mixed issue. The solution is good news and
>> bad news. The good news is that we've come up with something that
>> seems like it should work, the bad news is that it's a totally
>> different design from the current FileReader and FileWriter designs.
>
> Hmm...it's interesting, but I don't think we necessarily have to scrap
> FR and FW to use it.
>
> Here's a modified version that uses the existing interfaces:
>
> interface LockedReaderWriter : FileReader, FileWriter {
>        [all the FileReader and FileWriter members]
>
>        readonly attribute File writeResult;
> }

Unfortunately this doesn't make sense since the functions on
FileReader expects a Blob to be passed to them. We could certainly use
slightly modified versions which doesn't take a Blob argument, but we
can't inherit FileReader directly.

However there are two downsides with an approach like this. First off
it means that you *always* have to nest read/write operations in
asynchronous callbacks. I.e. you always have to write code like:

lock.write(...);
lock.onsuccess = function() {
  lock.write(...);
  lock.onsuccess = function() {
    lock.read(...);
    lock.onsuccess = function() {
      lock.write(..., lock.result, ...);
      lock.onsuccess = function() {
      }
    }
  }
}

This works but easily gets messy. With my proposal you can write:

lock.write(...);
lock.write(...);
lock.read(...).onsuccess = function(event) {
  lock.write(..., event.target.result, ...)
}

I.e. you don't have to wait for the previous operation to finish
before you schedule the next one, unless you are using the result of
the previous one as input into the next.

Which also is related to the second downside. Since you have to wait
for the previous operation to finish, that means that even if you know
up front that there are 5 write operations that you want to do, you
can't send them all to the IO thread and let them happen immediately
one after another. Instead you have to send the first one to the IO
thread, once that is done you have to send a message back to the main
thread to fire the success event. Once that fires the page can
schedule the next write operation which means sending another message
to the IO thread.

In other words the IO thread will always have to wait for a round-trip
back to the main thread before it can start the next IO operation.
This would especially be a problem in browsers like Chrome where this
isn't simply cross-thread communication, but cross-process
communication.

> As with your proposal, as long as any read or write method has
> outstanding events, the lock is held.  The difference here is that
> after any write method completes, and until another one begins or the
> lock is dropped, writeResult holds the state of the File as of the
> completion of the write.  The rest of the time it's null.  That way
> you're always as up-to-date as you can easily be, but no more so [it
> doesn't show partial writes during progress events].  To read, you use
> the standard FileReader interface, slicing writeResult as needed to
> get the appropriate offset.

It should be possible to expose a |File writeResult| object in my
proposal too. The main problem with doing that is that it means that
we can't provide synchronous access to the Lock object. It'd also
require that we stat the file to get the size any time a lock is
returned.

> A potential feature of this design is that you could use it to read a
> Blob that didn't come from writeResult, letting you pull in other data
> while still holding the lock.  I'm not sure if we need that, but it's
> there if we want it.

Not really following this.

>> To do the locking without requiring calls to .close() or relying on GC
>> we use a similar setup to IndexedDB transactions. I.e. you get an
>> object which represents a locked file. As long as you use that lock to
>> read from and write to the file the lock keeps being held. However as
>> soon as you return to the event loop from the last progress
>> notification from the last read/write operation, the lock is
>> automatically released.
>
> I love that your design is [I believe] deadlock-free, as the
> write/read operations always make progress regardless of what other
> locks you might be waiting to acquire.

Yeah. The design is stolen from IndexedDB, so I really hope it's
deadlock free :-)

> Regarding the queue-lock problem:  If you're done with the lock, and
> just want to do some unlocked reads, just create a standard FileReader
> and fire it off.  I'm a little confused about fastWrite, though--it
> sounds like you're suggesting we have a write method that writes to a
> locked file even if someone else holds the lock.  Should we call it
> writeBulletToFoot?

Sorry, this wasn't terribly clear in the proposal, others have gotten
confused by it too.

The idea is that the implementation of fastWrite would be:

FileHandle.prototype.fastWrite = function(data) {
  var lock = this.getLock("readwrite");
  req = lock.write(data);
  lock.close();
  return req;
}

Where the 'close()' function is an internal function which prevents
any more requests to be placed against the lock. This way the IO
thread knows that it doesn't have to wait for the 'success' event to
finish firing before it moves on to the next lock. So the function
still waits for other locks to be released and it still ensures that
no writes happen at the same time, fastWrite or not.

The goal is purely to prevent the main-thread round-trip before moving
on to the next lock. I don't see any use cases for being able to write
in the middle of someone else's lock.

/ Jonas

Received on Monday, 19 March 2012 22:56:28 UTC