Re: Colliding FileWriters

On Mon, Mar 19, 2012 at 3:55 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Wed, Feb 29, 2012 at 8:44 AM, Eric U <ericu@google.com> wrote:
>> On Mon, Feb 27, 2012 at 4:40 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>>>>> One working subset would be:
>>>>>>
>>>>>> * Keep createFileWriter async.
>>>>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>>>>> its length member is trustworthy.  If not, it can go stale.
>>>>>> * Add an append method [needed only for non-exclusive writes, but
>>>>>> useful for logs, and a safe default].
>>>>>
>>>>> This sounds great to me if we make it exclusive by default and remove
>>>>> the .length member for non-exclusive writers. Or make it return
>>>>> null/undefined.
>>>>
>>>> I like exclusive-by-default.  Of course, that means that by default
>>>> you have to remember to call close() or depend on GC, but that's
>>>> probably OK.  I'm less sure about .length being unusable on
>>>> non-exclusive writers, but it's growing on me.  Since by default
>>>> writers would be exclusive, length would generally work just the same
>>>> as it does now.  However, if it returns null/undefined in the
>>>> nonexclusive case, users might accidentally do math on it (if
>>>> (length > 0) => false), and get confused.  Perhaps it should throw?
>>>>
>>>> Also, what's the behavior when there's already an exclusive lock, and
>>>> you call createFileWriter?  Should it just not call you until the
>>>> lock's free?  Do we need a trylock that fails fast, calling
>>>> errorCallback?  I think the former's probably more useful than the
>>>> latter, and you can always use a timer to give up if it takes too
>>>> long, but there's no way to cancel a request, and you might get a call
>>>> far later, when you've forgotten that you requested it.
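To illustrate the give-up-with-a-timer pattern, something like the
sketch below, where the options argument and its exclusive flag on
createFileWriter are hypothetical, nothing like that is specced.  Note
that the late callback can still arrive; all you can do is remember to
ignore it:

// Sketch only: assumes a hypothetical options argument to
// createFileWriter with an `exclusive` flag.
function writeWhenLocked(entry, ms, onWriter, onGiveUp) {
  var abandoned = false;
  var timer = setTimeout(function() {
    abandoned = true;  // we can't cancel the request itself...
    onGiveUp();
  }, ms);
  entry.createFileWriter({ exclusive: true }, function(writer) {
    if (abandoned) return;  // ...so drop the late callback on the floor
    clearTimeout(timer);
    onWriter(writer);
  });
}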
>>>>
>>>>> However this brings up another problem, which is how to support
>>>>> clients that want to mix read and write operations. Currently this is
>>>>> supported, but as far as I can tell it's pretty awkward. Every time
>>>>> you want to read you have to nest two asynchronous function calls.
>>>>> First one to get a File reference, and then one to do the actual read
>>>>> using a FileReader object. You can reuse the File reference, but only
>>>>> if you are doing multiple reads in a row with no writing in between.
>>>>
>>>> I thought about this for a while, and realized that I had no good
>>>> suggestion because I couldn't picture the use cases.  Do you have some
>>>> handy that would help me think about it?
>>>
>>> Mixing reading and writing can be something as simple as increasing a
>>> counter somewhere in the file. First you need to read the counter
>>> value, then add one to it, then write the new value. But there's also
>>> more complex operations such as reordering a set of blocks to
>>> "defragment" the contents of a file. Yet another example would be
>>> modifying a .zip file to add a new file. When you do this you'll want
>>> to first read out the location of the current zip directory, then
>>> overwrite it with the new file and then the new directory.
>>
>> That helps, thanks.  So we'll need to be able to do efficient
>> (read[-modify-write]*), and we'll need to hold the lock for the reads
>> as well as the writes.  The lock should prevent any other writes
>> [exclusive or not], but need not prevent unlocked reads.
>
> I think we'd want to prevent unlocked reads too, otherwise the read
> might read the file in an inconsistent state. See more further down.
>
>>> We sat down and did some thinking about these two issues. I.e. the
>>> locking and the read-write-mixed issue. The solution is good news and
>>> bad news. The good news is that we've come up with something that
>>> seems like it should work, the bad news is that it's a totally
>>> different design from the current FileReader and FileWriter designs.
>>
>> Hmm...it's interesting, but I don't think we necessarily have to scrap
>> FR and FW to use it.
>>
>> Here's a modified version that uses the existing interfaces:
>>
>> interface LockedReaderWriter : FileReader, FileWriter {
>>        [all the FileReader and FileWriter members]
>>
>>        readonly attribute File writeResult;
>> }
>
> Unfortunately this doesn't make sense since the functions on
> FileReader expect a Blob to be passed to them. We could certainly use
> slightly modified versions that don't take a Blob argument, but we
> can't inherit FileReader directly.

You missed the point of the writeResult field.  You can slice it and
give it to the FileReader-derived functions to do your reads, so no
modifications to the API are needed.
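To make that concrete, a read under this scheme would look something
like the sketch below (rw is a LockedReaderWriter as proposed above;
everything else is stock Blob/FileReader API):

// writeResult is a File, i.e. a Blob, so slice() and the inherited
// FileReader methods work on it unchanged.
function readChunk(rw, offset, length, callback) {
  var blob = rw.writeResult.slice(offset, offset + length);
  rw.onload = function() {
    callback(rw.result);  // bytes for just the requested range
  };
  rw.readAsArrayBuffer(blob);
}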

> However there are two downsides with an approach like this. First off
> it means that you *always* have to nest read/write operations in
> asynchronous callbacks. I.e. you always have to write code like:
>
> lock.write(...);
> lock.onsuccess = function() {
>  lock.write(...);
>  lock.onsuccess = function() {
>    lock.read(...);
>    lock.onsuccess = function() {
>      lock.write(..., lock.result, ...);
>      lock.onsuccess = function() {
>      }
>    }
>  }
> }

True.  That's forced by the fact that we've modeled FileWriter after
FileReader, which is modeled after XHR, which has explicit visible
state transitions.  If your proposal hides the state transitions, and
acts as a queuing system, then the calling code gets simpler.  I
believe that you could get the same calling simplicity from a wrapper
library, but you wouldn't get the efficiency of your API since you
wouldn't be able to stack up tasks on the IO thread.
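For example, the wrapper could look something like this sketch
(WriterQueue is mine, not proposed API); it flattens the nesting, but
each queued operation still waits for the previous writeend to
round-trip through the main thread:

// Serialize operations on a plain FileWriter so callers don't nest
// callbacks. Each op must issue exactly one async operation (write or
// truncate); the next op starts only after its writeend fires, so
// this buys syntax, not IO-thread pipelining.
function WriterQueue(writer) {
  this.writer = writer;
  this.pending = [];
  this.busy = false;
}
WriterQueue.prototype.enqueue = function(op) {
  this.pending.push(op);
  if (!this.busy) this.next();
};
WriterQueue.prototype.next = function() {
  var op = this.pending.shift();
  if (!op) { this.busy = false; return; }
  this.busy = true;
  var self = this;
  this.writer.onwriteend = function() { self.next(); };
  op(this.writer);
};

// queue.enqueue(function(w) { w.write(blob1); });
// queue.enqueue(function(w) { w.write(blob2); });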

Just to confirm: your FileRequest wouldn't have readyState or anything
like that?

> This works but easily gets messy. With my proposal you can write:
>
> lock.write(...);
> lock.write(...);
> lock.read(...).onsuccess = function(event) {
>  lock.write(..., event.target.result, ...)
> }
>
> I.e. you don't have to wait for the previous operation to finish
> before you schedule the next one, unless you are using the result of
> the previous one as input into the next.
>
> Which also is related to the second downside. Since you have to wait
> for the previous operation to finish, that means that even if you know
> up front that there are 5 write operations that you want to do, you
> can't send them all to the IO thread and let them happen immediately
> one after another. Instead you have to send the first one to the IO
> thread, once that is done you have to send a message back to the main
> thread to fire the success event. Once that fires the page can
> schedule the next write operation which means sending another message
> to the IO thread.

OK, I get this now.  And we can't just add writeAndReleaseLock to
FileWriter, because you still can't queue up events, since you can't
call write() when it's writing [in the current spec].
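That is, today the only legal shape is:

writer.write(blob1);
// writer.write(blob2);  // would throw: readyState is still WRITING
writer.onwriteend = function() {
  writer.write(blob2);   // chain instead -- one main-thread
};                       // round-trip per queued write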

> In other words the IO thread will always have to wait for a round-trip
> back to the main thread before it can start the next IO operation.
> This would especially be a problem in browsers like Chrome where this
> isn't simply cross-thread communication, but cross-process
> communication.
>
>> As with your proposal, as long as any read or write method has
>> outstanding events, the lock is held.  The difference here is that
>> after any write method completes, and until another one begins or the
>> lock is dropped, writeResult holds the state of the File as of the
>> completion of the write.  The rest of the time it's null.  That way
>> you're always as up-to-date as you can easily be, but no more so [it
>> doesn't show partial writes during progress events].  To read, you use
>> the standard FileReader interface, slicing writeResult as needed to
>> get the appropriate offset.
>
> It should be possible to expose a |File writeResult| object in my
> proposal too. The main problem with doing that is that it means that
> we can't provide synchronous access to the Lock object. It'd also
> require that we stat the file to get the size any time a lock is
> returned.

You're probably going to be touching the file or statting it anyway;
how else do you know that there's a file there to lock?  Or perhaps
you've just written a chunk, in which case the file's metadata is
already in the cache and you're already on the IO thread, so the stat
will be cheap.

>> A potential feature of this design is that you could use it to read a
>> Blob that didn't come from writeResult, letting you pull in other data
>> while still holding the lock.  I'm not sure if we need that, but it's
>> there if we want it.
>
> Not really following this.

As you pointed out, readAsText etc. take a Blob argument.  You could
either pass in writeResult, or you could read from an unrelated Blob
with this reader, while still holding the lock.
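Minimally, something like this, with the same hypothetical interface
as above (someOtherBlob is any Blob the page already holds):

// The lock stays held across this read even though the Blob being
// read has nothing to do with the locked file:
rw.onload = function() {
  rw.write(someOtherBlob);  // still under the same lock
};
rw.readAsArrayBuffer(someOtherBlob.slice(0, 4));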

>>> To do the locking without requiring calls to .close() or relying on GC
>>> we use a similar setup to IndexedDB transactions. I.e. you get an
>>> object which represents a locked file. As long as you use that lock to
>>> read from and write to the file the lock keeps being held. However as
>>> soon as you return to the event loop from the last progress
>>> notification from the last read/write operation, the lock is
>>> automatically released.
>>
>> I love that your design is [I believe] deadlock-free, as the
>> write/read operations always make progress regardless of what other
>> locks you might be waiting to acquire.
>
> Yeah. The design is stolen from IndexedDB, so I really hope it's
> deadlock free :-)
>
>> Regarding the queue-lock problem:  If you're done with the lock, and
>> just want to do some unlocked reads, just create a standard FileReader
>> and fire it off.  I'm a little confused about fastWrite, though--it
>> sounds like you're suggesting we have a write method that writes to a
>> locked file even if someone else holds the lock.  Should we call it
>> writeBulletToFoot?
>
> Sorry, this wasn't terribly clear in the proposal, others have gotten
> confused by it too.
>
> The idea is that the implementation of fastWrite would be:
>
> FileHandle.prototype.fastWrite = function(data) {
>   var lock = this.getLock("readwrite");
>   var req = lock.write(data);
>   lock.close();
>   return req;
> };
>
> Where the 'close()' function is an internal function that prevents
> any more requests from being placed against the lock. This way the IO
> thread knows that it doesn't have to wait for the 'success' event to
> finish firing before it moves on to the next lock. So the function
> still waits for other locks to be released, and it still ensures that
> no writes happen at the same time, fastWrite or not.

Got it--you're dropping your own lock, not ignoring anyone else's.
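So from the caller's side it would be just (assuming the FileHandle
sketch above):

var req = handle.fastWrite(someBlob);
req.onsuccess = function() { /* write committed */ };
req.onerror = function() { /* e.g. quota or IO failure */ };

i.e. fire-and-forget, with the lock acquired and closed entirely on
the IO thread's schedule.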

> The goal is purely to prevent the main-thread round-trip before moving
> on to the next lock. I don't see any use cases for being able to write
> in the middle of someone else's lock.
>
> / Jonas

Received on Tuesday, 1 May 2012 15:33:16 UTC