Re: Colliding FileWriters

On Mon, Mar 19, 2012 at 3:10 PM, Eric U <ericu@google.com> wrote:
> On Wed, Feb 29, 2012 at 8:44 AM, Eric U <ericu@google.com> wrote:
>> On Mon, Feb 27, 2012 at 4:40 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> On Mon, Feb 27, 2012 at 11:36 PM, Eric U <ericu@google.com> wrote:
>>>>>> One working subset would be:
>>>>>>
>>>>>> * Keep createFileWriter async.
>>>>>> * Make it optionally exclusive [possibly by default].  If exclusive,
>>>>>> its length member is trustworthy.  If not, it can go stale.
>>>>>> * Add an append method [needed only for non-exclusive writes, but
>>>>>> useful for logs, and a safe default].
>>>>>
>>>>> This sounds great to me if we make it exclusive by default and remove
>>>>> the .length member for non-exclusive writers. Or make it return
>>>>> null/undefined.
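
For concreteness, a rough sketch of what that subset might look like
from script; the options argument to createFileWriter and the append()
method come from the proposal above, not existing API, and the exact
shapes are my guess:

  // Hypothetical usage of the proposed subset; "exclusive"
  // would default to true if the option were omitted.
  entry.createFileWriter({ exclusive: false }, function(writer) {
    // Non-exclusive writer: append() stays safe under concurrent
    // writers, but .length can go stale (or be null/undefined).
    writer.onwriteend = function() { /* append completed */ };
    writer.append(new Blob(['log line\n']));
  }, function(err) { /* couldn't create the writer */ });
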
>>>>
>>>> I like exclusive-by-default.  Of course, that means that by default
>>>> you have to remember to call close() or depend on GC, but that's
>>>> probably OK.  I'm less sure about .length being unusable on
>>>> non-exclusive writers, but it's growing on me.  Since by default
>>>> writers would be exclusive, length would generally work just the same
>>>> as it does now.  However, if it returns null/undefined in the
>>>> nonexclusive case, users might accidentally do math on it
>>>> (if (length > 0) => false) and get confused.  Perhaps it should throw?
>>>>
>>>> Also, what's the behavior when there's already an exclusive lock, and
>>>> you call createFileWriter?  Should it just not call you until the
>>>> lock's free?  Do we need a trylock that fails fast, calling
>>>> errorCallback?  I think the former's probably more useful than the
>>>> latter, and you can always use a timer to give up if it takes too
>>>> long, but there's no way to cancel a request, and you might get a call
>>>> far later, when you've forgotten that you requested it.
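
The timer-based give-up might look like the sketch below; the
exclusive option and writer.close() are assumptions from this thread,
and note that the request itself indeed can't be cancelled:

  // Give up waiting for the exclusive lock after 5 seconds.
  var gaveUp = false;
  var timer = setTimeout(function() { gaveUp = true; }, 5000);
  entry.createFileWriter({ exclusive: true }, function(writer) {
    clearTimeout(timer);
    if (gaveUp) {
      writer.close();  // the lock arrived long after we moved on
      return;
    }
    // ... hold the lock and do the writes ...
  }, function(err) { /* the lock request failed outright */ });
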
>>>>
>>>>> However this brings up another problem, which is how to support
>>>>> clients that want to mix read and write operations. Currently this is
>>>>> supported, but as far as I can tell it's pretty awkward. Every time
>>>>> you want to read you have to nest two asynchronous function calls.
>>>>> First one to get a File reference, and then one to do the actual read
>>>>> using a FileReader object. You can reuse the File reference, but only
>>>>> if you are doing multiple reads in a row with no writing in between.
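
The nesting being described, with today's interfaces:

  // Every read takes two nested asynchronous steps today.
  var offset = 0, size = 16;  // arbitrary range for illustration
  entry.file(function(file) {            // step 1: get a File
    var reader = new FileReader();
    reader.onload = function() {
      var data = reader.result;          // step 2: the actual read
    };
    reader.readAsArrayBuffer(file.slice(offset, offset + size));
  }, function(err) { /* ... */ });
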
>>>>
>>>> I thought about this for a while, and realized that I had no good
>>>> suggestion because I couldn't picture the use cases.  Do you have some
>>>> handy that would help me think about it?
>>>
>>> Mixing reading and writing can be something as simple as increasing a
>>> counter somewhere in the file. First you need to read the counter
>>> value, then add one to it, then write the new value. But there's also
>>> more complex operations such as reordering a set of blocks to
>>> "defragment" the contents of a file. Yet another example would be
>>> modifying a .zip file to add a new file. When you do this you'll want
>>> to first read out the location of the current zip directory, then
>>> overwrite it with the new file and then the new directory.
>>
>> That helps, thanks.  So we'll need to be able to do efficient
>> (read[-modify-write]*), and we'll need to hold the lock for the reads
>> as well as the writes.  The lock should prevent any other writes
>> [exclusive or not], but need not prevent unlocked reads.
>>
>>> We sat down and did some thinking about these two issues. I.e. the
>>> locking and the read-write-mixed issue. The solution is good news and
>>> bad news. The good news is that we've come up with something that
>>> seems like it should work, the bad news is that it's a totally
>>> different design from the current FileReader and FileWriter designs.
>>
>> Hmm...it's interesting, but I don't think we necessarily have to scrap
>> FR and FW to use it.
>>
>> Here's a modified version that uses the existing interfaces:
>>
>> interface LockedReaderWriter : FileReader, FileWriter {
>>        [all the FileReader and FileWriter members]
>>
>>        readonly attribute File writeResult;
>> }
>
> This came up in an offline discussion recently regarding a
> currently-unserved use case: using a web app to edit a file outside
> the browser sandbox.  You can certainly drag the file into or out of
> the browser, but it's nothing like the experience you get with a
> native app, where if you select a file for editing you can read+write
> it many times, at its true location, without additional permission
> checks.  If we added something like a "refresh" to regain expired
> locks with this object, and some way for the user to grant permissions
> to a file for the session, it could take care of that use case.
>
> What do you think?

If we have an API which gives the web page access to a
FileEntry/FileHandle, then it seems like the page can open locks any
number of times to do proper read/write access.

We've started drafting such an API at [1]; however, there's a lot
remaining to figure out there, especially when it comes to security.
And the API doesn't let a page bring up a file picker where the user
can grant read/write access to a single file.

[1] https://wiki.mozilla.org/WebAPI/DeviceStorageAPI

>> As with your proposal, as long as any read or write method has
>> outstanding events, the lock is held.  The difference here is that
>> after any write method completes, and until another one begins or the
>> lock is dropped, writeResult holds the state of the File as of the
>> completion of the write.  The rest of the time it's null.  That way
>> you're always as up-to-date as you can easily be, but no more so [it
>> doesn't show partial writes during progress events].  To read, you use
>> the standard FileReader interface, slicing writeResult as needed to
>> get the appropriate offset.
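
A hedged sketch of that flow; the factory name createLockedReaderWriter
below is invented purely for illustration:

  entry.createLockedReaderWriter(function(rw) {
    rw.seek(0);
    rw.write(new Blob(['hello world']));
    rw.onwriteend = function() {
      // writeResult now holds the File as of that write; slice it
      // and read it via the FileReader half, still under the lock.
      rw.onload = function() { /* rw.result === 'hello' */ };
      rw.readAsText(rw.writeResult.slice(0, 5));
    };
  }, function(err) { /* lock not acquired */ });
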
>>
>> A potential feature of this design is that you could use it to read a
>> Blob that didn't come from writeResult, letting you pull in other data
>> while still holding the lock.  I'm not sure if we need that, but it's
>> there if we want it.
>>
>>> To do the locking without requiring calls to .close() or relying on GC,
>>> we use a setup similar to IndexedDB transactions. I.e. you get an
>>> object which represents a locked file. As long as you use that lock to
>>> read from and write to the file, the lock remains held. However, as
>>> soon as you return to the event loop from the last progress
>>> notification from the last read/write operation, the lock is
>>> automatically released.
>>
>> I love that your design is [I believe] deadlock-free, as the
>> write/read operations always make progress regardless of what other
>> locks you might be waiting to acquire.
>>
>>> This is exactly how IndexedDB transactions (and I believe WebSQL
>>> transactions) work. To even further reduce the risk of races, the
>>> IndexedDB spec forbids you from interacting with a transaction except
>>> between the point when the transaction is created and the return to
>>> the event loop, or from the progress event handlers of other
>>> read/write operations. We can do the same thing with file locks. That
>>> way it works out naturally that the lock is released when the last
>>> event finishes firing if there are no further pending read/write
>>> operations, since there would be no more opportunity to use the lock.
>>> In other words, you can't post a setTimeout and use the lock. This
>>> would be a bad idea anyway since you'd run the risk that the lock was
>>> released before the timeout fires.
>>>
>>>
>>> The resulting API looks something like this. I'm using the interface
>>> name FileHandle to distinguish from the current FileEntry API:
>>>
>>> interface FileHandle {
>>>  LockedFile open([optional] DOMString mode); // defaults to "readonly"
>>>  FileRequest getFile(); // .result is set to resulting File object
>>> };
>>>
>>> interface LockedFile {
>>>  readonly attribute FileHandle fileHandle;
>>>  readonly attribute DOMString mode;
>>>
>>>  attribute long long location;
>>>
>>>  FileRequest readAsArrayBuffer(long size);
>>>  FileRequest readAsText(long size, [optional] DOMString encoding);
>>>  FileRequest write(DOMString or ArrayBuffer or Blob value);
>>>  FileRequest append(DOMString or ArrayBuffer or Blob value);
>>>
>>>  void abort(); // Immediately releases lock
>>> };
>>>
>>> interface FileRequest : EventTarget
>>> {
>>>  readonly attribute DOMString readyState; // "pending" or "done"
>>>
>>>  readonly attribute any result;
>>>  readonly attribute DOMError error;
>>>
>>>  readonly attribute LockedFile lockedFile;
>>>
>>>  attribute nsIDOMEventListener onsuccess;
>>>  attribute nsIDOMEventListener onerror;
>>>
>>>  attribute nsIDOMEventListener onprogress;
>>> };
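
Here's the counter use case from earlier against these interfaces, as I
picture it; the "readwrite" mode string is an assumption, since only the
"readonly" default is spelled out above:

  var lockedFile = fileHandle.open('readwrite');
  lockedFile.location = 0;
  var readReq = lockedFile.readAsArrayBuffer(4);
  readReq.onsuccess = function() {
    var n = new DataView(readReq.result).getUint32(0);
    var buf = new ArrayBuffer(4);
    new DataView(buf).setUint32(0, n + 1);
    lockedFile.location = 0;             // rewind to overwrite
    lockedFile.write(buf);
    // Once the write's success event finishes firing and nothing
    // else is queued, the lock is released automatically; touching
    // lockedFile from a setTimeout here would be disallowed anyway.
  };
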
>>>
>>> One downside of this is that if you're doing a bunch of
>>> separate read/write operations under separate locks, each lock is held
>>> until we've had a chance to fire the final success event for the
>>> operation. So if you queue up a ton of small write operations you can
>>> end up mostly sitting waiting for the main thread to finish posting
>>> events.
>
> Ah, I see--you mean if you're waiting for the final event of a write
> on one lock to start the next read on another lock, you end up waiting
> a while.

Exactly.

> Not sure there's anything to do about that--you either want
> to wait for the write to finish, or you don't.  If you don't, just go
> ahead and start the next read, given that it's in a different lock.

I think my proposal allows a page to indicate whether it wants the
implementation to wait for further reads/writes.

/ Jonas

Received on Monday, 19 March 2012 23:18:49 UTC