Re: Colliding FileWriters from Eric U on 2012-02-27 (public-webapps@w3.org from January to March 2012)

From: Eric U <ericu@google.com>
Date: Mon, 27 Feb 2012 14:36:50 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>, Jian Li <jianli@chromium.org>
Message-ID: <CAHvSExdYCo3D-TpW9nj6seE=2+xPTSG6GMgGB1Av05x5Sbo_Lw@mail.gmail.com>
Sorry about the slow response; I've been busy with dev work, and am
now getting back to spec work.

On Sat, Jan 21, 2012 at 9:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Wed, Jan 11, 2012 at 1:41 PM, Eric U <ericu@google.com> wrote:
>> On Wed, Jan 11, 2012 at 12:25 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> On Tue, Jan 10, 2012 at 1:32 PM, Eric U <ericu@google.com> wrote:
>>>> On Tue, Jan 10, 2012 at 1:08 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>>> Hi All,
>>>>>
>>>>> We've been looking at implementing FileWriter and had a couple of questions.
>>>>>
>>>>> First of all, what happens if multiple pages create a FileWriter for
>>>>> the same FileEntry at the same time? Will both be able to write to the
>>>>> file at the same time and whoever writes last to a given byte wins?
>>>>
>>>> This isn't currently specified, and that's a hole we should fill.  By
>>>> not having it in the spec, my assumption would be that last-wins would
>>>> hold, but it would be good to clarify it if that's the behavior we
>>>> want.  It's especially important given that there's nothing like
>>>> fflush(), which would help users know what "last" meant.  Speaking of
>>>> which, should we add a flushing mechanism?
>>>>
>>>>> This is different from how file systems normally work since as long as
>>>>> a file is open for writing, that tends to prevent other processes from
>>>>> opening the same file.
>>>>
>>>> You're perhaps thinking of Windows, where by default files are opened
>>>> in exclusive mode?  On other operating systems, and on Windows when
>>>> you specify FILE_SHARE_WRITE in dwShareMode in CreateFile, multiple
>>>> writers can exist simultaneously.
>>>
>>> Ah. I didn't realize this was different on other OSs. It still seems
>>> risky to not provide any means to get exclusive access. The only way I
>>> can see websites dealing with this is to create their own locking
>>> mechanism backed by using IndexedDB transactions as low-level atomic
>>> primitive (local-storage doesn't work since you can't implement
>>> compare-and-swap in an atomic manner).
>>>
>>> Having an 'exclusive' flag for createFileWriter seems much easier and
>>> removes the IndexedDB dependency. I'd probably even say that it should
>>> default to true since on the web defaulting to safe rather than fast
>>> generally results in fewer bugs.
>>
>> I don't think I'd generally be averse to this.  However, it would then
>> require some sort of a revocation mechanism as well.  If you're done
>> with your FileWriter, you want to be able to get rid of it without
>> depending on GC, so that another context can create one.  And if you
>> forget to revoke it, behavior in the second context presumably depends
>> on GC, which is a bit ugly.
>
> I definitely agree that we need an explicit revoking mechanism. We
> have a similar situation in IndexedDB where as long as an IDBDatabase
> object is alive for a given database, no one can upgrade the database
> version. Here we do have an explicit .close() method, but if you
> forget to call it you end up waiting for GC. It's possibly somewhat
> less of a problem in IndexedDB though since upgrading database
> versions should be pretty rare.
>
>> I'm not quite sure how urgent this is yet, though.  I've been assuming
>> that if you have transactional/synchronization semantics you want to
>> maintain, you'll be using IDB anyway, or a server handshake, etc.  But
>> of course it's easy to write a naive app that the user loads in two
>> windows, with bad effect.
>
> Yeah, it's the "user opens page in two windows" scenario that I'm
> concerned about. As well as similar conditions if you for example have
> a Worker thread which holds a connection to the server and
> occasionally writes data to a file based on information from the
> server, and code in a window which reads data from the file and acts
> on it.

If the window is only reading, not writing, I don't see the problem
with the current design.
If the worker and window are both reading and writing, in the same
file, the problem might be in the app's design.

> I don't think we can relegate synchronization semantics to IDB. I
> think we should have synchronization semantics at least as the default
> mode for all data that is shared between Workers and Windows which can
> be running on different threads. One great example is localStorage
> which we spent a lot of effort on trying to make synchronized using
> the storage mutex. We failed there, but not due to a lack of desire,
> but due to the way the API is structured.
>
>>> Though if we add the 'exclusive' flag described above, then we'll need
>>> to keep createFileWriter async anyway.
>>
>> Right--I think we should pick whatever subset of these suggestions
>> seems the most useful, since they overlap a bit.
>
> Agreed.
>
>> One working subset would be:
>>
>> * Keep createFileWriter async.
>> * Make it optionally exclusive [possibly by default].  If exclusive,
>> its length member is trustworthy.  If not, it can go stale.
>> * Add an append method [needed only for non-exclusive writes, but
>> useful for logs, and a safe default].
>
> This sounds great to me if we make it exclusive by default and remove
> the .length member for non-exclusive writers. Or make it return
> null/undefined.

I like exclusive-by-default.  Of course, that means that by default
you have to remember to call close() or depend on GC, but that's
probably OK.  I'm less sure about .length being unusable on
non-exclusive writers, but it's growing on me.  Since by default
writers would be exclusive, length would generally work just the same
as it does now.  However, if it returns null/undefined in the
nonexclusive case, users might accidentally do math on it
(if (length > 0) => false), and get confused.  Perhaps it should throw?
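
To make the difference concrete, here is a rough sketch.  SketchWriter and
its two getters are made-up names for illustration only, not anything in the
spec; the point is just how each option behaves when code does arithmetic on
the length of a non-exclusive writer.

```javascript
// Hypothetical sketch: SketchWriter is not a real API, only an illustration.
class SketchWriter {
  constructor(exclusive, length) {
    this.exclusive = exclusive;
    this._length = length;
  }
  // Option A: return undefined when the writer is not exclusive.
  get lengthOrUndefined() {
    return this.exclusive ? this._length : undefined;
  }
  // Option B: throw when the writer is not exclusive.
  get lengthOrThrow() {
    if (!this.exclusive) {
      throw new Error("length is unavailable on a non-exclusive writer");
    }
    return this._length;
  }
}

const shared = new SketchWriter(false, 1024);

// Option A silently takes the "empty file" branch: undefined > 0 is false.
const looksEmpty = !(shared.lengthOrUndefined > 0);

// Option B makes the mistake visible at the point of use.
let threw = false;
try {
  shared.lengthOrThrow;
} catch (e) {
  threw = true;
}
```

With option A the 1024-byte file looks empty to the caller; with option B the
same code fails loudly.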

Also, what's the behavior when there's already an exclusive lock, and
you call createFileWriter?  Should it just not call you until the
lock's free?  Do we need a trylock that fails fast, calling
errorCallback?  I think the former's probably more useful than the
latter, and you can always use a timer to give up if it takes too
long, but there's no way to cancel a request, and you might get a call
far later, when you've forgotten that you requested it.
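
One possible shape for this, as a sketch only: a waiting acquire that takes a
timeout and hands back a cancel function, plus a fail-fast tryAcquire.
ExclusiveLock, acquire, tryAcquire and release are all hypothetical names, and
a real implementation would have to coordinate across windows and workers,
which this single-context sketch does not attempt.

```javascript
// Hypothetical sketch: ExclusiveLock and its methods are illustrative names only.
class ExclusiveLock {
  constructor() {
    this.held = false;
    this.waiters = [];
  }
  // Fail-fast variant: returns true only if the lock was free.
  tryAcquire() {
    if (this.held) return false;
    this.held = true;
    return true;
  }
  // Waiting variant: calls onGranted once the lock is ours, or onError after
  // timeoutMs. Returns a function that cancels a request that is still queued.
  acquire(onGranted, onError, timeoutMs) {
    if (this.tryAcquire()) {
      onGranted();
      return function () {};
    }
    const waiter = { onGranted: onGranted, timer: null };
    const cancel = () => {
      const i = this.waiters.indexOf(waiter);
      if (i !== -1) {
        this.waiters.splice(i, 1);
        clearTimeout(waiter.timer);
      }
    };
    waiter.timer = setTimeout(() => {
      cancel(); // drop the stale request so it is never granted later
      onError(new Error("timed out waiting for exclusive access"));
    }, timeoutMs);
    this.waiters.push(waiter);
    return cancel;
  }
  // Explicit release, so the next waiter does not depend on GC.
  release() {
    const next = this.waiters.shift();
    if (next) {
      clearTimeout(next.timer);
      next.onGranted(); // lock stays held; ownership passes to the next waiter
    } else {
      this.held = false;
    }
  }
}
```

Because a timed-out or cancelled request is removed from the queue, the caller
never gets a callback long after it has stopped caring.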

> However this brings up another problem, which is how to support
> clients that want to mix read and write operations. Currently this is
> supported, but as far as I can tell it's pretty awkward. Every time
> you want to read you have to nest two asynchronous function calls.
> First one to get a File reference, and then one to do the actual read
> using a FileReader object. You can reuse the File reference, but only
> if you are doing multiple reads in a row with no writing in between.

I thought about this for a while, and realized that I had no good
suggestion because I couldn't picture the use cases.  Do you have some
handy that would help me think about it?

> If we support exclusive access (whether the default or not) this stops
> working. Once a FileWriter has exclusive access I assume that calling
> getFile should not produce a new File object until the exclusive
> access has been released.

We could go either way on this.  A File could be produced that was a
snapshot of the current state, or that worked until another write
happened, and then went invalid.  Write locks and readers can be
independent.
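
Roughly, the two options look like this.  SketchFileStore, snapshot and
liveHandle are invented for illustration and are not proposed API; the sketch
only shows the observable difference after a later write.

```javascript
// Hypothetical sketch: SketchFileStore and its handles are illustrative only.
class SketchFileStore {
  constructor(contents) {
    this.contents = contents;
    this.version = 0;
  }
  write(contents) {
    this.contents = contents;
    this.version++;
  }
  // Option 1: a snapshot keeps returning the contents as of its creation.
  snapshot() {
    const frozen = this.contents;
    return { read: () => frozen };
  }
  // Option 2: a handle that works until the next write, then goes invalid.
  liveHandle() {
    const seen = this.version;
    return {
      read: () => {
        if (this.version !== seen) {
          throw new Error("file changed since this handle was created");
        }
        return this.contents;
      },
    };
  }
}

const store = new SketchFileStore("v1");
const snap = store.snapshot();
const live = store.liveHandle();
store.write("v2");
// snap.read() still returns "v1"; live.read() now throws.
```

Neither option needs the reader to hold or wait on the write lock.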

> I don't have any great solutions to this problem. One solution would
> be to make it possible to get a File directly from a FileWriter. This
> accessor could even be synchronous and represent the file at the time
> of the last write. This would allow syntax like:
>
> myFileEntry.createWriter(function(mywriter) {
>  // write some data
>  mywriter.write(someblob);
>  // wait for "success"
>  mywriter.onwrite = function() {
>    // Read some data
>    var reader = new FileReader();
>    reader.readAsArrayBuffer(mywriter.file);
>    // wait for "success"
>    reader.onload = function() {
>      // do something with read data
>    }
>  };
> });
>
> This is pretty hideous though, but as far as I can tell better than
> what we have now. But it is very surprising to have a File accessor on
> the FileWriter.
>
> I think the main problem is that reading and writing is spread out
> over two separate objects. I can't think of a way to make things look
> good as long as that is the case. Maybe the solution is to add
> readAsArrayBuffer/readAsText/readAsDataURL directly on FileWriter?
>
> / Jonas

That assumes that write access always grants read access, which need
not necessarily be true.  The FileSaver API was designed assuming the
opposite.  Of course, nobody's implemented that yet, but that's
another thing we need to talk about, and another thread.
Received on Monday, 27 February 2012 22:37:33 UTC