Re: Colliding FileWriters from Eric U on 2012-01-11 (public-webapps@w3.org from January to March 2012)

From: Eric U <ericu@google.com>
Date: Wed, 11 Jan 2012 13:41:48 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Webapps WG <public-webapps@w3.org>, Jian Li <jianli@chromium.org>
Message-ID: <CAHvSExeJ2xUVYmUJD3r4orrTd-fNsOepHfFRgg+4QMpN9m2dTg@mail.gmail.com>
On Wed, Jan 11, 2012 at 12:25 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Tue, Jan 10, 2012 at 1:32 PM, Eric U <ericu@google.com> wrote:
>> On Tue, Jan 10, 2012 at 1:08 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> Hi All,
>>>
>>> We've been looking at implementing FileWriter and had a couple of questions.
>>>
>>> First of all, what happens if multiple pages create a FileWriter for
>>> the same FileEntry at the same time? Will both be able to write to the
>>> file at the same time, with whoever writes last to a given byte winning?
>>
>> This isn't currently specified, and that's a hole we should fill.  Since
>> the spec says nothing about it, my assumption would be that last-writer-wins
>> holds, but it would be good to clarify it if that's the behavior we
>> want.  It's especially important given that there's nothing like
>> fflush(), which would help users know what "last" meant.  Speaking of
>> which, should we add a flushing mechanism?
>>
>>> This is different from how file systems normally work, since as long as
>>> a file is open for writing, that tends to prevent other processes from
>>> opening the same file.
>>
>> You're perhaps thinking of Windows, where by default files are opened
>> in exclusive mode?  On other operating systems, and on Windows when
>> you specify FILE_SHARE_WRITE in dwShareMode in CreateFile, multiple
>> writers can exist simultaneously.
>
> Ah. I didn't realize this was different on other OSs. It still seems
> risky to not provide any means to get exclusive access. The only way I
> can see websites dealing with this is to create their own locking
> mechanism backed by IndexedDB transactions as a low-level atomic
> primitive (localStorage doesn't work, since you can't implement
> compare-and-swap atomically on top of it).
>
> Having an 'exclusive' flag for createFileWriter seems much easier and
> removes the IndexedDB dependency. I'd probably even say that it should
> default to true, since on the web defaulting to safe rather than fast
> generally results in fewer bugs.

I don't think I'd generally be averse to this.  However, it would then
require some sort of revocation mechanism as well.  If you're done
with your FileWriter, you want to be able to get rid of it without
depending on GC, so that another context can create one.  And if you
forget to revoke it, behavior in the second context presumably depends
on GC, which is a bit ugly.
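
To make the revocation point concrete, here is a minimal sketch, assuming a hypothetical per-origin lock table (`WriterLocks`, `acquire`, and `release` are illustrative names, not spec members). An explicit `release()` frees the file immediately; the `FinalizationRegistry` fallback only runs when, and if, the garbage collector reclaims a forgotten handle, which is exactly the GC dependency described above.

```typescript
// Hypothetical lock table for exclusive writers; not part of any spec.
interface WriterHandle {
  release(): void;
}

class WriterLocks {
  // Paths that currently have an exclusive writer.
  private held = new Set<string>();

  // Fallback for handles dropped without release(). When, or whether,
  // this callback runs is up to the garbage collector.
  private finalizer = new FinalizationRegistry<string>((path) => {
    this.held.delete(path);
  });

  // Returns a handle, or null if another context already holds the file.
  acquire(path: string): WriterHandle | null {
    if (this.held.has(path)) return null;
    this.held.add(path);
    const token = {}; // identifies this registration for unregister()
    let released = false;
    const handle: WriterHandle = {
      // Explicit revocation: frees the path immediately; repeat calls are no-ops.
      release: () => {
        if (released) return;
        released = true;
        this.held.delete(path);
        this.finalizer.unregister(token);
      },
    };
    this.finalizer.register(handle, path, token);
    return handle;
  }
}

const locks = new WriterLocks();
const first = locks.acquire("/notes.txt");  // granted
const second = locks.acquire("/notes.txt"); // null: already held
first?.release();
const third = locks.acquire("/notes.txt");  // granted again
console.log(first !== null, second === null, third !== null); // true true true
```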

I'm not quite sure how urgent this is yet, though.  I've been assuming
that if you have transactional/synchronization semantics you want to
maintain, you'll be using IDB anyway, or a server handshake, etc.  But
of course it's easy to write a naive app that the user loads in two
windows, to bad effect.
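
For apps that do need coordination today, Jonas's suggestion of building a lock on an atomic primitive comes down to compare-and-swap on a shared record. A minimal sketch, assuming a hypothetical `AtomicStore` whose `compareAndSwap` is atomic across tabs; in a page that step would have to be a single readwrite IndexedDB transaction, which is not shown.

```typescript
// Stand-in for a store whose compareAndSwap is atomic across tabs. In a page this
// step would have to be one readwrite IndexedDB transaction (not shown here).
class AtomicStore {
  private data = new Map<string, string>();

  // Atomically: if the current value equals `expected` (null = absent), set it to `next`.
  compareAndSwap(key: string, expected: string | null, next: string | null): boolean {
    const current = this.data.get(key) ?? null;
    if (current !== expected) return false;
    if (next === null) this.data.delete(key);
    else this.data.set(key, next);
    return true;
  }
}

// Advisory lock: the owner's id is stored under the lock key while it is held.
function tryLock(store: AtomicStore, key: string, owner: string): boolean {
  return store.compareAndSwap(key, null, owner);
}

// Only the current owner can release the lock.
function unlock(store: AtomicStore, key: string, owner: string): boolean {
  return store.compareAndSwap(key, owner, null);
}

const store = new AtomicStore();
console.log(tryLock(store, "lock:/notes.txt", "tab-1")); // true
console.log(tryLock(store, "lock:/notes.txt", "tab-2")); // false
console.log(unlock(store, "lock:/notes.txt", "tab-2"));  // false: not the owner
console.log(unlock(store, "lock:/notes.txt", "tab-1"));  // true
```

The lock is only advisory: a writer that never calls `tryLock` can still clobber the file, and a tab that dies while holding the lock leaves a stale owner behind, which is the same revocation problem raised above.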

>>> A second question is why is FileEntry.createWriter asynchronous? It
>>> doesn't actually do any IO and so it seems like it could return an
>>> answer synchronously.
>>
>> FileWriter has a synchronous length property, just as Blob does, so it
>> needs to do IO at creation time to look it up.
>
> So how does this work if two tabs running in different
> processes create FileWriters for the same FileEntry? Each tab could
> end up changing the file's size, in which case the other tab's
> FileWriter will either have to synchronously update its .length, or it
> will have an outdated length.
>
> So the IO you do when creating the FileWriter is basically unreliable
> as soon as it's done.
>
> So it seems like you could get the size when creating the FileEntry
> and then use that cached size when creating the FileWriter instance.

The size in the FileEntry is no more reliable than that in the
FileWriter, of course.  But if you know you're the only writer,
either's good.
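
The staleness is easy to see in a toy model, assuming an in-memory array in place of the OS file and a hypothetical `CachedLengthWriter` that reads the length once at creation, as FileWriter.length does. Wherever that snapshot is taken, in the FileEntry or the FileWriter, one tab's copy goes stale as soon as another tab writes.

```typescript
// Toy model: `bytes` stands in for the OS file that two tabs share.
class SharedFile {
  bytes: number[] = [];
}

// Hypothetical writer that, like FileWriter, reads the length once at creation.
class CachedLengthWriter {
  length: number;
  private file: SharedFile;

  constructor(file: SharedFile) {
    this.file = file;
    this.length = file.bytes.length; // the IO done when the writer is created
  }

  // Appends and refreshes this writer's own length; other writers are not told.
  appendBytes(data: number[]): void {
    this.file.bytes.push(...data);
    this.length = this.file.bytes.length;
  }
}

const shared = new SharedFile();
const tabA = new CachedLengthWriter(shared);
const tabB = new CachedLengthWriter(shared);
tabB.appendBytes([1, 2, 3]);
console.log(tabA.length, tabB.length, shared.bytes.length); // 0 3 3
```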

> Though I wonder if it wouldn't be better to remove the .length
> property. If anything, we could add an asynchronous length getter or a
> write method which appends to the end of the file (since writing is
> already asynchronous).

A new async length getter's not needed; you can use file() for that already.
I didn't originally add append due to its apparent redundancy with
seek+write, but as you point out, seek+write isn't guaranteed to
append if there are multiple writers.
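
The append point can be illustrated with the same kind of toy model. Assumptions: an in-memory array stands in for the file, `writeAt` plays the role of seek followed by write, and the hypothetical `append` is atomic (it finds the end and writes in one step), which is the guarantee a real append method would have to provide across writers.

```typescript
// Toy model of a file shared by two tabs; `writeAt` and `append` are illustrative.
class SimFile {
  bytes: number[] = [];

  // Positioned write, like seek() followed by write().
  writeAt(pos: number, data: number[]): void {
    data.forEach((b, i) => {
      this.bytes[pos + i] = b;
    });
  }

  // Modeled as atomic: the end is located and written in a single step.
  append(data: number[]): void {
    this.writeAt(this.bytes.length, data);
  }
}

const text = (f: SimFile): string => String.fromCharCode(...f.bytes);

// seek+write: both tabs find the end before either has written.
const racy = new SimFile();
const posA = racy.bytes.length; // tab A seeks to the end (0)
const posB = racy.bytes.length; // tab B seeks to the end (0)
racy.writeAt(posA, [65, 65]);   // tab A writes "AA"
racy.writeAt(posB, [66, 66]);   // tab B writes "BB" over it
console.log(text(racy));        // BB: tab A's record is lost

// append: each call lands at the true end of the file.
const safe = new SimFile();
safe.append([65, 65]);
safe.append([66, 66]);
console.log(text(safe));        // AABB
```

The positioned writes lose data because the end-of-file position is read and used in two separate steps; `append` removes that window, which is why it makes a safe default when several writers may be active.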

> Though if we add the 'exclusive' flag described above, then we'll need
> to keep createFileWriter async anyway.

Right--I think we should pick whatever subset of these suggestions
seems the most useful, since they overlap a bit.

One working subset would be:

* Keep createFileWriter async.
* Make it optionally exclusive [possibly by default].  If exclusive,
its length member is trustworthy.  If not, it can go stale.
* Add an append method [needed only for non-exclusive writes, but
useful for logs, and a safe default].
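
One way to picture that subset is a small in-memory model. Everything in it is hypothetical: `createExclusiveWriter`, `release()`, and `append()` stand in for the proposals above and are not existing FileWriter members. Callbacks run synchronously here to keep the model deterministic; a real implementation would deliver them asynchronously. The model shows why the call has to stay async (a request may need to wait for the current holder) and why `length` is trustworthy while the writer is exclusive.

```typescript
// Hypothetical model of the proposed subset; none of these members exist in the spec.
class ModelWriter {
  length: number;
  private entry: ModelEntry;
  private released = false;

  constructor(entry: ModelEntry) {
    this.entry = entry;
    // Safe to cache: while this writer holds the file exclusively, nobody else changes it.
    this.length = entry.data.length;
  }

  append(bytes: number[]): void {
    if (this.released) throw new Error("writer has been released");
    this.entry.data.push(...bytes);
    this.length = this.entry.data.length;
  }

  // Explicit revocation, so the next waiter does not depend on GC.
  release(): void {
    if (this.released) return;
    this.released = true;
    this.entry.handOff(this);
  }
}

class ModelEntry {
  data: number[] = [];
  private holder: ModelWriter | null = null;
  private waiting: Array<(w: ModelWriter) => void> = [];

  // Callback-based because a request may have to wait for the current holder.
  createExclusiveWriter(callback: (w: ModelWriter) => void): void {
    if (this.holder !== null) {
      this.waiting.push(callback); // queued until the holder calls release()
      return;
    }
    const writer = new ModelWriter(this);
    this.holder = writer;
    callback(writer);
  }

  // Called by the releasing writer; grants the file to the next waiter, if any.
  handOff(previous: ModelWriter): void {
    if (this.holder !== previous) return;
    const next = this.waiting.shift();
    if (next === undefined) {
      this.holder = null;
      return;
    }
    const writer = new ModelWriter(this);
    this.holder = writer;
    next(writer);
  }
}

const entry = new ModelEntry();
entry.createExclusiveWriter((w1) => {
  w1.append([1, 2]);
  // A second request queues until w1 is released.
  entry.createExclusiveWriter((w2) => console.log("second writer sees length", w2.length));
  w1.release(); // prints "second writer sees length 2"
});
```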

>>> Would this also explain why FileEntry.getFile is asynchronous? I.e. it
>>> won't call its callback until all current FileWriters have been
>>> closed?
>>
>> Nope.  It's asynchronous because a File is a Blob, and has a
>> synchronous length accessor, so we look up the length when we mint the
>> File.  Note that the length can go stale if you have multiple writers,
>> as we want to keep it fast.
>
> This reminds me of something else that I intended to ask. I seem to
> recall that you guys invalidate existing File instances pointing to a
> FileEntry if the file is modified after the File object is
> instantiated? How is this implemented? Especially given that the
> FileWriter which modified the file might live in a different process
> than the File reference. Do you guys grab a time-stamp when the File
> instance is created and then check that against the last-modified time
> of the OS file? What happens if the user modifies the OS time?

Hmm...yes, we're grabbing a timestamp, but it looks to me like we're
currently only doing it when you slice the File or append it to a
BlobBuilder.  If the user modifies the OS time, that'll probably
confuse us, but that should be rare.

I'm not sure why we're not grabbing the timestamp when the File is
created.  [+CC Jian Li]
Also, it looks like file.lastModifiedDate currently does synchronous
IO, which we could eliminate if we grabbed the snapshot at creation
time.  Jian, can you comment on this?  Do we have plans to change
this?
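
A minimal sketch of taking the snapshot at creation time, assuming an in-memory stand-in for the OS file and a hypothetical `stat()` helper that returns its size and modification time. Because both values are captured when the File is minted, `lastModifiedDate` needs no IO, and `slice()` compares the current timestamp against the snapshot to detect a modification. Like any timestamp check, it misses a change if the clock has been set back so that the new write reproduces the old mtime.

```typescript
// In-memory stand-in for the OS file; `stat` is a hypothetical helper.
interface Stat {
  size: number;
  mtimeMs: number;
}

class OsFile {
  data: number[] = [];
  mtimeMs = 0;

  write(bytes: number[], nowMs: number): void {
    this.data.push(...bytes);
    this.mtimeMs = nowMs; // taken from the OS clock, which the user can change
  }

  stat(): Stat {
    return { size: this.data.length, mtimeMs: this.mtimeMs };
  }
}

// File-like snapshot: metadata captured once, when the object is created.
class SnapshotFile {
  readonly size: number;
  private readonly snapshotMtimeMs: number;
  private readonly source: OsFile;

  constructor(source: OsFile) {
    const s = source.stat(); // the only metadata IO
    this.source = source;
    this.size = s.size;
    this.snapshotMtimeMs = s.mtimeMs;
  }

  // No IO: answered from the snapshot.
  get lastModifiedDate(): Date {
    return new Date(this.snapshotMtimeMs);
  }

  // Refuses to read if the underlying file's mtime changed since the snapshot.
  slice(start: number, end: number): number[] {
    if (this.source.stat().mtimeMs !== this.snapshotMtimeMs) {
      throw new Error("File has been modified since it was created");
    }
    return this.source.data.slice(start, end);
  }
}

const disk = new OsFile();
disk.write([1, 2, 3], 1000);
const snap = new SnapshotFile(disk);
console.log(snap.slice(0, 2).join(","), snap.lastModifiedDate.getTime()); // 1,2 1000
disk.write([4], 2000);
try {
  snap.slice(0, 2);
} catch (e) {
  console.log((e as Error).message); // File has been modified since it was created
}
```
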
Received on Wednesday, 11 January 2012 22:11:02 UTC