Re: [File API] File behavior under modification

On Mon, May 21, 2012 at 6:05 PM, Eric U <ericu@google.com> wrote:

> According to the latest editor's draft [1], a File object must always
> return an accurate lastModifiedDate if at all possible.
> "On getting, if user agents can make this information available, this
> MUST return a new Date[HTML] object initialized to the last modified
> date of the file; otherwise, this MUST return null."
>
> However, if the underlying file has been modified since the creation
> of the File, reads processed on the File must throw exceptions or fire
> error events.
> "...if the file has been modified on disk since the File object
> reference is created, user agents MUST throw a NotReadableError..."
>

(I wish this spec would remove the screaming MUSTs; HTML doesn't do this
anymore, and it's so much easier to read.)


> These seem somewhat contradictory...you can always look at the
> modification time and see that it's changed, but if you try to read it
> after a change, it blows up.
> The non-normative text about security concerns makes me think that
> perhaps both types of operations should fail if the file has changed
> ["... guarding against modifications of files on disk after a
> selection has taken place"].  That may not be necessary, but if it's
> not, I think we should call it out in non-normative text that explains
> why you can read the mod time and not the data.
>

I think lastModifiedDate should never change.  It should be the mtime of
the version of the file that the File represents a snapshot of.

This avoids synchronization issues: reading the value twice in a row in
the same script should never give a different value, or throw the second
time but not the first, because that exposes the concurrent nature of the
filesystem to script.

It also avoids implying that this attribute needs to perform synchronous
I/O in the middle of script execution to check the file's current
timestamp, which of course should never happen.
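
To make the snapshot model above concrete, here is a rough TypeScript
sketch. Everything in it is hypothetical and for illustration only:
DiskEntry is an in-memory stand-in for a file on disk, and SnapshotFile and
its read() method are invented names, not the File API or any browser's
implementation. It only shows the idea that the timestamp is captured once,
slice() does no I/O, and the current mtime is checked only inside an
asynchronous read.

```typescript
// Hypothetical in-memory stand-in for a file on disk (an assumption for
// this sketch; a real user agent would consult the filesystem).
interface DiskEntry {
  data: Uint8Array;
  mtimeMs: number;
}

class NotReadableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotReadableError";
  }
}

// File-like object that snapshots the mtime at creation time.
class SnapshotFile {
  private readonly entry: DiskEntry;
  private readonly start: number;
  private readonly end: number;
  private readonly snapshotMtimeMs: number;

  constructor(
    entry: DiskEntry,
    start: number = 0,
    end: number = entry.data.length,
    snapshotMtimeMs: number = entry.mtimeMs,
  ) {
    this.entry = entry;
    this.start = start;
    this.end = end;
    this.snapshotMtimeMs = snapshotMtimeMs;
  }

  // Always the mtime of the snapshotted version; never looks at the disk,
  // so two reads in the same script cannot disagree.
  get lastModifiedDate(): Date {
    return new Date(this.snapshotMtimeMs);
  }

  // Synchronous and I/O-free: records a byte range and inherits the
  // parent's snapshot. Assumes non-negative offsets for brevity.
  slice(start: number, end: number): SnapshotFile {
    const s = Math.min(this.start + start, this.end);
    const e = Math.min(this.start + end, this.end);
    return new SnapshotFile(this.entry, s, e, this.snapshotMtimeMs);
  }

  // Asynchronous read: the only place the current mtime is compared
  // against the snapshot.
  async read(): Promise<Uint8Array> {
    if (this.entry.mtimeMs !== this.snapshotMtimeMs) {
      throw new NotReadableError("file modified since snapshot");
    }
    return this.entry.data.slice(this.start, this.end);
  }
}
```

In this sketch, changing entry.mtimeMs after construction leaves
lastModifiedDate untouched on the file and on every slice of it, while
read() rejects with NotReadableError.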

If we want to allow querying whether the mtime of a file has changed, it
should be done with a new asynchronous API.  I'm not sure that's needed,
though, since you can always just read one byte from the file; if it fails,
the file has changed.
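
The one-byte probe could look roughly like the following TypeScript sketch.
ReadableSnapshot and hasChanged are hypothetical names introduced here; the
interface only assumes an object that can be sliced and whose asynchronous
read rejects with an error named NotReadableError once the file has been
modified. In a browser the equivalent would presumably be a FileReader read
of file.slice(0, 1). Note that a real NotReadableError can have other causes
(permissions, for example), so treat a "changed" answer as a hint.

```typescript
// Minimal shape assumed for illustration; not a real browser interface.
interface ReadableSnapshot {
  slice(start: number, end: number): ReadableSnapshot;
  read(): Promise<Uint8Array>;
}

// Probe for modification by reading at most one byte.
async function hasChanged(file: ReadableSnapshot): Promise<boolean> {
  try {
    await file.slice(0, 1).read();
    return false; // the read succeeded, so the snapshot is still valid
  } catch (err) {
    if (err instanceof Error && err.name === "NotReadableError") {
      return true; // the read failed the way a modified file is expected to
    }
    throw err; // anything else is an unrelated failure
  }
}
```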

(While I'm thinking about it, does lastModifiedDate really need to be
nullable?  Systems without file timestamps are so rare that it's probably
better to require them to fabricate a date, so we don't introduce bugs into
people's code for rare cases.)

> This came up in https://bugs.webkit.org/show_bug.cgi?id=86811; I
> believe WebKit is currently noncompliant with this part of the spec,
> and we were debating the correct behavior.  Currently WebKit delays
> grabbing the modification time of the file until it's been referenced
> by a read or slice(), so it won't notice modifications that happen
> between selection and read.


This sounds very wrong to me.  If I open a File in a page (e.g. select it
with an <input>), the model is that I'm giving it access to the file as it
was at the time I selected it.  If the snapshot is delayed until the
first read, the page will be able to see changes made later, as long as it
doesn't touch the file immediately.  That breaks the whole security model.
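
The difference between the two snapshot times can be shown with a small
TypeScript sketch. It is a simplified model of the behaviour as described in
this thread, not WebKit's code: FakeEntry, selectEager and selectLazy are
hypothetical names, and reads are synchronous here only to keep the example
short.

```typescript
// Hypothetical in-memory stand-in for a file on disk.
interface FakeEntry {
  data: string;
  mtimeMs: number;
}

// Eager: the mtime is recorded when the user selects the file.
function selectEager(entry: FakeEntry) {
  const snapshot = entry.mtimeMs;
  return {
    read(): string {
      if (entry.mtimeMs !== snapshot) throw new Error("NotReadableError");
      return entry.data;
    },
  };
}

// Lazy: the mtime is recorded on the first read, so any change made
// between selection and that first read goes unnoticed.
function selectLazy(entry: FakeEntry) {
  let snapshot: number | null = null;
  return {
    read(): string {
      if (snapshot === null) snapshot = entry.mtimeMs;
      if (entry.mtimeMs !== snapshot) throw new Error("NotReadableError");
      return entry.data;
    },
  };
}

// The user selects the file, then it is modified before the page reads it.
const entry: FakeEntry = { data: "as selected", mtimeMs: 1 };
const eager = selectEager(entry);
const lazy = selectLazy(entry);
entry.data = "written later";
entry.mtimeMs = 2;

let eagerFailed = false;
try {
  eager.read();
} catch (_e) {
  eagerFailed = true; // the eager snapshot detects the modification
}
const lazyResult = lazy.read(); // the lazy snapshot exposes the later contents
```

With the eager snapshot the read fails; with the lazy one the page gets the
contents written after selection, which is the problem described above.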

Also, slice() is a synchronous API, so it should never cause blocking file
I/O.  That's fundamental to the API.

> That was done because the slice creates a
> "File object reference", but in my reading creating the File referring
> to the file should be the time of the snapshot, not creating a Blob
> referring to a File.
>

FWIW, agreed.

-- 
Glenn Maynard

Received on Tuesday, 22 May 2012 01:45:43 UTC