Re: [File API] File behavior under modification from Arun Ranganathan on 2012-07-11 (public-webapps@w3.org from July to September 2012)

From: Arun Ranganathan <aranganathan@mozilla.com>
Date: Wed, 11 Jul 2012 15:53:06 -0400
To: Glenn Maynard <glenn@zewt.org>
Cc: Eric U <ericu@google.com>, Web Applications Working Group WG <public-webapps@w3.org>, Jonas Sicking <jonas@sicking.cc>, Kinuko Yasuda <kinuko@google.com>, Jian Li <jianli@google.com>, Alexey Proskuryakov <ap@webkit.org>, Satoru Takabayashi <satorux@google.com>, Toni Barzic <tbarzic@google.com>
Message-Id: <D8B1914B-BCF3-43AD-9622-FAFB104B2BC7@mozilla.com>

Glenn:


On May 21, 2012, at 9:44 PM, Glenn Maynard wrote:

> On Mon, May 21, 2012 at 6:05 PM, Eric U <ericu@google.com> wrote:
> > According to the latest editor's draft [1], a File object must always
> > return an accurate lastModifiedDate if at all possible.
> > "On getting, if user agents can make this information available, this
> > MUST return a new Date[HTML] object initialized to the last modified
> > date of the file; otherwise, this MUST return null."
> >
> > However, if the underlying file has been modified since the creation
> > of the File, reads processed on the File must throw exceptions or fire
> > error events.
> > "...if the file has been modified on disk since the File object
> > reference is created, user agents MUST throw a NotReadableError..."
> 
> (I wish this spec would remove the screaming MUSTs; HTML doesn't do this anymore, and it's so much easier to read.)

WHAT?

Heh :)  Point well taken.  I've muted the musts.


>  
> > These seem somewhat contradictory... you can always look at the
> > modification time and see that it's changed, but if you try to read it
> > after a change, it blows up.
> > The non-normative text about security concerns makes me think that
> > perhaps both types of operations should fail if the file has changed
> > ["... guarding against modifications of files on disk after a
> > selection has taken place"].  That may not be necessary, but if it's
> > not, I think we should call it out in non-normative text that explains
> > why you can read the mod time and not the data.
> 
> I think lastModifiedDate should never change.  It should be the mtime of the version of the file that the File represents a snapshot of.
> 
> This avoids synchronicity issues: reading the value twice in a row in the same script should never give a different value or throw the second time but not the first, because that exposes the multithreading nature of the filesystem.  
> 
> It also avoids implying that this attribute needs to perform synchronous I/O in the middle of script execution to check the file's current timestamp, which of course should never happen.
> 
> If we want to allow querying whether the mtime of a file has changed, it should be done with a new asynchronous API.  I'm not sure that's needed, though, since you can always just read one byte from the file; if it fails, the file has changed.
> 
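The snapshot semantics proposed above (metadata fixed when the File is created, data reads failing once the file changes, and a one-byte read as a change probe) can be modeled in a few lines. This is a minimal sketch under stated assumptions, not how any user agent implements it: `FakeDisk`, `SnapshotFile`, and `hasChanged` are hypothetical names, the "disk" is an in-memory map, change detection compares mtimes only, the timestamp is kept as milliseconds rather than a Date, and `read()` is synchronous for brevity even though real File data reads are asynchronous.

```typescript
// Hypothetical in-memory stand-in for the file system: path -> bytes and mtime.
interface DiskEntry {
  data: Uint8Array;
  mtime: number; // milliseconds since the epoch
}

class FakeDisk {
  private readonly files = new Map<string, DiskEntry>();

  write(path: string, data: Uint8Array, mtime: number): void {
    this.files.set(path, { data, mtime });
  }

  stat(path: string): DiskEntry | undefined {
    return this.files.get(path);
  }
}

// Mirrors the spec's NotReadableError by name only.
function notReadable(message: string): Error {
  const err = new Error(message);
  err.name = "NotReadableError";
  return err;
}

// File-like snapshot: size and lastModified are captured once, at creation,
// so reading them twice can never disagree and never touches the disk.
class SnapshotFile {
  readonly lastModified: number;
  readonly size: number;
  private readonly disk: FakeDisk;
  private readonly path: string;

  constructor(disk: FakeDisk, path: string) {
    const entry = disk.stat(path);
    if (entry === undefined) throw notReadable(`no such file: ${path}`);
    this.disk = disk;
    this.path = path;
    this.lastModified = entry.mtime;
    this.size = entry.data.length;
  }

  // Simplification: real File data reads are asynchronous (FileReader events).
  // The read fails if the file's current mtime differs from the snapshot's.
  read(start = 0, end = this.size): Uint8Array {
    const entry = this.disk.stat(this.path);
    if (entry === undefined || entry.mtime !== this.lastModified) {
      throw notReadable(`${this.path} was modified after the snapshot`);
    }
    return entry.data.slice(start, end);
  }
}

// The suggested probe: read one byte; a NotReadableError means "changed".
function hasChanged(file: SnapshotFile): boolean {
  try {
    file.read(0, 1);
    return false;
  } catch (e) {
    if (e instanceof Error && e.name === "NotReadableError") return true;
    throw e;
  }
}
```

In this model `lastModified` keeps its snapshot value after the file is rewritten; only the data read fails, so a script never sees the attribute change between two consecutive reads.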

I agree that making snapshotting clearer might be a good idea. 

It is true that reading size and lastModifiedDate is synchronous, but this seemed a small trade-off compared to data reads.  

My instinct is that an asynchronous API for mtime is overkill.


> (While I'm thinking about it, does lastModifiedDate really need to be nullable?  Systems without file timestamps are so rare that it's probably better to require them to fabricate a date, so we don't introduce bugs into people's code for rare cases.)

What's the main problem with it being nullable?  A fabricated date seems strange, but instead of being nullable we could spec what the fabricated date is.  I'm just not totally sure what the pros and cons are here.
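The trade-off can be seen from the caller's side. The sketch below is a hypothetical TypeScript helper (`FileLike` and `describeAge` are illustrative names, not part of the File API) showing the branch that a nullable `lastModifiedDate` forces on every consumer; specifying a fabricated date instead would remove that branch, at the cost of returning a value that is not a real timestamp.

```typescript
// Hypothetical minimal shape carrying only the attribute under discussion.
interface FileLike {
  readonly name: string;
  readonly lastModifiedDate: Date | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Correct callers need the null branch. In plain JavaScript, code that omits
// it works almost everywhere and throws a TypeError only on the rare system
// without file timestamps, which is the kind of latent bug at issue here.
function describeAge(file: FileLike, now: Date): string {
  const modified = file.lastModifiedDate;
  if (modified === null) return `${file.name}: modification time unknown`;
  const days = Math.floor((now.getTime() - modified.getTime()) / MS_PER_DAY);
  return `${file.name}: modified ${days} day(s) ago`;
}
```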


> 
> > This came up in https://bugs.webkit.org/show_bug.cgi?id=86811; I
> > believe WebKit is currently noncompliant with this part of the spec,
> > and we were debating the correct behavior.  Currently WebKit delays
> > grabbing the modification time of the file until it's been referenced
> > by a read or slice(), so it won't notice modifications that happen
> > between selection and read.
> 
> This sounds very wrong to me.  If I open a File in a page (e.g. select it with an <input>), the model is that I'm giving it access to the file as it was at the time I dragged it in.  If the snapshot is delayed until the first read, the page will be able to see changes made later, as long as it doesn't touch the file immediately.  That breaks the whole security model.
>  

Strong +1.  


> Also, slice() is a synchronous API, so it should never cause blocking file I/O.  That's fundamental to the API.
> 
> > That was done because the slice creates a
> > "File object reference", but in my reading creating the File referring
> > to the file should be the time of the snapshot, not creating a Blob
> > referring to a File.
> 
> FWIW, agreed.
> 
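The difference between snapshotting at selection time and the delayed snapshot described for WebKit can be shown with a toy model. This is a sketch only: `EagerFile`, `LazyFile`, and the in-memory `disk` map are hypothetical, errors are plain `Error`s whose message starts with "NotReadableError", offsets are assumed non-negative, and reads are synchronous for brevity. In the eager model `slice()` reuses the parent's snapshot and performs no lookup, which is the property argued for above.

```typescript
// Hypothetical in-memory disk: path -> current contents and mtime.
interface DiskRecord {
  data: string;
  mtime: number;
}
const disk = new Map<string, DiskRecord>();

function stat(path: string): DiskRecord {
  const rec = disk.get(path);
  if (rec === undefined) throw new Error(`NotReadableError: ${path} is gone`);
  return rec;
}

// Returns data[start, end), or fails if the mtime no longer matches the snapshot.
function checkedRead(path: string, mtime: number, start: number, end: number): string {
  const rec = stat(path);
  if (rec.mtime !== mtime) throw new Error(`NotReadableError: ${path} was modified`);
  return rec.data.slice(start, end);
}

// Eager: the snapshot is taken when the File is created (e.g. at <input> selection).
class EagerFile {
  readonly path: string;
  readonly mtime: number;
  readonly start: number;
  readonly end: number;

  constructor(path: string, mtime: number, start: number, end: number) {
    this.path = path;
    this.mtime = mtime;
    this.start = start;
    this.end = end;
  }

  static select(path: string): EagerFile {
    const rec = stat(path); // the only metadata lookup
    return new EagerFile(path, rec.mtime, 0, rec.data.length);
  }

  // No I/O: narrows the range and inherits the parent's snapshot mtime.
  slice(start: number, end: number): EagerFile {
    const s = Math.min(this.start + start, this.end);
    const e = Math.min(this.start + end, this.end);
    return new EagerFile(this.path, this.mtime, s, Math.max(s, e));
  }

  read(): string {
    return checkedRead(this.path, this.mtime, this.start, this.end);
  }
}

// Lazy: the mtime is grabbed only on first use, so edits made between
// selection and first use are silently accepted.
class LazyFile {
  readonly path: string;
  private mtime: number | undefined = undefined;

  constructor(path: string) {
    this.path = path; // no disk access at selection time
  }

  read(): string {
    const mtime = this.mtime ?? stat(this.path).mtime; // I/O on first use
    this.mtime = mtime;
    return checkedRead(this.path, mtime, 0, Infinity);
  }
}
```

If the file is rewritten after both objects are created but before either is read, the eager File (and any slice of it) rejects the read, while the lazy one returns the new contents: the page sees data the user never selected.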

Can you log a bug so that I can provide guidance for this in spec?

-- A*

Received on Wednesday, 11 July 2012 19:53:35 UTC