Re: File API to separate reading from files from Arun Ranganathan on 2009-09-23 (public-webapps@w3.org from July to September 2009)

From: Arun Ranganathan <aranganathan@mozilla.com>
Date: Tue, 22 Sep 2009 22:09:48 -0700
To: "Nikunj R. Mehta" <nikunj.mehta@oracle.com>, Web Applications Working Group WG <public-webapps@w3.org>
Message-ID: <4AB9AD9C.6070809@mozilla.com>
Arun wrote:
>> There is lots that is attractive about InputStream, and I think that 
>> it can be used in other specifications, especially when discussing 
>> Camera APIs, streaming from web apps (conferencing) etc.  I also like 
>> the idea of DataHandler.  When we define a byte primitive, it can be 
>> used in conjunction with the stream interface.  For additional read 
>> features (fseek) this is also useful.  I also appreciate that you 
>> have pointed out in a subsequent email [1] that it is possible to 
>> "sidestep the issue of dealing with bytes directly."  Managing bytes 
>> properly, with the right primitives, is one reason why, despite 
>> having looked at the Java I/O APIs[2], I went with something 
>> simpler.  I think that we should have streams at some point, and I'm 
>> amenable to looking at them in a subsequent iteration of the File 
>> API.  It's worth saying here that the appeal of streams is for 
>> *multiple use cases* for both File API and other APIs, and *not* 
>> because the Java I/O model is one we should emulate.  Programmer 
>> taste and choice about coining APIs is subjective.
>
Nikunj wrote in response:
> I respect your point on taste, however, I am more interested in 
> composability than the maturity of Java I/O. 
Firstly, what Jonas proposed as the Alternative File API [1] uses an 
event model to address use cases such as progress feedback and 
separating reading from file objects.  I expressed reservations about 
complexity, but saw more posts in favor of it than against it.  This 
model has advantages that come with an event model (separate 
notifications like onprogress, onerror, allowing specific 'isolated' 
code, etc) along with a signature similarity to XHR (which developers 
are familiar with).  My caveats about the model were mainly about 
understanding trade-offs.  I'm reconciled to having a v1 of the File API 
specification based on Jonas' proposal (hopefully in good shape by the 
upcoming TPAC), and I believe we can iterate from there.
> It would be useful to see how you meet the following requirements:
>
> 1. incremental reading of a file's data
The proposal [1] reuses the FileData interface, which will still support 
a slice(offset, length) method that returns another FileData object 
covering the stipulated byte range.  I hope to specify more clearly what 
happens on range mathematics errors (e.g. whether an exception is 
raised).  Together with progress events, I think slice() addresses this 
use case.
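
To make the range mathematics concrete, here is a minimal sketch in 
TypeScript of how a caller might compute the (offset, length) pairs to 
pass to slice() when reading a file incrementally.  The helper name 
chunkRanges, the ByteRange type, and the choice to clamp the final range 
are illustrative assumptions; the proposal does not define them, and it 
has not yet settled what happens when a range is out of bounds.

```typescript
// Hypothetical helper, not part of any proposal: computes the
// (offset, length) pairs that would cover a file of `size` bytes.
interface ByteRange {
  offset: number;
  length: number;
}

function chunkRanges(size: number, chunkSize: number): ByteRange[] {
  if (size < 0 || Math.floor(size) !== size) {
    throw new RangeError("size must be a non-negative integer");
  }
  if (chunkSize <= 0 || Math.floor(chunkSize) !== chunkSize) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: ByteRange[] = [];
  for (let offset = 0; offset < size; offset += chunkSize) {
    // Assumption: clamp the final range so it never runs past the end.
    ranges.push({ offset: offset, length: Math.min(chunkSize, size - offset) });
  }
  return ranges;
}
```

For a 10-byte file and a 4-byte chunk size, this yields the ranges 
(0, 4), (4, 4) and (8, 2); each pair could then be handed to slice() and 
the resulting FileData object read on its own.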
>
> 2. concurrent access to file data
(Note that "FileRequest" and "FileReader" are used interchangeably in 
[1]; I personally prefer FileReader as a name).  Nothing precludes 
multiple FileReader objects from accessing the same file, but not all 
implementations need fire notifications (events) concurrently.  Do you 
have a specific use case in mind?
> 3. access to all file metadata without needing to read the file
(Note that in FileRequest, which I think should be named FileReader, the 
read* methods take File objects as parameters, although the email 
proposal [1] says that they take FileData objects.  Jonas means File 
objects).

The answer to your question depends on what you mean by *all* file 
metadata. 

File objects (which inherit from FileData objects) expose name and 
mediaType properties, along with size (from FileData).  But, suppose you 
wanted ID3 information from an MP3 file.  In this case (assuming ID3v1 
usage), you would *have* to read the file, and look for the 128 byte 
chunk beginning with TAG.  This can be done in two ways:

i. Using slice() and range mathematics based on the file's size to get 
to the end of the file and look at the last 128 bytes of it as a separate 
FileData object (since ID3v1 puts its tag at the end).  Not ideal.
ii. Using read methods and working with the file format.  Again, not 
dripping with syntactic sugar, but certainly feasible.
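
As a rough illustration of option (i), the sketch below (TypeScript) 
applies that range mathematics to a byte array standing in for the 
file's contents: it takes the last 128 bytes, checks for the "TAG" 
marker, and decodes the fixed-width ID3v1 fields.  The names parseId3v1, 
readField and Id3v1Tag are hypothetical, and a plain Uint8Array is used 
in place of a FileData object, so this shows the arithmetic and the tag 
layout rather than the File API itself.

```typescript
// Hypothetical names throughout; a Uint8Array stands in for file data.
interface Id3v1Tag {
  title: string;
  artist: string;
  album: string;
  year: string;
}

const ID3V1_TAG_SIZE = 128;

// Decode a fixed-width field byte by byte (treated as Latin-1), stopping
// at the first NUL byte and trimming trailing space padding.
function readField(bytes: Uint8Array, start: number, length: number): string {
  let text = "";
  for (let i = start; i < start + length; i++) {
    if (bytes[i] === 0) {
      break;
    }
    text += String.fromCharCode(bytes[i]);
  }
  return text.replace(/\s+$/, "");
}

// Returns the parsed tag, or null if the data has no ID3v1 tag.
function parseId3v1(file: Uint8Array): Id3v1Tag | null {
  if (file.length < ID3V1_TAG_SIZE) {
    return null;
  }
  // Range mathematics: the tag occupies the final 128 bytes.
  const offset = file.length - ID3V1_TAG_SIZE;
  const tag = file.subarray(offset, offset + ID3V1_TAG_SIZE);
  // The tag must begin with the ASCII bytes "TAG".
  if (tag[0] !== 0x54 || tag[1] !== 0x41 || tag[2] !== 0x47) {
    return null;
  }
  return {
    title: readField(tag, 3, 30),
    artist: readField(tag, 33, 30),
    album: readField(tag, 63, 30),
    year: readField(tag, 93, 4),
  };
}
```

With the File API, the equivalent of the subarray() call would be a 
slice() on the file followed by a read of the resulting 128-byte 
FileData object.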

I agree that metadata extraction could be made better, but I think that 
I'm happy with what the existing proposal has.  I also don't see how any 
other proposal improves on this, even if you read into a stream buffer.

I am happy with the existing metadata extraction for a v1, and believe 
that as we work out more audio and video issues on the platform, we can 
get to specific metadata issues.  Can you clear up what you mean by "all 
metadata?"
> 4. separation of error handling from file reading
In Jonas' proposal, this isn't done cleanly (if "clean" means handled 
separately from the reader object), but I think what *is* done is good 
for the majority of use cases.  The FileReader object (named 
"FileRequest" in the email [1]) allows separate onerror handling (along 
with a separate onprogress, etc.).  Error handling is not done *within* 
a read method (unlike the existing proposal, which does this less well 
than Jonas' proposal), and the callback that handles the event can deal 
with the response.

This is as separate as is done with XHR.
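
To show the shape of that separation, here is a toy sketch in 
TypeScript.  ToyReader is a hypothetical stand-in and not the 
FileReader/FileRequest interface from [1]: its handler signatures are 
invented for illustration, and it fires its handlers synchronously for 
brevity where a real reader would be asynchronous.  The point is only 
that the read call itself returns nothing and throws nothing, while 
progress, success and failure each arrive through their own handler.

```typescript
// Hypothetical toy class; handler names echo the email, signatures are assumed.
class ToyReader {
  onprogress: ((loaded: number, total: number) => void) | null = null;
  onload: ((result: Uint8Array) => void) | null = null;
  onerror: ((message: string) => void) | null = null;

  // Every outcome is reported through a handler, never via the return value.
  read(source: Uint8Array | null, chunkSize: number): void {
    if (source === null) {
      if (this.onerror) {
        this.onerror("source is not readable");
      }
      return;
    }
    if (chunkSize <= 0) {
      if (this.onerror) {
        this.onerror("chunkSize must be positive");
      }
      return;
    }
    for (let loaded = 0; loaded < source.length; loaded += chunkSize) {
      if (this.onprogress) {
        this.onprogress(Math.min(loaded + chunkSize, source.length), source.length);
      }
    }
    if (this.onload) {
      this.onload(source);
    }
  }
}

// Usage: each concern lives in its own isolated handler.
const events: string[] = [];
const reader = new ToyReader();
reader.onprogress = (loaded, total) => {
  events.push("progress " + loaded + "/" + total);
};
reader.onload = (result) => {
  events.push("load " + result.length);
};
reader.onerror = (message) => {
  events.push("error: " + message);
};
reader.read(new Uint8Array(5), 2); // a successful read of 5 bytes
reader.read(null, 2); // a failed read, reported only through onerror
```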
>
> All things being equal, I would prefer a model that, in order of 
> priority:
>
> 1. involves fewer steps, and
Me too!  But *neither* your model nor Jonas' model involves fewer 
steps than the original proposal :)  Jonas' model adds necessary 
complexity for the major use case (onprogress) and for an event model.
>
> 2. evolves nicely with file write and binary access, which are both 
> likely to be next evolution directions in this area.
Agreed, but again, much of what you mean by "evolves nicely" is a 
question of programmer taste.  For instance, I think that readAsBinary 
can be introduced on the FileReader object, in addition to 
readAsBinaryString.  Furthermore, I maintain that your streams proposal 
can evolve later, and doesn't prevent us from proceeding with the 
alternative File API proposal [1] as the basis for the draft.
>
> Can you provide a comparison of your proposed approach with my 
> proposal for the above so that the WG can develop an informed opinion 
> about the proposals?
>
I *think* I've done this in answering the questions above.
>>
>> For a first version (which should replace 
>> http://www.w3.org/TR/file-upload/ , with a more meaningful name like 
>> "File API"), I think we should address use cases around reads.  Ian 
>> Fette has given us plenty of other use cases for consideration later 
>> on[3].  While my editor's draft strove to address the use cases for 
>> file access with different asynchronous data accessors, it was clear 
>> that it couldn't gracefully account for progress events.  Moreover, 
>> general feedback favored a model that used events with a separate 
>> reader object that allowed for progress events, and Jonas' 
>> alternative proposal does this and also resembles XHR [4].   While 
>> I'm reluctant to sacrifice simplicity, I think moving in the 
>> direction of the "Alternative File API"[4] reconciles use cases such 
>> as progress events with calls for a reader/event model.  FWIW, I 
>> disagree that resemblance to XHR should be seen as "unwanted baggage" 
>> [5].  I think it's desirable to resemble an API that has such 
>> widespread usage!
>
> This is arguable at best, since it seems to be an opinion not shared 
> by everyone, especially not the editor of XMLHttpRequest [1]. 
There are two things here that you may be confusing!  Anne (the editor 
of the XHR2 draft) expressed support for a model based on events [2].  
What he is against is "abusing XHR" by using the URL attribute of a File 
object as part of request [3].  I disagree with his stance on this, but 
that is a bridge that we'll cross later, after we sort out details of 
the FileData URL.
> In fact, there is no similarity to XHR in the current editor's draft, 
> and I wonder why those benefits were considered unimportant when 
> drafting previously.
Note: the "benefits" I considered important centered on simplicity.  But 
others have argued in favor of a more robust model that gives us 
progress events without simply adding another callback to the existing 
proposal [4].  I expressed my support for simplicity [4] but also my 
willingness to draft a spec. based on the alternative API.  So far, only 
you are arguing *against* it, but I don't believe that the alternative 
approach blocks consideration of a stream-based approach later on.
>
>> While the web is inconsistent, event models are widely used, and 
>> similarity between XHR and File API, which will be used in 
>> conjunction anyway, is probably a good thing.
>
> Can you explain, in light of the objections I raised in [2], why the 
> "Alternative File API" is the right approach?  I haven't seen any 
> replies to my points.
>
I'm happy to provide more details on anything I've answered here.

-- A*

[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0485.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0571.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0576.html
Received on Wednesday, 23 September 2009 05:10:32 UTC