Re: File API to separate reading from files from Nikunj R. Mehta on 2009-08-19 (public-webapps@w3.org from July to September 2009)

From: Nikunj R. Mehta <nikunj.mehta@oracle.com>
Date: Wed, 19 Aug 2009 12:33:57 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Web Applications Working Group WG <public-webapps@w3.org>
Message-Id: <9A2880D3-0108-466F-A769-A909C1717E3C@oracle.com>

On Aug 19, 2009, at 12:21 PM, Jonas Sicking wrote:

> On Wed, Aug 19, 2009 at 11:47 AM, Nikunj R. Mehta
> <nikunj.mehta@oracle.com> wrote:
>> Here's an alternative, more easily extensible, proposal for reading
>> files. It provides applications a way to read small amounts of data
>> at a time. It also allows applications to concurrently read the same
>> file.
>>
>> Firstly, there is a simple interface to access file metadata. This
>> metadata is always accessed synchronously. A file object could be
>> passed to XHR, in which case it can upload the file during the send()
>> process.
>> interface File {
>>   readonly attribute DOMString name;
>>   readonly attribute DOMString mediaType;
>>   readonly attribute DOMString url;
>>   readonly attribute unsigned long long size;
>> }
>> Secondly, a list of files can be obtained using some UI.
>> typedef sequence<File> FileList;
>> Thirdly, there is an abstract input stream interface that is not
>> limited to files. It works at the level of the bytes that files are
>> made of. The read() operation can specify the extent that is required.
>> If an application wishes to read in small increments, it can specify
>> those increments. Of course, the File interface identifies its size,
>> so the application can choose suitable increments. Processing of
>> blocks read from the file occurs in callbacks. XHR could also consider
>> taking an InputStream parameter during the send() operation.
>> interface InputStream {
>>   read(in DataHandler,
>>        [optional in] long long offset,
>>        [optional in] long long length);
>>   abort();
>>   attribute Function onerror;
>> }
>> Fourthly, reading a block of bytes is supported through an interface
>> that accepts an array of integers. This is similar to the Gears Blob
>> interface.
>> [CallBack=FunctionOnly]
>> interface DataHandler {
>>   handle(in sequence<int> data);
>> }
>> Fifthly, a file can be used for reading an input stream by passing
>> the File object when constructing the stream.
>> [Constructor(in File toOpen)]
>> interface FileInputStream : InputStream {
>> }
>> Sixthly, one can create various kinds of derived readers such as a
>> text reader, a binary string reader, and a data URL reader. By
>> inheriting from InputStream, they inherit the basic mechanisms such as
>> abort and onerror. Moreover, each subclass alters the base read
>> behavior: it works in a similar manner, except that the data seen
>> outside is different.
>> [Constructor(in InputStream base)]
>> interface BinaryStringInputStream : InputStream {
>>   read(in StringDataHandler,
>>        [optional in] long long offset,
>>        [optional in] long long length);
>> }
>> The callback is provided a DOMString. The String's length is expected
>> to match the increment requested.
>> [CallBack=FunctionOnly]
>> interface StringDataHandler {
>>   handle(in DOMString data);
>> }
>> For text reading, encoding is optionally specified.
>> [Constructor(in InputStream base, [optional in] DOMString encoding)]
>> interface TextInputStream : InputStream {
>>   read(in StringDataHandler,
>>        [optional in] long long offset,
>>        [optional in] long long length);
>> }
>>
>> A file can be alternatively read as a dataURL using a similar kind of
>> handler as above.
>> [Constructor(in InputStream base)]
>> interface FileDataURL : InputStream {
>>   read(in StringDataHandler,
>>        [optional in] long long offset,
>>        [optional in] long long length);
>> }
>> This API has the advantage that it can cleanly be extended to deal
>> with both writing use cases and binary data. Furthermore, it can also
>> support extensions that perform cryptographic, compression, or coding
>> operations on top of the basic interfaces.
>> To compare with the editor's draft, here's a typical programming case
>> in JavaScript:
>> var fileList = ...
>> // There is a mistake in the example provided in Section 3, where it
>> // does fileList.files[0]
>> var myFile = fileList[0];
>> // *According to editor's draft*
>> myFile.getAsText(handleDataAsText);
>> function handleDataAsText(fileContent, error) {
>>   if (error) {
>>     // handle the error
>>   }
>> }
>> // *According to my proposal*
>> var stream = new TextInputStream(new FileInputStream(myFile), "UTF-16");
>> stream.read(handleDataAsText);
>> stream.onerror = errorHandler;
>> function handleDataAsText(fileContent) {
>> }
>> function errorHandler(error) {
>> }
>> Note the two differences:
>> 1. Error handling is separated from file reading.
>> 2. Two extra objects are needed to read text data out of the file.
>>    However, the composability of input streams enables a far richer
>>    library to operate.
>> This API matches more closely the Java API for IO. It also benefits
>> from the extensibility model used in Java, while retaining the
>> asynchronous processing nature that is preferred in ECMAScript
>> environments. It is also not too different from the editor's draft in
>> that it does not introduce a completely different kind of data
>> processing - we are still looking at callbacks. However, the
>> improvement is in the composability of streams as well as supporting
>> multiple concurrent file readers and processing blocks of data at a
>> time.
>> Progress events can be built on top but I welcome suggestions to build
>> them in to this proposal.
>> Nikunj
>> http://o-micron.blogspot.com
>
> A few comments on this:
>
> I do like the idea of having a stream primitive. I think we'll need
> that for other things in the future such as reading data from a
> camera, or reading data from a microphone.
>
> However I'm not convinced that we should force people to use streams
> to deal with the simple use case of reading data from a file. In 95%
> (if not more) of the cases the user simply wants to get the contents
> of the file, and so forcing them to do that using:
>
> (new BinaryStringInputStream(new FileInputStream(file)).read(handler);
>
> seems a bit complicated.

That is 70 characters vs. 80 in your proposal vs. 32 in the editor's  
draft.

// alternative API
reader = new FileReader;
reader.readAsBinaryString(file);
reader.onload = handler;

// editor's draft
file.getAsBinaryString(handler);

I don't know whether we should give primacy to the number of
characters typed. That verbosity can easily be hidden away in
JavaScript libraries. What I want to ensure is that the foundation of
the File API is strong, and that is why I proposed this new API.
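
As a rough illustration of how a library could hide the extra typing, here is a minimal sketch. The classes below are hypothetical in-memory stand-ins for the proposed FileInputStream and BinaryStringInputStream (no browser ships them), a "file" is modelled as a plain object with a bytes array, callbacks run synchronously for brevity, and the helper name readFileAsBinaryString is made up for this example.

```javascript
// Hypothetical stand-ins for the proposed interfaces; no browser ships them.
// A "file" is modelled as { name, bytes }, and callbacks run synchronously
// to keep the sketch short (the proposal itself is asynchronous).
class FileInputStream {
  constructor(file) {
    this.file = file;
    this.onerror = null;
  }
  read(handler, offset = 0, length) {
    const bytes = this.file.bytes;
    const end = length === undefined ? bytes.length : offset + length;
    if (offset < 0 || end < offset || end > bytes.length) {
      if (this.onerror) this.onerror(new RangeError("read out of bounds"));
      return;
    }
    handler(bytes.slice(offset, end));
  }
}

class BinaryStringInputStream {
  constructor(base) {
    this.base = base;
    this.onerror = null;
  }
  read(handler, offset, length) {
    // Surface errors from the wrapped stream on this stream's onerror.
    this.base.onerror = (err) => {
      if (this.onerror) this.onerror(err);
    };
    // One character per byte, as in a "binary string".
    this.base.read(
      (bytes) => handler(String.fromCharCode(...bytes)),
      offset,
      length
    );
  }
}

// Library helper (hypothetical name): one call for the common case.
function readFileAsBinaryString(file, handler, errorHandler) {
  const stream = new BinaryStringInputStream(new FileInputStream(file));
  stream.onerror = errorHandler;
  stream.read(handler);
}

const demoFile = { name: "demo.txt", bytes: [72, 105, 33] };
let result = null;
readFileAsBinaryString(demoFile, (data) => { result = data; });
console.log(result); // prints "Hi!"
```

With such a helper in a library, the call site is about as short as the editor's draft, while the composable streams remain available underneath for the less common cases.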

>
> I also don't think an array of integers is something that we want to
> use. I've been told in the past that such a construct has a lot of
> overhead in JS engines. A better solution would be to have a new
> primitive that can hold binary data. ECMAScript used to have a
> ByteArray; it was removed for now, but reviving that proposal seems
> like the right thing to do. So far I have stayed away from this while
> waiting for a ByteArray primitive to be defined elsewhere, and for now
> simply stick with strings in the File API.
>
> In general I think your proposal is an interesting idea though so I'm
> very interested in hearing input from elsewhere. Progress events I
> assume would be done by firing events on the stream object. However
> should they fire on every stream? I.e. if the author did:
>
> fileStream = new FileInputStream(file);
> binaryStream = new BinaryStringInputStream(fileStream);
> fileStream.onprogress = handler1;
> binaryStream.onprogress = handler2;
> binaryStream.read(handler);
>
> Does that mean that both handler1 and handler2 are called for each
> progress event?

Progress events are fired for every read operation, not for every
stream. This should remove any confusion about which object is the
target of a progress event.
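
One possible reading of this, sketched below, is that only the stream whose read() was called fires progress events, while the stream it wraps stays silent. That reading is an assumption; the proposal does not yet define progress events. The classes are hypothetical in-memory mocks, the _pull helper and the two-byte chunk size are inventions of this sketch, and callbacks run synchronously for brevity.

```javascript
// Hypothetical mocks; nothing here is a shipped browser API.
class FileInputStream {
  constructor(file) {
    this.file = file;
    this.onprogress = null;
  }
  // Internal helper (an assumption of this sketch): hands out the bytes in
  // chunks without firing any events itself.
  _pull(onChunk, chunkSize = 2) {
    const bytes = this.file.bytes;
    for (let i = 0; i < bytes.length; i += chunkSize) {
      const loaded = Math.min(i + chunkSize, bytes.length);
      onChunk(bytes.slice(i, loaded), loaded, bytes.length);
    }
  }
  read(handler) {
    const all = [];
    this._pull((chunk, loaded, total) => {
      all.push(...chunk);
      if (this.onprogress) this.onprogress({ loaded, total });
    });
    handler(all);
  }
}

class BinaryStringInputStream {
  constructor(base) {
    this.base = base;
    this.onprogress = null;
  }
  read(handler) {
    let text = "";
    // Progress is reported on the stream that owns this read operation.
    this.base._pull((chunk, loaded, total) => {
      text += String.fromCharCode(...chunk);
      if (this.onprogress) this.onprogress({ loaded, total });
    });
    handler(text);
  }
}

// The scenario from the question above.
const log = [];
const fileStream = new FileInputStream({ bytes: [97, 98, 99, 100, 101] });
const binaryStream = new BinaryStringInputStream(fileStream);
fileStream.onprogress = (e) => log.push(`handler1 ${e.loaded}/${e.total}`);
binaryStream.onprogress = (e) => log.push(`handler2 ${e.loaded}/${e.total}`);
binaryStream.read((data) => log.push(`data ${data}`));
console.log(log);
```

Under this reading only handler2 runs, because the only read operation was started on binaryStream.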

>
> Also, what happens if you attempt to read from both fileStream and
> binaryStream at the same time?

Operating systems allow you to read any number of blocks of a file
concurrently. Therefore, the usage you are suggesting will not cause
any error or failure. The fileStream read will proceed at its own rate
and the binaryStream read at its own rate. Both are free to choose
their own offsets and lengths.
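
A minimal sketch of that independence follows, again with hypothetical in-memory mocks of the proposed classes. Reads are interleaved rather than truly concurrent here, since the mock callbacks run synchronously. The point is only that neither stream keeps a shared cursor, so every read is answered from its own offset and length.

```javascript
// Hypothetical mocks of the proposed interfaces; not a shipped API.
class FileInputStream {
  constructor(file) {
    this.file = file;
    this.onerror = null;
  }
  read(handler, offset = 0, length) {
    const bytes = this.file.bytes;
    const end = length === undefined ? bytes.length : offset + length;
    if (offset < 0 || end < offset || end > bytes.length) {
      if (this.onerror) this.onerror(new RangeError("read out of bounds"));
      return;
    }
    // No cursor is stored on the stream: each read stands alone.
    handler(bytes.slice(offset, end));
  }
}

class BinaryStringInputStream {
  constructor(base) {
    this.base = base;
  }
  read(handler, offset, length) {
    this.base.read(
      (bytes) => handler(String.fromCharCode(...bytes)),
      offset,
      length
    );
  }
}

const file = { bytes: Array.from("hello world", (c) => c.charCodeAt(0)) };
const fileStream = new FileInputStream(file);
const binaryStream = new BinaryStringInputStream(fileStream);

const seen = [];
// Interleave reads on both streams with unrelated offsets and lengths.
fileStream.read((bytes) => seen.push(bytes), 0, 2);
binaryStream.read((text) => seen.push(text), 6, 5);
fileStream.read((bytes) => seen.push(bytes), 2, 3);
binaryStream.read((text) => seen.push(text), 0, 5);
console.log(seen);
```

Each callback receives exactly the range it asked for, regardless of what the other stream read before it.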

Nikunj
http://o-micron.blogspot.com
Received on Wednesday, 19 August 2009 19:36:29 UTC