Re: Alternative File API from Nikunj R. Mehta on 2009-08-19 (public-webapps@w3.org from July to September 2009)

From: Nikunj R. Mehta <nikunj.mehta@oracle.com>
Date: Wed, 19 Aug 2009 12:08:07 -0700
To: arun@mozilla.com
Cc: Jonas Sicking <jonas@sicking.cc>, Webapps WG <public-webapps@w3.org>
Message-Id: <8F20C67D-FF4E-4A41-90E5-7DDF136AE5BF@oracle.com>
Hi Arun,

Thanks for pulling together all those references and sharing your
research conclusions in such painstaking detail. I really appreciate
your hard work. I was particularly interested in seeing whether you
had included the Java I/O API design in your work. Evidently you did,
but it had no influence on the result, as can be seen from the new
proposal I have circulated.

I encourage you to reconsider this aspect of your design, since doing
so would make the File API vastly more extensible and address two major
concerns: concurrent file access and incremental processing.

Some more comments inline below.

Nikunj
http://o-micron.blogspot.com


On Aug 18, 2009, at 6:04 PM, Arun Ranganathan wrote:

> Nikunj,
>
>>>>
>>> While the above API does have the advantages that we agree come
>>> with a model that stems from EventTarget and events, I'm concerned
>>> that we've complicated the API for an edge case.  I *do* agree
>>> that progress events are desirable, especially given leaky
>>> abstractions for file systems (e.g. the user plugs in a networked
>>> drive, which is surfaced from the input type="file" picker) that
>>> could behave slowly.  But this seems like a desirable edge case
>>> for which we should find another solution, rather than overhauling
>>> the entire API.
>>
>> Do we need asynchronous APIs if files are local and file system  
>> access is fast? If we do, then why do we not also need progress  
>> events?
> Note that I say that I *do* agree that progress feedback is useful.
> However, I'm not convinced that it will always be used, since in
> *most* cases, file access should be fast.  The discussion here is
> *how best* to integrate progress feedback, not *if* we should  
> integrate it. I describe progress feedback for making file data  
> available to read as an edge case because I think that in most cases  
> we won't need it.  But again, I accept that it is useful.

OK, now I get it: progress events should be independent of
incremental processing. I like that approach.
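
To make that distinction concrete, here is a minimal TypeScript sketch,
assuming an in-memory byte source standing in for a slow device. The
names ChunkedReader, onProgress and read are hypothetical and do not
come from any draft: chunks are delivered incrementally to a consumer,
while progress listeners are purely optional and can be attached or
omitted without changing how the data is read.

```typescript
// Hypothetical sketch: incremental reading and progress notification as
// two independent concerns. ChunkedReader is an illustrative name only.

type ProgressListener = (loaded: number, total: number) => void;

class ChunkedReader {
  private readonly data: Uint8Array;
  private readonly chunkSize: number;
  private readonly progressListeners: ProgressListener[] = [];

  constructor(data: Uint8Array, chunkSize: number) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new RangeError("chunkSize must be a positive integer");
    }
    this.data = data;
    this.chunkSize = chunkSize;
  }

  // Progress is opt-in: callers who do not need it never register a listener.
  onProgress(listener: ProgressListener): void {
    this.progressListeners.push(listener);
  }

  // Incremental processing: each chunk goes to the consumer as soon as it is
  // available. The method returns before any chunk is delivered, so the
  // caller is never blocked; the promise resolves with the total byte count.
  async read(consume: (chunk: Uint8Array) => void): Promise<number> {
    let loaded = 0;
    while (loaded < this.data.length) {
      // Yield to the event loop; stands in for waiting on a slow device.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      const chunk = this.data.subarray(loaded, loaded + this.chunkSize);
      loaded += chunk.length;
      consume(chunk);
      for (const listener of this.progressListeners) {
        listener(loaded, this.data.length);
      }
    }
    return loaded;
  }
}
```

A caller that only wants the bytes calls read() alone; one that drives a
progress bar additionally calls onProgress(). The reading path is the
same either way, which is the independence described above.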

>> It seems that the whole WebApps WG has accepted the "desirable edge  
>> case" of dealing with system delays such as in SQL databases and  
>> file systems through the use of asynchronous APIs.
>>
> Yes, but that is different from giving feedback about progress!
> Asynchronous APIs are desirable in general, so as not to block
> the main thread.  The existing file read mechanism in Firefox today
> (which is non-standard!) is, in fact, synchronous [1].
> Standardizing this kind of an API was a non-starter (but it is
> used), although the present "TR" of the File specification (which I
> think should be made obsolete) still stipulates synchronous reads [2].

I agree: synchronous APIs will be painful for users whenever disk
access is involved.

>>> In fact, progress events for file APIs seem pretty sugary in  
>>> general; many of the other platforms I've looked at for File APIs  
>>> don't have them.
>>
>> Please cite the platforms you have researched.
> Check out:
>
> 1. Silverlight OpenFileDialog API (API reference with SDK download): http://www.microsoft.com/video/en/us/details/4f14da66-e263-4ef2-8d42-f90dc4c00384
>
> "File reading" doesn't generate its own progress events, but you
> could use 'bytes read' for progress feedback, especially in
> network upload scenarios, and use asynchronous callbacks.  Again,
> this isn't the same as a dedicated Progress Event; Silverlight  
> developers should correct me if I'm off here.  To get the API  
> reference, you may have to download the SDK (Windows only AFAICT).   
> I'd like to get more feedback from developers who use Silverlight  
> about what they'd like from a File API for the web.
>
> Google Gears:
>
> 2. http://code.google.com/apis/gears/api_desktop.html#File
> 3. http://code.google.com/apis/gears/api_blob.html
>
> Stuff like desktop.openFiles(callback) follows callback mechanisms
> similar to what's in the existing API, but without dedicated
> asynchronous accessors.  You can get the File's data as a Blob.  But
> if you want ProgressEvent, you can get it using HttpRequest
> (http://code.google.com/apis/gears/api_httprequest.html and
> http://code.google.com/apis/gears/api_httprequest.html#ProgressEvent),
> but only for upload to a server.  I'd like to get more feedback
> from developers who use Gears regularly.
>
> Then there's Java File I/O, which has been revised by JSRs at
> least twice (JSR 51 and JSR 203).  Each of these added new
> capabilities, including asynchronous I/O and polling (non-blocking)
> reads.  In particular, check out:
>
> 4. http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html
> 5. http://java.sun.com/j2se/1.4.2/docs/api/java/io/FileInputStream.html
> 6. http://java.sun.com/javase/6/docs/api/javax/swing/ProgressMonitorInputStream.html
>
> Here, you *can* get progress updates using  
> ProgressMonitorInputStream (6. above) with the FileInputStream (5.  
> above); this is relatively easy.  Or, you could monitor byte updates  
> on the FileChannel.  This platform gives you optional progress  
> feedback, e.g. for large files.
>
> But the most compelling case is Flash (Flex AS3 ref):
>
> 7. http://livedocs.adobe.com/flex/3/langref/
> 8. In particular: http://livedocs.adobe.com/flex/3/langref/flash/filesystem/FileStream.html
> 9. Also: http://livedocs.adobe.com/flex/3/langref/flash/filesystem/FileStream.html#event:progress
>
> FileStream uses progress events, and you can listen for these
> (they don't bubble) when dealing with large files (bytesLoaded,
> bytesTotal, etc.; see 9. above).  We've discussed on this listserv
> the fact that GMail uses Flash for file access scenarios and
> *falls back* to <input type="file"> if Flash is disabled or simply
> not on the system [3].  In that discussion [3], the use case for
> progress events is "file upload progress", which we get with XHR and
> the File API, even as currently written.  We can also slice()
> files.  Again, I'm in favor of progress feedback, but I stick to my
> guns when I assert that it isn't that important for *file
> reads* :-)
>
> Flash has progress events for File reads, however, and so should the
> web.  Again, to date, this hasn't been cited as the reason Flash is
> used [3], at least for GMail.  This discussion isn't about whether
> or not we should have progress detection ability for file reads.  In  
> general, I'd like to get more feedback than what we got with [3]  
> from developers who use Flash about what they'd like from a File API  
> for the web.
>
> Finally, there's Adobe's JavaScript API for Flash extensions on the  
> authoring platform:
>
> 10. http://www.adobe.com/devnet/flash/articles/jsapi.html
>
> This has no progress detection capability AFAICT, but I think you  
> can add event handling using other mechanisms.
>
> So to summarize: I am amenable to progress feedback in the File API
> on reads; when discussing which mechanism is best for this, the
> sense I got is that the "alternative API" proposal [4] (or something
> like it) was deemed "a more correct API" [5] than adding a callback
> to the existing draft.  The draft should absolutely change to be
> "more correct", but my concerns about simplicity aren't going away :-)

For the record, I don't favor the alternative API proposal since it  
introduces concepts that are completely foreign to file processing. I  
have expressed my dislike of this proposal separately.

>>
>>> That's not to say that the web shouldn't have it -- I'm just  
>>> pointing out that I think most users of the API will simply call  
>>> the API to get the file.  And, I think that in the lion's share of  
>>> use cases, things will behave rapidly enough to not warrant the  
>>> use of progress events (except during the networked/plugged in  
>>> scenarios).
>>
>> or for that matter asynchronous callbacks? Why do you think users  
>> will want asynchronous callbacks in the "lion's share of use cases"?
> I assert that the lion's share of use cases will simply want to get  
> the file (without progress events), do something, and then upload  
> the file as efficiently as possible (with progress events).  This is  
> what I mean by the lion's share of use cases.  I accept that network
> drives and plugged-in devices are a good use case for progress
> events, but do not think they constitute a majority use case.  Do
> you disagree?

I do. Even accessing files from an attached flash-memory device can be
slow, so we have to account for that in the API.

>
> >>Honestly, I don't like to use events for file access.
>
> We *could* also have something like FileStream (as Flash does) or  
> FileInputStream (as Java does).

I look forward to comments on my proposal.

> One reason not to do that currently is that we lack primitives for
> bytes or byte arrays in JavaScript.  This could change over the course of
> time with subsequent versions of the ECMAScript standard.  It's been  
> pointed out before that asynchronous callbacks on the event loop or  
> event callbacks both lead to asynchronous access to a file's contents.

What about Gears' Blob interface and its use of an integer array to
deal with the bytes of a Blob? Can you explain why that would be a poor
choice?
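
For comparison, a minimal TypeScript sketch of the integer-array
approach, assuming bytes are represented as ordinary numbers in the
range 0 to 255. The class and method names (IntArrayBlob, getBytes,
slice) are illustrative and are not claimed to match the Gears Blob
signatures; the Gears Blob reference cited earlier is authoritative for
those. Nothing beyond standard JavaScript arrays is needed.

```typescript
// Hypothetical sketch: a blob whose bytes are exposed as plain integer
// arrays, so no new byte primitive is needed in the language.
// IntArrayBlob and its methods are illustrative names only.

class IntArrayBlob {
  private readonly bytes: number[];

  constructor(bytes: number[]) {
    for (const b of bytes) {
      if (!Number.isInteger(b) || b < 0 || b > 255) {
        throw new RangeError(`not a byte value: ${b}`);
      }
    }
    this.bytes = bytes.slice(); // defensive copy so callers cannot mutate it
  }

  get length(): number {
    return this.bytes.length;
  }

  // Copy of the bytes in [offset, offset + count), clamped to the blob end.
  getBytes(offset = 0, count = this.bytes.length - offset): number[] {
    return this.bytes.slice(offset, offset + count);
  }

  // A sub-range as a new blob; the original blob is left unchanged.
  slice(offset: number, count: number): IntArrayBlob {
    return new IntArrayBlob(this.getBytes(offset, count));
  }
}
```

The trade-off is size: each element is a full JavaScript number, which
typically occupies far more memory than one byte, so materializing a
large file this way would be expensive.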

>  One advantage of the alternative API [4] is that it resembles what  
> XHR does.

I hardly see this as an advantage. Au contraire, I see it as
unwanted baggage.

>
> >>I don't know of another programming library that does [use events
> >>for file access].
>
> This is generally what I found, yes.  Aaron points out that the web
> platform is inconsistent anyway [6], which I agree with :)  My
> initial draft did not use events for reads.
> >>However, there needs to be a way to separate the reading of a file
> >>from the file itself. Properties of a file such as its length as
> >>well as a temporary URI belong on the file.
>
> I think the alternative API [4] reflects this separation of reading  
> a file from the file itself; I think "size" is an attribute that  
> should be on Data (or FileData).   File, which inherits from Data  
> (or FileData), should have the temporary URL as an attribute.

Does it? The alternative API doesn't actually define any attributes of  
File, so I am assuming that it only has name and mediaType, and not  
size or url.
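
The separation under discussion can be sketched in TypeScript as
follows, assuming an in-memory file. FileData, PickedFile, InMemoryFile
and FileDataReader are hypothetical names: size sits on FileData as Arun
suggests, name, mediaType and a temporary url sit on the file, and
reading state lives in a separate reader object. Because each reader
keeps its own position, two readers can consume the same file
concurrently without interfering, which is one of the concerns raised
at the top of this message.

```typescript
// Hypothetical sketch: file properties on the file object, read state on a
// separate reader. All names are illustrative, not from any draft.

interface FileData {
  readonly size: number; // byte length
}

interface PickedFile extends FileData {
  readonly name: string;
  readonly mediaType: string;
  readonly url: string; // temporary URL identifying the picked file
}

class FileDataReader {
  private position = 0;
  private readonly source: Uint8Array;

  constructor(source: Uint8Array) {
    this.source = source;
  }

  // Return up to `count` bytes from the current position and advance it.
  // An empty result means the end of the file has been reached.
  readNext(count: number): Uint8Array {
    const chunk = this.source.subarray(this.position, this.position + count);
    this.position += chunk.length;
    return chunk;
  }
}

class InMemoryFile implements PickedFile {
  readonly name: string;
  readonly mediaType: string;
  readonly url: string;
  private readonly bytes: Uint8Array;

  constructor(name: string, mediaType: string, url: string, bytes: Uint8Array) {
    this.name = name;
    this.mediaType = mediaType;
    this.url = url;
    this.bytes = bytes;
  }

  get size(): number {
    return this.bytes.length;
  }

  // Each call yields an independent reader with its own position.
  openReader(): FileDataReader {
    return new FileDataReader(this.bytes);
  }
}
```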

> -- A*
>
> [1] https://developer.mozilla.org/En/NsIDOMFile
> [2] http://www.w3.org/TR/file-upload/
> [3] http://lists.w3.org/Archives/Public/public-webapps/2009AprJun/1110.html
> [4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html
> [5] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0664.html
> [6] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0685.html
Received on Wednesday, 19 August 2009 19:11:00 UTC