Re: [FileAPI] No streaming semantics? from Charles Pritchard on 2012-02-06 (public-webapps@w3.org from January to March 2012)

From: Charles Pritchard <chuck@jumis.com>
Date: Sun, 5 Feb 2012 19:02:24 -0800
To: Justin Summerlin <jmerlin@jmerlin.net>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-Id: <A86A2E13-591D-4AE6-B56E-3CBA9130FBC1@jumis.com>

Use slice (webkitSlice in WebKit).

They've just put it together themselves on the media APIs as well, so that's cool. There's an append stream semantic.
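
A minimal sketch of the slice approach, for the "scan a large log for errors" case described below: read the file one slice at a time so that only a single chunk is held in memory. The names readInChunks and onChunk are hypothetical, not part of the File API. The sketch assumes a current engine with Blob.arrayBuffer() and TextDecoder, both of which postdate this thread; at the time, the equivalent would be a FileReader.readAsArrayBuffer or readAsText call on each slice.

```javascript
// Sketch: read a Blob/File in fixed-size chunks via slice().
// readInChunks and onChunk are illustrative names, not File API members.
async function readInChunks(blob, chunkSize, onChunk) {
  // stream: true lets the decoder hold back a multi-byte character that
  // straddles a chunk boundary and finish it on the next call.
  const decoder = new TextDecoder("utf-8");
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() clamps the end offset to blob.size, so the last chunk may be short.
    const slice = blob.slice(offset, offset + chunkSize);
    const bytes = await slice.arrayBuffer();
    const text = decoder.decode(bytes, { stream: true });
    // Returning false from onChunk stops the scan early,
    // e.g. once enough errors have been collected.
    if (onChunk(text, offset) === false) return;
  }
  // Flush anything the decoder is still holding.
  const tail = decoder.decode();
  if (tail) onChunk(tail, blob.size);
}
```

Usage would look something like `readInChunks(file, 64 * 1024, function (text) { /* scan text for errors */ })`, where file comes from an `<input type="file">`. A line that spans two chunks still has to be stitched together by the caller.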

-Charles



On Feb 5, 2012, at 5:18 PM, Justin Summerlin <jmerlin@jmerlin.net> wrote:

> I've been playing around with the FileAPI in both Chrome and Firefox.  The API presented works very well for small files of the kind typical of 1990s web usage.  Previewing small images (~50K) and reading and processing small files ("a few" MB) are cases where uniformly strong results are seen, and the proposed API appears to greatly enhance the capability of client-oriented JS-enabled websites.
> 
> Upon considering the implications for local file processing, it became apparent to me that the repercussions for client-side filtering and aggregation can be a potentially huge thing for the internet.  One simple case study demonstrates two-fold inadequacies in the presented File API for very commonly used semantics.
> 
> Consider a web application designed to process user-submitted log files to perform analytics and diagnose problems.  Perhaps these log files can typically be 50GB in size.  Two cases are interesting:
> 
> 1. The application scans through the log file looking for errors up to some maximum number, then reports those to a server-side script.
> 2. The application watches the log file and actively collects information on errors to recommend diagnostics (in this case, no round-trip may be necessary).
> 
> The reason the first case cannot be implemented with the present API is that readAs* in FileReader reads the *entire* file into memory, firing progress events along the way.  It is consistent that both Chrome and Firefox implementations attempt to do this and then fail due to insufficient memory.  The reason the second case is impractical is that one must re-read the entire file into memory each time to see any changes in a file, which is problematic at best.
> 
> Unless I'm missing something (I don't believe that I am), the capability of streaming, which would solve both of these problems in a very effective way, is not present in the FileAPI.  Perhaps, in addition to readAs*, the API could provide both seek and read[Text|BinaryString|ArrayBuffer](<blob/file>, <length>[, <encoding>]).  The result would be delivered asynchronously, in an event:
> 
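> function processFile(file, reader) {
>   // onread and readText are the proposed additions, not existing FileReader members.
>   reader.onread = function (ev) {
>     // hasMore is a placeholder for the "has more..." check.
>     if (hasMore(ev)) {
>       reader.readText(file, 4096);
>     } else {
>       reader.onread = null;
>     }
>     // Process chunk...
>   };
>   reader.readText(file, 4096);
> }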
> 
> And in the case of reading more data from a file as it's written to, one would simply keep attempting a read and if the read returns no data, do nothing.
> 
> Is this intended and if so, is any streaming semantic to be considered in future JavaScript API considerations?
> 
> Thanks,
> 
> Justin

Received on Monday, 6 February 2012 03:02:54 UTC