[FileAPI] No streaming semantics?

I've been playing around with the File API in both Chrome and Firefox.  The
API presented works very well for small files, consistent with perhaps circa
1990s web usage.  Previewing small images (~50 KB) and reading and
processing small files (a few MB) are cases where the results are uniformly
strong, and the proposed API appears to greatly enhance the capability of
client-oriented, JS-enabled websites.
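
For example, previewing a small image with the current API works nicely (a
minimal sketch, assuming an <input type="file" id="picker"> and an
<img id="preview"> element on the page):

// Preview a small user-selected image with the existing FileReader API.
document.getElementById('picker').onchange = function () {
  var file = this.files[0];
  var reader = new FileReader();
  reader.onload = function (ev) {
    // ev.target.result is a data: URL containing the entire image.
    document.getElementById('preview').src = ev.target.result;
  };
  reader.readAsDataURL(file);
};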

Upon considering the implications for local file processing, it became
apparent to me that client-side filtering and aggregation could be
potentially huge for the internet.  One simple case study demonstrates two
inadequacies in the present File API for very commonly used semantics.

Consider a web application designed to process user-submitted log files to
perform analytics and diagnose problems.  Suppose these log files are
typically 50 GB in size.  Two cases are interesting:

1. The application scans through the log file looking for errors up to some
maximum number, then reports those to a server-side script.
2. The application watches the log file and actively collects information
on errors to recommend diagnostics (in this case, no round-trip may be
necessary).

The first case cannot be implemented with the present API because readAs*
in FileReader reads the *entire* file into memory, firing progress events
along the way.  Both the Chrome and Firefox implementations consistently
attempt to do this and then fail due to insufficient memory.  The second
case is impractical because one must re-read the entire file into memory
each time to see any changes to the file, which is problematic at best.
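
For case 1, the best the current API offers is something like the following
minimal sketch (findErrors and sendToServer are hypothetical helpers
standing in for the application logic):

// With the current API, the entire 50GB file must be read into memory
// before any of it can be examined; in practice both browsers fail with
// insufficient memory long before onload fires.
function scanLog(file) {
  var reader = new FileReader();
  reader.onload = function (ev) {
    // Only here is the file available, as one enormous string.
    var errors = findErrors(ev.target.result);  // hypothetical helper
    sendToServer(errors);                       // hypothetical helper
  };
  reader.readAsText(file);
}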

Unless I'm missing something (I don't believe that I am), a streaming
capability, which would solve both of these problems very effectively, is
not present in the File API.  Perhaps, in addition to readAs*, there could
be a seek method and read[Text|BinaryString|ArrayBuffer](<blob/file>,
<length>[, <encoding>]).  As with readAs*, the read would be asynchronous,
with the result presented in an event:

function processFile(file, reader) {
  reader.onread = function (ev) {
    // Process the chunk just read...
    if (ev.target.result.length > 0) {
      // More data may remain; request the next chunk.
      reader.readText(file, 4096);
    } else {
      // End of file: stop reading.
      reader.onread = null;
    }
  };
  reader.readText(file, 4096);
}

And in the case of reading more data from a file as it is being written to,
one would simply keep attempting a read and, if the read returns no data,
do nothing (see the sketch below).
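
A minimal sketch of case 2 under the same proposal (readText and onread are
the hypothetical additions described above, analyzeChunk is a hypothetical
helper, and the one-second polling interval is arbitrary):

// Hypothetical "tail -f"-style following of a growing log file; nothing
// here exists in the current File API.
function followLog(file, reader) {
  reader.onread = function (ev) {
    var chunk = ev.target.result;
    if (chunk.length > 0) {
      analyzeChunk(chunk);            // hypothetical helper
      reader.readText(file, 4096);    // more data may be waiting
    } else {
      // No new data yet: poll again shortly instead of re-reading the file.
      setTimeout(function () { reader.readText(file, 4096); }, 1000);
    }
  };
  reader.readText(file, 4096);
}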

Is this intended, and if so, will any streaming semantics be considered in
future revisions of the JavaScript file APIs?

Thanks,

Justin

Received on Monday, 6 February 2012 02:53:30 UTC