Next-generation file API use cases

Howdy, folks.

I'm a new guy on Google's Chrome team, having just moved
over from O3D.  I'm interested in talking about the stuff
that's not going to make it into the current iteration of
the file API you've been discussing.  Following Arun's
suggestion [1], I thought I'd post some use cases to start
things off.  I've taken some from the list archives and
others from discussions I've had off-list.

I'm assuming here that before any of this gets implemented,
we'll have an API that lets one:
 * select and load files via user intervention.
 * slice out a subrange of a file.
 * give a handle to a local file (perhaps as a URN produced
   by a File) to the image tag, the video tag, XHR, etc.
   (a short example follows).
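
Here's a rough sketch of those three capabilities, using the
shape of the File API and object URLs as one possible
spelling.  None of these names are settled, and '#picker'
and '/upload' are made up:

  // Illustrative only: '#picker' and '/upload' are made up.
  var input = document.querySelector('#picker');
  input.addEventListener('change', function () {
    var file = input.files[0];                  // user-selected File
    var firstMeg = file.slice(0, 1024 * 1024);  // subrange of the file
    var url = URL.createObjectURL(file);        // handle as a local URL
    document.querySelector('video').src = url;  // hand it to a tag...
    var xhr = new XMLHttpRequest();             // ...or to XHR
    xhr.open('POST', '/upload');
    xhr.send(firstMeg);
  });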

I've broken the following list into two groups by their
requirements.

Group 1

Persistent uploader
  * When a file's selected for upload, the app copies it
    into a local sandbox and uploads it a chunk at a time
    (see the sketch below).
  * It can restart uploads after browser crashes, network
    interruptions, etc.
  * [Optional extension] The user may select an entire
    directory of files in a single operation.
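
A sketch of the chunk-at-a-time loop, assuming the file has
already been copied into the sandbox.  The endpoint, the
chunk size, and the use of localStorage to remember the
resume offset are all just illustration:

  var CHUNK = 1024 * 1024;
  async function resumeUpload(file, uploadId) {
    // The saved offset survives crashes and restarts.
    var offset = Number(localStorage.getItem(uploadId) || 0);
    while (offset < file.size) {
      var chunk = file.slice(offset, offset + CHUNK);
      // '/upload' and its query parameters are hypothetical.
      await fetch('/upload?id=' + uploadId + '&offset=' + offset,
                  { method: 'POST', body: chunk });
      offset += chunk.size;
      localStorage.setItem(uploadId, String(offset));  // checkpoint
    }
  }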

Video game or other app with lots of media assets [2][3][4]
  * It downloads one or several large tarballs and expands
    them locally into a directory structure (sketched
    below).
  * The same download should work on any operating system.
  * It can manage prefetching just the next-to-be-needed
    assets in the background, so going to the next game
    level or activating a new feature doesn't require
    waiting for a download.
  * It uses those assets directly from its local cache, by
    direct file reads or by handing local URIs to image or
    video tags, O3D or WebGL asset loaders, etc.
  * The files may be of arbitrary binary format.
  * On the server side, a compressed tarball will often be
    much smaller than a tarball of separately-compressed
    files.  Also, 1 tarball instead of 1000 little files
    will involve fewer seeks, all else being equal.
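
One possible shape for the sandbox side of this, written
promise-style.  None of these names are settled; untar() is
an assumed helper that yields { name, blob } entries (flat
names only, to keep the sketch short):

  // One big tarball in, a directory of assets out.
  async function installLevel(archiveUrl, levelName) {
    var archive = await (await fetch(archiveUrl)).blob();
    var root = await navigator.storage.getDirectory();  // the sandbox
    var dir = await root.getDirectoryHandle(levelName, { create: true });
    // untar() is an assumed helper yielding { name, blob } entries.
    for (var entry of await untar(archive)) {
      var handle = await dir.getFileHandle(entry.name, { create: true });
      var writer = await handle.createWritable();
      await writer.write(entry.blob);
      await writer.close();
    }
  }

  // Later, assets come straight out of the local cache:
  async function assetUrl(levelName, name) {
    var root = await navigator.storage.getDirectory();
    var dir = await root.getDirectoryHandle(levelName);
    var file = await (await dir.getFileHandle(name)).getFile();
    return URL.createObjectURL(file);  // usable by <img>, <video>, etc.
  }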

Audio editor with offline access or local cache for speed
  * See Aviary's Myna [5] for an example of this being done
    in Flash.
  * The data blobs are potentially quite large, and are
    read-write.
  * It may want to do partial writes to files (overwriting
    just the ID3 tags, for example; see the sketch below).
  * The ability to organize project files by creating
    directories would be useful.
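
A possible shape for the partial-write case: rewrite only
the ID3v2 tag at the front of a track that already lives in
the sandbox.  projectDir is a directory handle as in the
previous sketch, and newTag is assumed to be an ArrayBuffer
the app built, the same size as the old tag:

  async function rewriteId3(projectDir, trackName, newTag) {
    var handle = await projectDir.getFileHandle(trackName);
    var writer = await handle.createWritable({ keepExistingData: true });
    await writer.write({ type: 'write', position: 0, data: newTag });
    await writer.close();  // the rest of the file is untouched
  }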

Offline video viewer
  * It downloads large files (>1GB) for later viewing.
  * It needs efficient seek + streaming.
  * It must be able to hand a file handle of some sort to
    the video tag.
  * It should enable access to partly-downloaded files,
    e.g. to let you watch the first episode of the DVD even
    if your download didn't complete before you got on the
    plane.
  * It should be able to pull a single episode out of the
    middle of a download and give just that to the video
    tag (sketched below).
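
For the single-episode case, something like this: hand just
a byte range of the big download to the video tag.
downloadsDir, the file name, and the offsets are all made
up; the range is assumed to be a complete, playable file:

  async function playEpisode(downloadsDir, videoElement) {
    var handle = await downloadsDir.getFileHandle('season1.bin');
    var whole = await handle.getFile();  // may be partly downloaded
    // Made-up offsets; a real app would read them from an index.
    var episode = whole.slice(734003200, 1469006848, 'video/mp4');
    videoElement.src = URL.createObjectURL(episode);
  }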

Offline GMail
  * Downloads attachments and stores them locally.
  * Caches user-selected attachments for later upload.
  * Needs to be able to refer to cached attachments and
    image thumbnails for display and upload.
  * Should be able to trigger the UA's download manager
    just as if talking to a server
    [Content-Disposition: attachment].
  * Wants to upload an email with attachments as a
    multipart post, rather than sending a file at a time
    in an XHR (see the sketch below).
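
A sketch of the multipart send: the draft text plus locally
cached attachment blobs go out as a single POST.  FormData
stands in for "some way to build a multipart body"; the
endpoint and field names are invented:

  // attachments is an array of { name, blob } cached earlier.
  function sendDraft(bodyText, attachments) {
    var form = new FormData();
    form.append('body', bodyText);
    for (var a of attachments) {
      form.append('attachment', a.blob, a.name);
    }
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/send');  // hypothetical endpoint
    xhr.send(form);             // one multipart/form-data request
  }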

Group 2

Client-side editor for non-sandboxed files
  * In order to have a save function (not just save-as), it
    will require persistent privileges to write to selected
    non-sandboxed files (see the sketch below).
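
An entirely speculative shape for this; the only point is
that the handle, once the user grants it, can be kept and
reused for in-place saves without prompting again.
showOpenFilePicker and requestPermission are just one
possible spelling:

  var docHandle = null;

  async function openDocument() {
    var picks = await window.showOpenFilePicker();  // user picks a file
    docHandle = picks[0];
    // Speculative: a persistent read/write grant.
    await docHandle.requestPermission({ mode: 'readwrite' });
    return (await docHandle.getFile()).text();  // contents for the editor
  }

  async function saveDocument(text) {
    var writer = await docHandle.createWritable();  // "save", no dialog
    await writer.write(text);
    await writer.close();
  }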

Photo organizer/uploader
  * It monitors e.g. your "My Photos" directory and
    subdirectories and automatically processes/uploads new
    additions.
  * It needs persistent read access to an entire directory
    tree outside the sandbox.
  * It can restart uploads without having to make a local
    copy of each file, as long as it's OK just to start
    over with the new version if an in-progress file
    changes.

The group 1 use cases all require writing to disk; they
need no access to files outside a private per-origin
sandbox (beyond what the API you're already working on
covers, with a few small extensions); and they are all hard
to build efficiently on top of a database, key-value store,
or AppCache, because they all manipulate large blobs.

While you could break any large dataset into chunks and
string them together as rows in a database, it's a pain to
do anything with them.  You'd end up needing to implement a
file abstraction in JavaScript, and if everyone's going to
do that anyway, I think it's better to standardize it and
make it efficient by leveraging the host platform.  We
won't necessarily want to expose all the capabilities of
the local filesystem (atime, chmod, etc.), but a simple
subset would go a long way.
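
Something like the shim below is what everyone would end up
writing: a "file" stored as fixed-size rows in a key-value
store.  store.get() stands for whatever asynchronous lookup
the storage API provides; all of this is boilerplate that a
real file API would make unnecessary:

  var CHUNK = 256 * 1024;

  // Read bytes [start, end) of a "file" stored as rows
  // name:0, name:1, ... in the key-value store.
  async function readRange(store, name, start, end) {
    var first = Math.floor(start / CHUNK);
    var pieces = [];
    for (var i = first; i * CHUNK < end; i++) {
      pieces.push(await store.get(name + ':' + i));  // one chunk per row
    }
    // Trim the stitched-together blob to the requested range.
    return new Blob(pieces).slice(start - first * CHUNK,
                                  end - first * CHUNK);
  }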

Also, by using the native filesystem, we help keep the
browser from being a silo.  Users can copy their data out
easily, allow iTunes or Pandora to index and play music
produced by a web app, etc.

Group 2, in addition to the requirements of group 1, will
need persistent access to files or directories outside the
sandbox.  I think the first group is probably enough for
one discussion: there are enough security, quota, and
usability issues there to keep us busy for a while, though
they're not nearly as bad as those of the second group.
Group 2 requires capabilities that raise much more complex
concerns, so it'll probably be a lot easier to make
progress if we leave those for future expansions.

I look forward to seeing your use cases, and hearing what
you think of these.

   Eric Uhrhane
   ericu@google.com

[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0909.html
[2] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-July/021586.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0460.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0471.html
[5] http://aviary.com/blog/posts/aviary-release-myna-audio-editor-music-remixer
