Re: Sandboxed Filesystem use cases? (was Re: Moving File API: Directories and System API to Note track?) from Eric U on 2012-09-26 (public-webapps@w3.org from July to September 2012)

From: Eric U <ericu@google.com>
Date: Wed, 26 Sep 2012 16:23:50 -0700
To: Maciej Stachowiak <mjs@apple.com>
Cc: James Graham <jgraham@opera.com>, Brendan Eich <brendan@mozilla.org>, Jonas Sicking <jonas@sicking.cc>, olli@pettay.fi, public-webapps@w3.org
Message-ID: <CAHvSExc5r6w1ovCjYvOuw1JYrwQiZ6MZroauyxsJL53rR_p6Og@mail.gmail.com>
Asking about use cases that can be served by a filesystem API, but not
by IDB, is reasonable [and I'll respond to it below], but it misses a
lot of the point.  The users I've talked to like the FS API because
it's a simple interface that everyone already understands, that's
powerful enough to handle a huge variety of use cases.

Sure, the async API makes it a bit more complicated.  Every API that
handles large data is stuck with the same overhead there.  But
underneath that, people know what to expect from it and can figure it
out very quickly.

You just need to store 100KB?
  1) Request a filesystem.
  2) Open a file.
  3) Write your data.
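
The three steps above can be sketched roughly as follows. The browser API
itself only runs inside a browser, so this Python sketch uses an ordinary
temporary directory as a stand-in for the sandbox; `request_sandbox` and
`store` are hypothetical helper names, not part of any proposed API.

```python
import tempfile
from pathlib import Path


def request_sandbox() -> Path:
    """Stand-in for requesting a filesystem: returns a fresh private root."""
    return Path(tempfile.mkdtemp(prefix="sandbox-"))


def store(root: Path, name: str, data: bytes) -> Path:
    """Open (creating if needed) a file under the root and write the data."""
    target = root / name
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target


root = request_sandbox()                                   # 1) request a filesystem
saved = store(root, "cache/blob.bin", b"x" * 100 * 1024)   # 2) open a file, 3) write
```

The real API is asynchronous, so each step would take a callback there, but
the shape of the work is the same.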

Need a URL for that?  Sure, it's just a file, so obviously that works.

Want it organized in directories just like your server or dev environment?
Go ahead.

You don't have to write SQL queries, learn how to organize data into
noSQL tables, or deal with version change transactions.

If you want to see what's in your data store, you don't need to write
a viewer to dump your tables; you just go to the URL of any directory
in your store and browse around.  Our URLs have a natural structure
that matches the directory tree.  If you add URLs to IDB, with its
free-form key/value arrangement, I don't foresee an immediate natural
mapping that doesn't involve lots of escaping, ugly URLs, and/or
limitations.
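
A rough illustration of that contrast, in Python. The origin, the
"temporary" storage type, and especially the `idb:` scheme are assumptions
made up for this sketch; no URL scheme for IDB records exists, and
`idb_url` is only one hypothetical way such a mapping might look.

```python
import json
from urllib.parse import quote

# Assumed origin, for illustration only.
ORIGIN = "http://example.com"


def filesystem_url(path: str, storage: str = "temporary") -> str:
    """Map a sandbox path onto a URL that mirrors the directory tree."""
    return f"filesystem:{ORIGIN}/{storage}/{quote(path.lstrip('/'))}"


def idb_url(database: str, store: str, key) -> str:
    """Hypothetical mapping for an IDB record.

    Keys are free-form (strings, numbers, arrays), so the key has to be
    serialized and then percent-escaped before it can sit in a URL.
    """
    encoded_key = quote(json.dumps(key), safe="")
    return (
        f"idb:{ORIGIN}/{quote(database, safe='')}"
        f"/{quote(store, safe='')}/{encoded_key}"
    )


print(filesystem_url("/photos/2012/cat.jpg"))
print(idb_url("app", "photos", ["2012", "cat.jpg"]))
```

The first URL reads like the path it came from; the second carries the
escaped JSON form of the key, which is the kind of ugliness described above.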

On to the use cases:

Things that work well in a sandboxed filesystem that don't work well
in IDB [or any of the other current storage APIs] are those that
involve nontransactional modifications of large blobs of data.  For
example, video/photo/audio editing involves data that's too big to
keep lots of extra copies around for rollback of failed transactions,
and which you don't necessarily want to try to fit into memory.
Overwriting just the ID3 tag of an MP3, or just the comment section of
the EXIF in a JPEG, would be much more efficient via a filesystem
interface.  Larger series of modifications to those files, which you
don't want to hold in memory, would be similar.
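
To make the MP3 case concrete, here is a minimal sketch using Python's
ordinary file API as a stand-in for a seekable writer. It assumes an ID3v1
tag (the last 128 bytes of the file: a "TAG" marker followed by a 30-byte
title); the function names are hypothetical. The second function shows the
copy-and-replace alternative for contrast.

```python
import os

ID3V1_SIZE = 128   # an ID3v1 tag occupies the last 128 bytes of the file
TITLE_OFFSET = 3   # the title field follows the 3-byte "TAG" marker
TITLE_SIZE = 30


def _title_field(title: str) -> bytes:
    """Encode a title as a fixed-width, NUL-padded 30-byte field."""
    return title.encode("latin-1")[:TITLE_SIZE].ljust(TITLE_SIZE, b"\x00")


def set_title_in_place(path: str, title: str) -> None:
    """Overwrite only the title field; the audio data is never read or copied."""
    with open(path, "r+b") as f:
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        if f.read(3) != b"TAG":
            raise ValueError("no ID3v1 tag found")
        f.seek(-ID3V1_SIZE + TITLE_OFFSET, os.SEEK_END)
        f.write(_title_field(title))


def set_title_by_copy(path: str, title: str) -> None:
    """Contrast: build a complete new copy, then atomically replace the file."""
    with open(path, "rb") as f:
        data = f.read()  # the whole file has to fit in memory
    if data[-ID3V1_SIZE:-ID3V1_SIZE + 3] != b"TAG":
        raise ValueError("no ID3v1 tag found")
    start = len(data) - ID3V1_SIZE + TITLE_OFFSET
    new_data = data[:start] + _title_field(title) + data[start + TITLE_SIZE:]
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(new_data)  # a second full copy on disk
    os.replace(tmp_path, path)
```

Both produce the same file. The in-place version touches 30 bytes; the copy
version reads and rewrites everything, which is the cost that matters once
the file is a large audio or video blob.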

I know Jonas wants to bolt nontransactional data onto the side of IDB
via FileHandle, but I think that the cure there is far worse than the
disease, and I don't think anyone at Google likes that proposal.  I
haven't polled everyone, but that's the impression I get.

Beyond individual use cases:

When looking at use cases for a filesystem API, people often want to
separate the sandboxed cases and the non-sandboxed cases ["My Photos",
etc.].  It's also worthwhile to look at the added value of having a
single API that works for both cases.  You have a photo organizer that
works in the sandbox with downloaded files?  If your browser supports
external filesystems, you can adapt your code to run in either place
with a very small change [mainly dealing with paths that aren't legal
on the local system].  If you're using IDB in the sandbox, and have a
different API to expose media directories, you've got to start over,
and then you have to maintain both systems.

One added API?

It's pretty clear that people see the value of an API that lets one
access "My Photos" from the web.  That API is necessarily going to
cope with files and directories on some platforms, even if others
don't expose directories as such.  If we're going to need to add a
filesystem API of some kind to deal with that, also using the same API
to manage a sandboxed storage area seems like a very small addition to
the web platform, unlike the other storage APIs we've added in the
past.


Regarding your final note:  I'm not sure what you're talking about
with BlobBuilder; is that the EXIF overwrite case you're trying to
handle?  If so, File[Handle|Writer] with BlobBuilder and seek seems to
handle it better than anything else.

	Eric

On Tue, Sep 25, 2012 at 11:57 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Sep 25, 2012, at 10:20 AM, James Graham <jgraham@opera.com> wrote:
>
>>
>> In addition, this would be the fourth storage API that we have tried to introduce to the platform in 5 years (localStorage, WebSQL, IndexedDB being the other three), and the fifth in total. Of the four APIs excluding this one, one has failed over interoperability concerns (WebSQL), one has significant performance issues and is discouraged from production use (localStorage) and one suffers from significant problems due to its legacy design (cookies). The remaining API (IndexedDB) has not yet achieved widespread use. It seems to me that we don't have a great track record in this area, and rushing to add yet another API probably isn't wise. I would rather see JS-level implementations of a filesystem-like API on top of IndexedDB in order to work out the kinks without creating a legacy that has to be maintained for back-compat than native implementations at this time.
>
> I share your concerns about adding yet-another-storage API. (Although I believe there are major websites that have adopted or are in the process of adopting IndexedDB). I like my version better than the Google one, too, but I also worry about whether we should be adding another storage API at all.
>
> I think we need to go back to the use case for sandboxed filesystem storage and understand which use cases cannot be served with IndexedDB.
>
>
> Here are some use cases I have heard:
>
> (1) A webapp (possibly working in offline mode) wants to stage files for later upload (e.g. via XHR).
>     Requirements:
>         - Must be able to store distinct named items containing arbitrary binary data.
>         - Must be able to read the data back for later upload.
>         - Must be able to delete items.
>
> (2) A web-based mail client wants to download the user's attachments locally, then reference them by URL from the email and allow them to be extracted into the user's filesystem space.
>     Requirements:
>         - Must be able to store distinct named items containing arbitrary binary data.
>         - Must be able to reference items by persistent URL from constructs in a webpage that use URLs.
>         - Must be able to delete items.
>
> (3) A web-based web developer tool downloads copies of all the resources of a webpage, lets the user edit the webpage live potentially adding new resources, and then uploads it all again to one or more servers.
>     Requirements:
>         - Must be able to store distinct named items containing arbitrary binary data.
>         - Must be able to replace items.
>         - Must be able to reference items by persistent URL from constructs in a webpage that use URLs.
>         - Must be able to delete items.
>         - Must be able to enumerate items.
>     Highly desirable:
>         - Hierarchical namespace.
>
> (4) A game wants to download game resources locally for efficient operation, and later update them.
>     Requirements:
>         - Must be able to store distinct named items containing arbitrary binary data.
>         - Must be able to replace items.
>         - Must be able to reference items by persistent URL from constructs in a webpage that use URLs.
>         - Must be able to delete items.
>     Highly desirable:
>         - Hierarchical namespace.
>
>
> I believe the only requirement here that is not met by IndexedDB is:
>     - The ability to reference an item by persistent URL.
>
> IndexedDB has enumeration, hierarchical namespace, ability to add, replace, remove, get, etc.
>
>
> Are there other use cases? In particular, are there use cases that justify a whole new storage API instead of adding this one feature to IndexedDB?
>
>
> Note: one aspect of the MinimalFileSystem proposal that is not obviously required by any of these use cases is the ability to incrementally update a file (beyond what you could already do with slice() and BlobBuilder). Basically the whole FileHandle interface. Is there truly a use case that you can't satisfy by using BlobBuilder to make your update and then atomically replacing?
>
>
> Regards,
> Maciej
>
>
Received on Wednesday, 26 September 2012 23:24:33 UTC