Re: Request for feedback: Filesystem API from Jonas Sicking on 2013-08-13 (public-script-coord@w3.org from July to September 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 13 Aug 2013 11:34:45 -0700
To: David Bruant <bruant.d@gmail.com>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CA+c2ei9TtoVxfBwSG-dYd5PRie4KhVjGLGCCNKoNnVFJAQzOEA@mail.gmail.com>
On Sat, Aug 10, 2013 at 2:54 PM, David Bruant <bruant.d@gmail.com> wrote:
> I'd like to share an experience from working with Tizen. In their docs, they
> list the different ways to store data [2]. Aside from the obsolete WebSQL,
> we have:
> - localStorage
> - IndexedDB
> - FileSystem (the Webkit one)
> (they put app cache in this category, but I don't really see it as a storage
> mechanism)
>
> It left me thinking that that's a lot of different ways to store information,
> but they serve different purposes: localStorage is
> key(string)/value(string), and IndexedDB is useful for storing structured
> data. But what is FileSystem?
> To a first approximation, a FileSystem is a key(string)/value(binary data)
> storage system (but fine-grained, asynchronous access to the value makes it
> better than localStorage when that matters). Keys are strings (where '/' has
> particular semantics). In OS filesystems, this first approximation is
> wrong because specific directories can have different rights (rwx) assigned.
> But this isn't a feature web apps need (at least I haven't seen this need
> expressed when it comes to data storage).
>
> In the Tizen application, I wrote an abstraction on top of the FileSystem to
> make it an async key(string)/value(Blob) storage, because that's what we
> really needed (it had the same interface as the async abstraction of
> key/value storage I wrote on top of localStorage, which was awesome).

It's important to keep in mind that if you want simple key/value
storage where the values happen to be Blobs, IndexedDB works
perfectly fine for this. At least, to the extent that you think
IndexedDB works at all.
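
As a rough illustration of that point, here is a minimal sketch of an async key(string)/value(Blob) store built directly on IndexedDB. The database name, store name, and the put/get helper names are assumptions made for this sketch, not part of any proposal in this thread. For brevity it opens the database on every call, which a real library would avoid.

```javascript
// Minimal async key(string)/value(Blob) store on top of IndexedDB.
// DB_NAME and STORE_NAME are hypothetical names chosen for this sketch.
const DB_NAME = 'blob-kv';
const STORE_NAME = 'files';

// Wrap an IDBRequest (or IDBOpenDBRequest) in a Promise.
function promisify(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

function openStore() {
  const request = indexedDB.open(DB_NAME, 1);
  // Runs only when the database is first created (or its version bumped).
  // No keyPath: keys are supplied out-of-line, like file names.
  request.onupgradeneeded = () => request.result.createObjectStore(STORE_NAME);
  return promisify(request);
}

async function put(key, blob) {
  const db = await openStore();
  const tx = db.transaction(STORE_NAME, 'readwrite');
  tx.objectStore(STORE_NAME).put(blob, key);
  // Resolve only once the write transaction has committed.
  await new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
  db.close();
}

async function get(key) {
  const db = await openStore();
  const store = db.transaction(STORE_NAME, 'readonly').objectStore(STORE_NAME);
  const value = await promisify(store.get(key));
  db.close();
  return value; // undefined if the key is absent
}
```

In a browser, `await put('photos/cat.jpg', blob)` followed by `await get('photos/cat.jpg')` should round-trip the Blob. Displaying it still means calling URL.createObjectURL() on the result, which is the gap a filesystem: URL scheme would close.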

The use cases for an explicit Filesystem API which IndexedDB doesn't
fulfill are:
* Being a filesystem API. A lot of developers are familiar with
filesystems and want an API that is exactly that. For me personally,
this is the main reason we are doing a sandboxed filesystem.
* The filesystem: URL scheme. I.e. a URL scheme which allows reading
directly from the storage area without having to first asynchronously
load a Blob and then use URL.createObjectURL(). This is actually
solvable for IndexedDB [1], but it's not something we've built yet.
* Support for modifiable Blobs. I.e. support modifying 10 bytes in the
middle of a 1GB file, without requiring regenerating the full 1GB
file. This is actually also solvable in IndexedDB. We have even added
experimental support for this in Firefox. But Google is strongly
opposed to adding this to the IndexedDB standard with the argument
that IndexedDB (and other databases) are transactional, whereas a
file-modify API must not be transactional (no one wants to support
rolling back a 1GB file write). Putting both transactional and
non-transactional APIs in the same storage area has been deemed too
confusing by Google.
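
To make the modifiable-Blob point concrete: with today's immutable Blobs, "modifying" bytes in the middle of a file means building a new Blob out of slices of the old one and then storing that new value. Below is a minimal sketch, assuming a runtime with a global Blob (modern browsers, or Node.js 18+); patchBlob is a hypothetical helper name. Whether the untouched bytes get copied when the result is persisted is up to the implementation, and that cost is exactly what a modifiable-Blob API is meant to avoid.

```javascript
// Hypothetical helper: returns a new Blob equal to `blob` with `patch`
// written starting at byte `offset` (assumes 0 <= offset <= blob.size).
// The original Blob is never changed; Blobs are immutable.
function patchBlob(blob, offset, patch) {
  const patchPart = patch instanceof Blob ? patch : new Blob([patch]);
  // A patch that runs past the end simply extends the result.
  const end = Math.min(blob.size, offset + patchPart.size);
  return new Blob(
    [
      blob.slice(0, offset), // bytes before the patch
      patchPart,             // the replacement bytes
      blob.slice(end),       // bytes after the patch
    ],
    { type: blob.type },
  );
}
```

From the application's point of view, even a 5-byte change yields a whole new value that has to be written back; for a 1GB file, only the storage implementation can avoid rewriting everything.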

So we should keep in mind that the main use case for a sandboxed
filesystem API is literally "being a filesystem API", which means
that is what we should optimize for.

That said, that doesn't mean that we have to have a Directory object.
For example, the Node.js API doesn't have a Directory object, but it
still enables directory manipulation. POSIX, on the other hand, does
have something similar to Directory objects (the DIR handle returned
by opendir()).
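
For comparison, a small sketch of directory manipulation in Node.js, where directories are named by path strings and never represented as objects. It uses the present-day fs/promises API inside a throwaway temporary directory; the function name and file names are made up for illustration.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Create, rename, and list directories purely through path strings.
async function listWithoutDirectoryObjects() {
  const root = await fs.mkdtemp(path.join(os.tmpdir(), 'fsdemo-'));
  try {
    // Intermediate directories are created from the path alone.
    await fs.mkdir(path.join(root, 'photos', '2013'), { recursive: true });
    await fs.writeFile(path.join(root, 'photos', '2013', 'a.txt'), 'hello');

    // Renaming a directory is also just an operation on two paths.
    await fs.rename(path.join(root, 'photos'), path.join(root, 'pictures'));

    const entries = await fs.readdir(path.join(root, 'pictures'));
    const text = await fs.readFile(
      path.join(root, 'pictures', '2013', 'a.txt'),
      'utf8',
    );
    return { entries, text };
  } finally {
    // Clean up the temporary tree whether or not the steps above succeeded.
    await fs.rm(root, { recursive: true, force: true });
  }
}
```

Every operation takes a path and returns plain data (names, contents); nothing in the API hands out a handle that stands for "this directory".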

[1] http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0081.html

> There are lots of points above (dealing with relative '..' paths,
> intermediate directories, making sure a file is within a directory subtree,
> etc.) that relate to Directory and that would simply disappear if the
> Directory abstraction were removed. Since it's not really needed from the
> data storage perspective, I'd be in favor of removing it.
>
> One argument in favor of Directory I have read is about handing off only a
> directory (instead of the whole filesystem) to partially trusted code, but
> that could be solved if the FileSystem interface provides something like:
>     var prefixedFileSystem = fs.createPrefixedSubFileSystem(prefix);
> Worst case, this is something that can be easily implemented as a library. I
> don't think we need a Directory abstraction and all the complications that
> come with it to solve that particular use case. The people who need this sort
> of compartmentalization will figure it out.
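
To illustrate the "implemented as a library" claim, here is a minimal sketch over a flat, path-keyed store with no Directory abstraction. MemoryFileSystem, normalize, and this createPrefixedSubFileSystem are all hypothetical (the last name is borrowed from the suggestion above, not from any spec), and an in-memory Map stands in for real storage. Path normalization is where the '..' and "stays within the subtree" questions get handled.

```javascript
// Hypothetical flat store: keys are normalized '/'-separated paths and values
// are whatever the caller stores (e.g. Blobs). A Map stands in for real
// persistent storage; directories exist only as prefixes of keys.
class MemoryFileSystem {
  constructor() {
    this.files = new Map();
  }
  async write(path, data) {
    this.files.set(normalize(path), data);
  }
  async read(path) {
    const key = normalize(path);
    if (!this.files.has(key)) throw new Error(`not found: ${key}`);
    return this.files.get(key);
  }
}

// Resolve '.' and '..' segments; throw if the path climbs above the root.
function normalize(path) {
  const segments = [];
  for (const seg of path.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') {
      if (segments.length === 0) throw new Error(`path escapes root: ${path}`);
      segments.pop();
    } else {
      segments.push(seg);
    }
  }
  return '/' + segments.join('/');
}

// Hypothetical library version of createPrefixedSubFileSystem: the partially
// trusted caller only receives this object, so every path it passes is
// resolved relative to `prefix` and cannot reach outside that subtree.
function createPrefixedSubFileSystem(fs, prefix) {
  const base = normalize(prefix);
  const toFullPath = (p) => {
    const rel = normalize(p); // throws if p tries to escape the sub-root
    if (base === '/') return rel;
    return rel === '/' ? base : base + rel;
  };
  return {
    async write(p, data) {
      return fs.write(toFullPath(p), data);
    },
    async read(p) {
      return fs.read(toFullPath(p));
    },
  };
}
```

For example, on a sub-filesystem created with prefix '/apps/editor', `write('docs/../notes.txt', data)` lands at '/apps/editor/notes.txt', while `read('../../etc')` is rejected before the underlying store is touched.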

I wouldn't say that there are "a lot" of issues that are complicated
by Directory objects. I think you basically covered all of them :-).
But yes, the API certainly would be simplified without Directory
objects.

But keep in mind the main use case as described above.

>> However, it is expected that this API will eventually also be used for
>> accessing real filesystems
>
> In my demonstration above, I considered the file system purely from the data
> storage perspective and concluded that the Directory abstraction isn't
> necessary.
> But "accessing real filesystems" is a very different use case than data
> storage. Interacting with a real filesystem means (or can mean, depending on
> the level of granularity you want to go to) taking care of things like
> per-directory rights, etc.

To be clear, some browser vendors have said that "accessing real
filesystems" is not a use case they are interested in supporting.
Others are already using filesystem APIs to expose real filesystems,
but they aren't doing so in traditional web page contexts.

So I don't think the ability to expose real filesystems is strictly
needed, and we probably shouldn't add it if it comes at great
expense. But if we can support it, that will make some browser
vendors more eager to adopt the API.

/ Jonas
Received on Tuesday, 13 August 2013 18:35:42 UTC