Re: Request for feedback: Filesystem API from Jonas Sicking on 2013-08-10 (public-script-coord@w3.org from July to September 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 9 Aug 2013 23:24:02 -0700
To: Domenic Denicola <domenic@domenicdenicola.com>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CA+c2ei9RH1GDDri9rtSyfkMdFkZbB-gzStEG9QU31PAiJyA2fQ@mail.gmail.com>
On Fri, Aug 9, 2013 at 7:25 PM, Domenic Denicola
<domenic@domenicdenicola.com> wrote:
> Jonas, I think this looks great for the most part, and am excited about the broad strokes of the API: promise-based, fairly simple interface, no POSIX-like cryptic names (e.g. "unlink"), and I especially appreciate you guys taking the time to think through the multi-process stuff so that those of us who aren't familiar with such things don't have to do so.
>
> However, I think one major issue with the interface is that it is about as far as a proposal possibly could be from embracing the extensible web principles that we've recently been pushing [1]. A filesystem API seems like the *perfect* candidate for applying the "add new low-level capabilities, allow web developers to iterate on higher-level ones, then standardize later" approach.
>
> For example:
>
> - At a glance, move, copy, removeDeep, and enumerateDeep seem like derivative APIs.

Like Brendan points out, what is considered the "low-level
capabilities" isn't always obvious.

The entire filesystem API is really just an abstraction over a block
storage medium combined with a locking mechanism. The locking
mechanism is something that we need either way, help would be greatly
appreciated with that. But a basic API would look something like this:

interface BlockStorage {
  Promise<void> write(unsigned long long start, ArrayBufferView data);
  Promise<ArrayBuffer> read(unsigned long long start, unsigned long long end);
  Promise<void> setSize(unsigned long long size);

  // This probably needs to be able to flush particular ranges
  Promise<void> flush();
};

interface AsyncLock {
  static AbortablePromise<AsyncLock> get(DOMString lockName);
  release();
};

That's basically it. That is all the primitives you need to implement
either a Filesystem API, IndexedDB, AsyncLocalStorage or any other
disk-based client-side storage mechanism you can imagine.

We could throw in something like this if we wanted to add ability to
build something like inotify:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=22628#c15

I'm ok if we want to go this way instead. But I'll note that we have
gotten a lot of requests for an actual filesystem API. Most of those
people have not liked the answer that "IndexedDB is a primitive, you
can build a filesystem on top of that using a library". So I suspect
the above approach would meet similar criticism.

But if we want to go quite that low, and actually provide a full
filesystem API, then we can certainly look at if the current proposal
contains too much syntax sugar.

move:
I don't agree that move is a derivative. Yes, you can implement it by
doing copy-then-delete, but performance would be so dramatically
different that I wouldn't call it an acceptable alternative. Try
moving a 1GB by copying and then deleting and it'll go from instant to
user-risks-falling-asleep. Not to mention that you might end up
running out of disk space.

copy:
I agree that this is syntax sugar. Which is why I listed "should we
remove copy" as one of the open issues. But note that it's not trivial
to implement given that simply copying files one-by-one is not atomic.
So you'd have to use something like AsyncLock above to help with the
coordination.

remove/removeDeep:
remove() and removeDeep() can actually both be implemented using each
other. removeDeep() can be implemented by recursively calling remove()
on every subfolder, if any, in the selected target. However remove()
can be implemented by first checking if the selected target is empty
and producing an error if it's not, and calling removeDeep() if the
target is empty or is a file.

If we're only keeping one I would prefer to keep removeDeep(). Both
strategies requires using AsyncLock in order to ensure atomic
operations, but performance of implementing remove() using
removeDeep() is likely better. And I'd argue that remove() us useful
much more rarely.

So yes, I agree that remove() is syntax sugar. Which is why I again
listed "should we get rid of remove" in the open issues.

enumerateDeep:
I agree this is syntax sugar. But asynchronous recursive algorithms
are not trivial to implement. And deep enumeration is useful in a lot
of cases like synchronizing a tree to the server, implementing copy,
getting total filesize of a directory, finding all image files in a
directory etc.

So I'd still argue for keeping this.

> - The readText method of the FileHandle, which has design problems in terms of not taking into account the stateful nature of encoding-specific text reading anyway, is probably best built by providing a lower-level primitive like a string-decoder duplex stream or transform, instead of baking that into the file handle stream itself. Of course one of the benefits of this would be that such string decoding capabilities would be usable by other parts of the platform as well! #extendthewebforward

I generally think that we should match Stream here. If that means
leaving out readText for now then that's ok. I agree that's likely the
way we'll go. But FWIW, no one is ignoring the encoding issues.

> - The AbortableProgressPromise idea is way overkill. As Isaac so clearly outlined, it makes no sense for streaming APIs generally. That leaves it for use in move and copy, both of which are derivative operations. As such I think user-space implementations will be in a better position to compose the file stream operations into a cancellable and/or progress-emitting implementation of move and copy. That way we can have an API that users actually like, instead of having the AbortableProgressPromise idea---which the promise community is generally negative toward---be shoved at us by the W3C.

Given that Isaac's email was sent a few hours ago, and that it seems
based on misunderstandings of my original proposal (not sure if it's
his misunderstandings or Takeshi's) I wouldn't considered this
discussion settled. I'll respond over there though.

> And this isn't even getting into the streaming API in any detail.

I definitely think we should get into the streaming API in detail. But
we should do that separate from the filesystem API I think.

> I'm not sure if you disagree with the extensible web principles, or if it's just been a matter of not taking the time to apply them in this case, but I'd urge a long and hard look at the proposal with those in mind.

I'm most certainly aware of the extensible web principles. But as you
can see above it's not always obvious how to apply them. I'd be
interested to hear if you think we should go with the BlockStorage
approach or not?

/ Jonas

> [1]: http://extensiblewebmanifesto.org/
>
>> -----Original Message-----
>> From: Jonas Sicking [mailto:jonas@sicking.cc]
>> Sent: Friday, August 9, 2013 17:03
>> To: public-script-coord@w3.org
>> Subject: Request for feedback: Filesystem API
>>
>> Over the past few months a few of us at mozilla, with input from a lot of
>> other people, has been iterating on a filesystem API. The goal of this
>> filesystem API is first and foremost to expose a sandboxed filesystem to
>> webpages. This filesystem would be origin-specific and would not allow
>> accessing the user's OS filesystem. This avoids a lot of the security concerns
>> around filesystem APIs.
>>
>> However it is expected that this API will eventually also be used for accessing
>> real filesystems eventually, but there are a lot of security concerns that
>> needs to be solved before we can create a real standard for that. Hence that
>> is not the topic of this email.
>>
>> API summary:
>>
>> The proposed API introduces two new abstractions: A Directory object which
>> allows manipulating files and directories within it, and a FileHandle object
>> which allows holding an exclusive lock on a file while performing multiple
>> read/write operations on it.
>>
>> The API intentionally reuses the already existing File abstraction as defined
>> by [1] as we didn't want to have two different primitives for "a file". The File
>> object has already been shipping in browsers for a while, so it's not an API
>> that we expect to be able to make backwards incompatible changes to,
>> which somewhat limits the design of the proposed filesystem API.
>>
>> Only adding two new abstractions was very intentional. We wanted to keep
>> the API as small and simple as possible. So for example there is no
>> abstraction for "a filesystem". Instead we simply let the root directory
>> represent the filesystem.
>>
>> The API is entirely asynchronous since we don't expect implementations to
>> be able to keep the whole filesystem in memory, and we don't want to force
>> synchronous IO. But we've still tried to keep the API as friendly as possible.
>>
>> Detailed API:
>>
>> Apologies for using WebIDL here. I know it's not very popular with a lot of
>> people on this list. And it's especially unfortunate in this API since the use of
>> WebIDL to describe the API results in a lot of extra syntax in the description
>> which doesn't actually affect the javascript that developers would write.
>>
>> Unfortunately I don't know of any other formal way of describing the API
>> without spending tons of time typing up long descriptions of each function.
>>
>> partial interface Navigator {
>>   // This is what provides access to the sandboxed filesystem root.
>>   Promise<Directory> getFilesystem(optional FilesystemParameters
>> parameters); };
>>
>> interface Directory {
>>   readonly attribute DOMString name;
>>
>>   Promise<File> createFile(DOMString path,
>>                            CreateFileOptions options);
>>   Promise<Directory> createDirectory(DOMString path);
>>
>>   Promise<(File or Directory)> get(DOMString path);
>>
>>   AbortableProgressPromise<void>
>>     move((DOMString or File or Directory) path,
>>          (DOMString or Directory or DestinationDict) dest);
>>   AbortableProgressPromise<void>
>>     copy((DOMString or File or Directory) path,
>>          (DOMString or Directory or DestinationDict) dest);
>>   Promise<boolean> remove((DOMString or File or Directory) path);
>>   Promise<boolean> removeDeep((DOMString or File or Directory) path);
>>
>>   Promise<FileHandle> openRead((DOMString or File) path);
>>   Promise<FileHandleWritable> openWrite((DOMString or File) path,
>>         OpenWriteOptions options);
>>
>>   EventStream<(File or Directory)> enumerate(optional DOMString path);
>>   EventStream<File> enumerateDeep(optional DOMString path); };
>>
>> interface FileHandle
>> {
>>   readonly attribute FileOpenMode mode;
>>   readonly attribute boolean active;
>>
>>   attribute long long? offset;
>>
>>   Promise<File> getFile();
>>   AbortableProgressPromise<
>> ArrayBuffer> read(unsigned long long size);
>>   AbortableProgressPromise<DOMString> readText(unsigned long long size,
>> optional DOMString encoding = "utf-8");
>>
>>   void abort();
>> };
>>
>> interface FileHandleWritable : FileHandle {
>>   AbortableProgressPromise<void> write((DOMString or ArrayBuffer or
>> ArrayBufferView or Blob) value);
>>
>>   Promise<void> setSize(optional unsigned long long size);
>>
>>   Promise<void> flush();
>> };
>>
>> partial interface URL {
>>   static DOMString? getPersistentURL(File file); }
>>
>>
>> // WebIDL cruft that's largely transparent enum StorageType { "temporary",
>> "persistent" }; dictionary FilesystemParameters {
>>   StorageType storage = "temporary";
>> };
>>
>> dictionary CreateFileOptions {
>>   CreateIfExistsMode ifExists = "fail";
>>   (DOMString or Blob or ArrayBuffer or ArrayBufferView) data; };
>>
>> dictionary OpenWriteOptions {
>>   OpenIfNotExistsMode ifNotExists = "create";
>>   OpenIfExistsMode ifExists = "open";
>> };
>>
>> enum CreateIfExistsMode { "replace", "fail" }; enum OpenIfExistsMode {
>> "open", "fail" }; enum OpenIfNotExistsMode { "create", "fail" };
>>
>> dictionary DestinationDict {
>>   Directory dir;
>>   DOMString name;
>> };
>>
>> enum FileOpenMode { "readonly", "readwrite" };
>>
>> API Description:
>>
>> I won't go into the details about each function as it's hopefully mostly
>> obvious. A few general comments:
>>
>> The functions on Directory that accept DOMString arguments for filenames
>> allow names like "path/to/file.txt". If the function creates a file, then it
>> creates the intermediate directories. Such paths are always interpreted as
>> relative to the directory itself, never relative to the root.
>>
>> We were thinking of *not* allowing paths that walk up the directory tree. So
>> paths like "../foo", "..", "/foo/bar" or "foo/../bar" are not allowed. This to
>> keep things simple and avoid security issues for the page. Attempting to use
>> a path that contains a segment that is equal to ".." or ".", or any path which
>> starts with "/" will cause an error.
>> This way we can add support for this later if desired.
>>
>> Likewise, passing a File object to an operation of Directory where the File
>> object isn't contained in that directory or its descendents also results in an
>> error.
>>
>> One thing that is probably not obvious is how the FileHandle.location
>> attribute works. This attribute is used by the read/readText/write functions
>> to select where the read or write operation starts. When .read is called, it
>> uses the current value of .location to determine where the reading starts. It
>> then fires off an asynchronous read operation. It finally synchronously
>> increases .location by the amount of the 'size' argument before returning.
>> Same thing for .write() and .readText().
>>
>> This means that the caller can simply set .location and then fire off multiple
>> read or write operations which automatically will happen staggered in the
>> file. It also means that the caller can set the location for next operation by
>> simply setting .location, or can check the current location by simply getting
>> .location.
>>
>> Setting .offset to null means "go to the end". This is why there is no
>> openAppend function. Calling openWrite and then setting .offset to null
>> before writing results in an append.
>>
>> Note that getting or setting .offset does not need to synchronously call seek,
>> or do any IO operations, in the implementation. Instead the implementation
>> simply tracks .offset in the API implementation.
>> Whenever a read or write operation is scheduled, the current .offset is sent
>> along with the operation information to the IO thread and the seek can
>> happen there. Many times the implementation can optimize out the seek
>> entirely.
>>
>> The FileHandle class automatically closes itself as soon as the page stops
>> posting further calls to .read/.readBinary/.write to it. This happens once the
>> last Promise returned from one of those operations has been resolved,
>> without further calls to .read/.readBinary/.write having happened. This is
>> similar to IDB transactions, though obviously there are no transactional
>> semantics here. I.e. there is no way to roll back any changes.
>>
>> Open Questions:
>>
>> There are a few things that we did have disagreements on and which would
>> be worth debating.
>>
>> Is the setup around the FileHandle.offset attribute a good idea? Some
>> people found it confusingly different from posix.
>>
>> Can we get rid of the the non-recursive remove() function. The
>> removeRecusive() function has the same capabilities, except that
>> removeRecusive doesn't produce an error if you attempt to delete a non-
>> empty directory.
>>
>> Can we get rid of the copy() function? Copy operations are certainly common
>> to expose in UIs, but they can be easily implemented programmatically, so
>> having it in the API isn't strictly needed.
>>
>> Should we add an openAppend function which always appends for all writes.
>> Note that since FileHandle always holds an exclusive lock on the file, there is
>> no risk that other actors will append to the file as long as a FileHandle is being
>> used.
>>
>> Finally, should we remove the Directory abstraction? It's not needed given
>> that you can directly interact with files in subdirectories. But it does provide
>> the ability to do some capability management. I.e.
>> holding a Directory object enables you to interact with the files in that
>> directory and its subdirectories, but there is no way to reach out to a parent
>> directory. Directory objects also is a familiar concept in filesystem APIs, so it
>> seems natural to have it even though it's not strictly needed.
>>
>> [1] http://dev.w3.org/2006/webapi/FileAPI/
>>
>> / Jonas
>>
>
Received on Saturday, 10 August 2013 06:25:00 UTC