Re: Request for feedback: Filesystem API

On Fri, Aug 9, 2013 at 4:02 PM, Rick Waldron <waldron.rick@gmail.com> wrote:
> below...
>
>
> On Fri, Aug 9, 2013 at 6:15 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Fri, Aug 9, 2013 at 2:02 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> > Over the past few months a few of us at Mozilla, with input from a lot
>> > of other people, have been iterating on a filesystem API. The goal of
>> > this filesystem API is first and foremost to expose a sandboxed
>> > filesystem to webpages. This filesystem would be origin-specific and
>> > would not allow accessing the user's OS filesystem. This avoids a lot
>> > of the security concerns around filesystem APIs.
>> >
>> > However it is expected that this API will eventually also be used for
>> > accessing real filesystems, but there are a lot of security
>> > concerns that need to be solved before we can create a real standard
>> > for that. Hence that is not the topic of this email.
>> >
>> > API summary:
>> >
>> > The proposed API introduces two new abstractions: A Directory object
>> > which allows manipulating files and directories within it, and a
>> > FileHandle object which allows holding an exclusive lock on a file
>> > while performing multiple read/write operations on it.
>> >
>> > The API intentionally reuses the already existing File abstraction as
>> > defined by [1] as we didn't want to have two different primitives for
>> > "a file". The File object has already been shipping in browsers for a
>> > while, so it's not an API that we expect to be able to make backwards
>> > incompatible changes to, which somewhat limits the design of the
>> > proposed filesystem API.
>> >
>> > Only adding two new abstractions was very intentional. We wanted to
>> > keep the API as small and simple as possible. So for example there is
>> > no abstraction for "a filesystem". Instead we simply let the root
>> > directory represent the filesystem.
>> >
>> > The API is entirely asynchronous since we don't expect implementations
>> > to be able to keep the whole filesystem in memory, and we don't want
>> > to force synchronous IO. But we've still tried to keep the API as
>> > friendly as possible.
>> >
>> > Detailed API:
>> >
>> > Apologies for using WebIDL here. I know it's not very popular with a
>> > lot of people on this list. And it's especially unfortunate in this
>> > API since the use of WebIDL to describe the API results in a lot of
>> > extra syntax in the description which doesn't actually affect the
>> > javascript that developers would write.
>> >
>> > Unfortunately I don't know of any other formal way of describing the
>> > API without spending tons of time typing up long descriptions of each
>> > function.
>> >
>> > partial interface Navigator {
>> >   // This is what provides access to the sandboxed filesystem root.
>> >   Promise<Directory> getFilesystem(
>> >       optional FilesystemParameters parameters);
>> > };
>> >
>> > interface Directory {
>> >   readonly attribute DOMString name;
>> >
>> >   Promise<File> createFile(DOMString path,
>> >                            CreateFileOptions options);
>> >   Promise<Directory> createDirectory(DOMString path);
>> >
>> >   Promise<(File or Directory)> get(DOMString path);
>> >
>> >   AbortableProgressPromise<void>
>> >     move((DOMString or File or Directory) path,
>> >          (DOMString or Directory or DestinationDict) dest);
>> >   AbortableProgressPromise<void>
>> >     copy((DOMString or File or Directory) path,
>> >          (DOMString or Directory or DestinationDict) dest);
>> >   Promise<boolean> remove((DOMString or File or Directory) path);
>> >   Promise<boolean> removeDeep((DOMString or File or Directory) path);
>> >
>> >   Promise<FileHandle> openRead((DOMString or File) path);
>> >   Promise<FileHandleWritable> openWrite((DOMString or File) path,
>> >         OpenWriteOptions options);
>> >
>> >   EventStream<(File or Directory)> enumerate(optional DOMString path);
>> >   EventStream<File> enumerateDeep(optional DOMString path);
>> > };
>> >
>> > interface FileHandle
>> > {
>> >   readonly attribute FileOpenMode mode;
>> >   readonly attribute boolean active;
>> >
>> >   attribute long long? offset;
>> >
>> >   Promise<File> getFile();
>> >   AbortableProgressPromise<ArrayBuffer> read(unsigned long long size);
>> >   AbortableProgressPromise<DOMString> readText(
>> >       unsigned long long size, optional DOMString encoding = "utf-8");
>> >
>> >   void abort();
>> > };
>> >
>> > interface FileHandleWritable : FileHandle
>> > {
>> >   AbortableProgressPromise<void> write(
>> >       (DOMString or ArrayBuffer or ArrayBufferView or Blob) value);
>> >
>> >   Promise<void> setSize(optional unsigned long long size);
>> >
>> >   Promise<void> flush();
>> > };
>> >
>> > partial interface URL {
>> >   static DOMString? getPersistentURL(File file);
>> > };
>> >
>> >
>> > // WebIDL cruft that's largely transparent
>> > enum StorageType { "temporary", "persistent" };
>> > dictionary FilesystemParameters {
>> >   StorageType storage = "temporary";
>> > };
>> >
>> > dictionary CreateFileOptions {
>> >   CreateIfExistsMode ifExists = "fail";
>> >   (DOMString or Blob or ArrayBuffer or ArrayBufferView) data;
>> > };
>> >
>> > dictionary OpenWriteOptions {
>> >   OpenIfNotExistsMode ifNotExists = "create";
>> >   OpenIfExistsMode ifExists = "open";
>> > };
>> >
>> > enum CreateIfExistsMode { "replace", "fail" };
>> > enum OpenIfExistsMode { "open", "fail" };
>> > enum OpenIfNotExistsMode { "create", "fail" };
>> >
>> > dictionary DestinationDict {
>> >   Directory dir;
>> >   DOMString name;
>> > };
>> >
>> > enum FileOpenMode { "readonly", "readwrite" };
>> >
>> > API Description:
>> >
>> > I won't go into the details about each function as it's hopefully
>> > mostly obvious. A few general comments:
>> >
>> > The functions on Directory that accept DOMString arguments for
>> > filenames allow names like "path/to/file.txt". If the function creates a
>> > file, then it creates the intermediate directories. Such paths are
>> > always interpreted as relative to the directory itself, never relative
>> > to the root.
>> >
>> > We were thinking of *not* allowing paths that walk up the directory
>> > tree. So paths like "../foo", "..", "/foo/bar" or "foo/../bar" are not
>> > allowed. This is to keep things simple and avoid security issues for the
>> > page. Attempting to use a path that contains a segment that is equal
>> > to ".." or ".", or any path which starts with "/" will cause an error.
>> > This way we can add support for this later if desired.
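A rough sketch of the path rule quoted above, in case it helps the discussion. The helper name isAllowedPath is made up for illustration and is not part of the proposal. It only encodes the three stated conditions (no leading "/", no ".." segment, no "." segment) and deliberately says nothing about cases the text does not cover, such as empty strings or empty segments like "a//b".

```javascript
// Hypothetical helper, not part of the proposed API: returns true when
// `path` passes the rule described in the quoted text.
function isAllowedPath(path) {
  // Paths are always relative to the directory, so a leading "/" is an error.
  if (path.charAt(0) === "/") {
    return false;
  }
  // Any segment equal to ".." or "." is an error, wherever it appears.
  return path.split("/").every(function (segment) {
    return segment !== ".." && segment !== ".";
  });
}

console.log(isAllowedPath("path/to/file.txt")); // true
console.log(isAllowedPath("foo/../bar"));       // false
console.log(isAllowedPath("/foo/bar"));         // false
```

Checking whole segments rather than substrings means a name such as "foo..bar" is still accepted, which seems to match the wording "a segment that is equal to".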
>> >
>> > Likewise, passing a File object to an operation of Directory where the
>> > File object isn't contained in that directory or its descendants also
>> > results in an error.
>> >
>> > One thing that is probably not obvious is how the FileHandle.offset
>> > attribute works. This attribute is used by the read/readText/write
>> > functions to select where the read or write operation starts. When
>> > .read is called, it uses the current value of .offset to determine
>> > where the reading starts. It then fires off an asynchronous read
>> > operation. It finally synchronously increases .offset by the amount
>> > of the 'size' argument before returning. Same thing for .write() and
>> > .readText().
>> >
>> > This means that the caller can simply set .offset and then fire off
>> > multiple read or write operations which automatically will happen
>> > staggered in the file. It also means that the caller can set the
>> > position for the next operation by simply setting .offset, or can check
>> > the current position by simply getting .offset.
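To make the bookkeeping described above concrete, here is a minimal in-memory sketch. MockFileHandle is a hypothetical stand-in I made up for illustration; it models only the position attribute (called offset in the IDL) being advanced synchronously while the data arrives asynchronously. It assumes the position is a number, and does not model locking, null ("go to the end"), or real IO.

```javascript
// Hypothetical stand-in for FileHandle, for illustration only.
function MockFileHandle(bytes) {
  this.bytes = bytes; // Uint8Array standing in for the file contents
  this.offset = 0;
}

MockFileHandle.prototype.read = function (size) {
  // Capture the start position now; the simulated IO happens later.
  var start = this.offset;
  var bytes = this.bytes;
  var result = new Promise(function (resolve) {
    setTimeout(function () {
      resolve(bytes.slice(start, start + size).buffer);
    }, 0);
  });
  // Advance synchronously by the requested size before returning.
  this.offset = start + size;
  return result;
};

var handle = new MockFileHandle(new Uint8Array([10, 20, 30, 40, 50, 60]));
handle.offset = 2;
var first = handle.read(2);  // covers bytes 2..3
var second = handle.read(2); // covers bytes 4..5
// handle.offset is already 6 here, before either read has completed.
```

Because each call captures its own start position, the two reads land on adjacent ranges without the caller doing any arithmetic in between.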
>> >
>> > Setting .offset to null means "go to the end". This is why there is no
>> > openAppend function. Calling openWrite and then setting .offset to
>> > null before writing results in an append.
>> >
>> > Note that getting or setting .offset does not need to synchronously
>> > call seek, or do any IO operations, in the implementation. Instead the
>> > implementation simply tracks .offset in the API implementation.
>> > Whenever a read or write operation is scheduled, the current .offset
>> > is sent along with the operation information to the IO thread and the
>> > seek can happen there. Many times the implementation can optimize out
>> > the seek entirely.
>> >
>> > The FileHandle class automatically closes itself as soon as the page
>> > stops posting further calls to .read/.readText/.write to it. This
>> > happens once the last Promise returned from one of those operations
>> > has been resolved, without further calls to .read/.readText/.write
>> > having happened. This is similar to IDB transactions, though obviously
>> > there are no transactional semantics here. I.e. there is no way to
>> > roll back any changes.
>> >
>> > Open Questions:
>> >
>> > There are a few things that we did have disagreements on and which
>> > would be worth debating.
>> >
>> > Is the setup around the FileHandle.offset attribute a good idea? Some
>> > people found it confusingly different from posix.
>> >
>> > Can we get rid of the non-recursive remove() function? The
>> > removeDeep() function has the same capabilities, except that
>> > removeDeep() doesn't produce an error if you attempt to delete a
>> > non-empty directory.
>> >
>> > Can we get rid of the copy() function? Copy operations are certainly
>> > common to expose in UIs, but they can be easily implemented
>> > programmatically, so having it in the API isn't strictly needed.
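For what it's worth, a single-file copy could probably be written against the proposed Directory interface roughly like this. It is a sketch under assumptions: copyFile is a hypothetical helper name, it relies on get() resolving to a File and on a File being a Blob that CreateFileOptions.data accepts, and it does not handle the case where the source path is a directory (that would need enumerate() plus recursion). It also offers no progress or abort, which the proposed copy() does.

```javascript
// Hypothetical helper built only from operations in the proposed
// Directory interface: get() and createFile() with its `data` option.
function copyFile(dir, srcPath, destPath) {
  return dir.get(srcPath).then(function (file) {
    // Assumes `file` is a File (a Blob); fail rather than overwrite.
    return dir.createFile(destPath, { ifExists: "fail", data: file });
  });
}
```

Usage would look like copyFile(root, "a.txt", "backup/a.txt"), with intermediate directories created by createFile as described earlier in the quoted text.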
>> >
>> > Should we add an openAppend function which always appends for all
>> > writes? Note that since FileHandle always holds an exclusive lock on
>> > the file, there is no risk that other actors will append to the file
>> > as long as a FileHandle is being used.
>> >
>> > Finally, should we remove the Directory abstraction? It's not needed
>> > given that you can directly interact with files in subdirectories. But
>> > it does provide the ability to do some capability management. I.e.
>> > holding a Directory object enables you to interact with the files in
>> > that directory and its subdirectories, but there is no way to reach
>> > out to a parent directory. Directory objects are also a familiar
>> > concept in filesystem APIs, so it seems natural to have it even though
>> > it's not strictly needed.
>> >
>> > [1] http://dev.w3.org/2006/webapi/FileAPI/
>>
>> After all that, of course I forgot to include examples of what the API
>> looks like when used.
>>
>> // Save some downloaded data into a new file:
>> navigator.getFilesystem().then(function(root) {
>>   root.createFile("myfile.txt", { data: xhr.response });
>> });
>>
>> // Append 5 bytes to the end of a large existing file:
>> navigator.getFilesystem().then(function(root) {
>>   return root.openWrite("largefile.dat");
>> }).then(function(handle) {
>>   handle.offset = null;
>>   return handle.write(new Uint8Array([1, 1, 2, 3, 5]));
>> });
>>
>> // Increment the byte at offset 100 in a large existing file:
>> var fileHandle;
>> navigator.getFilesystem().then(function(root) {
>>   return root.openWrite("dir/highscores");
>> }).then(function(handle) {
>>   fileHandle = handle;
>>   fileHandle.offset = 100;
>>   return fileHandle.read(1);
>> }).then(function(buffer) {
>>   assert(buffer.byteLength === 1);
>>   var view = new Uint8Array(buffer);
>>   view[0]++;
>>   fileHandle.offset--;
>>   return fileHandle.write(buffer);
>> });
>>
>> / Jonas
>>
>
>
>
> I didn't see any rationale that explains the decision to hang this off the
> navigator object. Unless there is a definite reason, perhaps this
> should be its own [[Global]] object? I apologize for sounding like a broken
> record, but the "navigator" has nothing to do with the File System.
>
> partial interface Window {
>   static FileSystem;
> }
>
> interface FileSystem {
>   Promise<Directory> get(optional FilesystemParameters parameters);
> }
>
> Everything else could stay as-is. The examples then look like:
>
> (I used "get" in the IDL, but will use "request" in the examples, because
> I'm using my imagination and I think it looks nice)

o_O

Your colorful imagination makes me not understand if you are proposing
that the function should be called "get", or "request". Or if you are
proposing that either should work. Or if you are proposing that there
is some other magic going on.

Put another way, what does the last "it" refer to above?

Other than that I don't feel strongly. I'm not terribly excited to
introduce a FileSystem interface, even if it is one that only has a
single static function. But if people generally feel that that is
better then I can live with that.

The reason we tend to hang things off of Navigator these days is that
adding things to the global scope always runs the risk of name
collisions with existing content.

> Or better yet, FileSystem is a constructor that produces FileSystem objects
> that have a "get" or "request" method, now it's a reusable object...
>
>
> var fs = new FileSystem();
>
> // Save some downloaded data into a new file:
> fs.request().then(function(root) {
>   root.createFile("myfile.txt", { data: xhr.response });
> });

(new FileSystem()).request() looks less nice to me than
FileSystem.request(). But again, I can live with either.

> I don't know how set in stone the naming is, but you might also consider
> reviewing some prior art (http://nodejs.org/api/fs.html) for method names
> and call signatures.

Nothing is set in stone at this point.

/ Jonas

Received on Friday, 9 August 2013 23:23:42 UTC