Re: Request for feedback: Filesystem API

below...


On Fri, Aug 9, 2013 at 6:15 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Fri, Aug 9, 2013 at 2:02 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> > Over the past few months a few of us at Mozilla, with input from a lot
> > of other people, have been iterating on a filesystem API. The goal of
> > this filesystem API is first and foremost to expose a sandboxed
> > filesystem to webpages. This filesystem would be origin-specific and
> > would not allow accessing the user's OS filesystem. This avoids a lot
> > of the security concerns around filesystem APIs.
> >
> > However, it is expected that this API will eventually also be used for
> > accessing real filesystems, but there are a lot of security
> > concerns that need to be solved before we can create a real standard
> > for that. Hence that is not the topic of this email.
> >
> > API summary:
> >
> > The proposed API introduces two new abstractions: A Directory object
> > which allows manipulating files and directories within it, and a
> > FileHandle object which allows holding an exclusive lock on a file
> > while performing multiple read/write operations on it.
> >
> > The API intentionally reuses the already existing File abstraction as
> > defined by [1] as we didn't want to have two different primitives for
> > "a file". The File object has already been shipping in browsers for a
> > while, so it's not an API that we expect to be able to make backwards
> > incompatible changes to, which somewhat limits the design of the
> > proposed filesystem API.
> >
> > Only adding two new abstractions was very intentional. We wanted to
> > keep the API as small and simple as possible. So for example there is
> > no abstraction for "a filesystem". Instead we simply let the root
> > directory represent the filesystem.
> >
> > The API is entirely asynchronous since we don't expect implementations
> > to be able to keep the whole filesystem in memory, and we don't want
> > to force synchronous IO. But we've still tried to keep the API as
> > friendly as possible.
> >
> > Detailed API:
> >
> > Apologies for using WebIDL here. I know it's not very popular with a
> > lot of people on this list. And it's especially unfortunate in this
> > API since the use of WebIDL to describe the API results in a lot of
> > extra syntax in the description which doesn't actually affect the
> > javascript that developers would write.
> >
> > Unfortunately I don't know of any other formal way of describing the
> > API without spending tons of time typing up long descriptions of each
> > function.
> >
> > partial interface Navigator {
> >   // This is what provides access to the sandboxed filesystem root.
> >   Promise<Directory> getFilesystem(optional FilesystemParameters parameters);
> > };
> >
> > interface Directory {
> >   readonly attribute DOMString name;
> >
> >   Promise<File> createFile(DOMString path,
> >                            CreateFileOptions options);
> >   Promise<Directory> createDirectory(DOMString path);
> >
> >   Promise<(File or Directory)> get(DOMString path);
> >
> >   AbortableProgressPromise<void>
> >     move((DOMString or File or Directory) path,
> >          (DOMString or Directory or DestinationDict) dest);
> >   AbortableProgressPromise<void>
> >     copy((DOMString or File or Directory) path,
> >          (DOMString or Directory or DestinationDict) dest);
> >   Promise<boolean> remove((DOMString or File or Directory) path);
> >   Promise<boolean> removeDeep((DOMString or File or Directory) path);
> >
> >   Promise<FileHandle> openRead((DOMString or File) path);
> >   Promise<FileHandleWritable> openWrite((DOMString or File) path,
> >         OpenWriteOptions options);
> >
> >   EventStream<(File or Directory)> enumerate(optional DOMString path);
> >   EventStream<File> enumerateDeep(optional DOMString path);
> > };
> >
> > interface FileHandle
> > {
> >   readonly attribute FileOpenMode mode;
> >   readonly attribute boolean active;
> >
> >   attribute long long? offset;
> >
> >   Promise<File> getFile();
> >   AbortableProgressPromise<ArrayBuffer> read(unsigned long long size);
> >   AbortableProgressPromise<DOMString> readText(unsigned long long size,
> >       optional DOMString encoding = "utf-8");
> >
> >   void abort();
> > };
> >
> > interface FileHandleWritable : FileHandle
> > {
> >   AbortableProgressPromise<void> write((DOMString or ArrayBuffer or
> > ArrayBufferView or Blob) value);
> >
> >   Promise<void> setSize(optional unsigned long long size);
> >
> >   Promise<void> flush();
> > };
> >
> > partial interface URL {
> >   static DOMString? getPersistentURL(File file);
> > };
> >
> >
> > // WebIDL cruft that's largely transparent
> > enum StorageType { "temporary", "persistent" };
> > dictionary FilesystemParameters {
> >   StorageType storage = "temporary";
> > };
> >
> > dictionary CreateFileOptions {
> >   CreateIfExistsMode ifExists = "fail";
> >   (DOMString or Blob or ArrayBuffer or ArrayBufferView) data;
> > };
> >
> > dictionary OpenWriteOptions {
> >   OpenIfNotExistsMode ifNotExists = "create";
> >   OpenIfExistsMode ifExists = "open";
> > };
> >
> > enum CreateIfExistsMode { "replace", "fail" };
> > enum OpenIfExistsMode { "open", "fail" };
> > enum OpenIfNotExistsMode { "create", "fail" };
> >
> > dictionary DestinationDict {
> >   Directory dir;
> >   DOMString name;
> > };
> >
> > enum FileOpenMode { "readonly", "readwrite" };
> >
> > API Description:
> >
> > I won't go into the details about each function as it's hopefully
> > mostly obvious. A few general comments:
> >
> > The functions on Directory that accept DOMString arguments for
> > filenames allow names like "path/to/file.txt". If the function creates a
> > file, then it creates the intermediate directories. Such paths are
> > always interpreted as relative to the directory itself, never relative
> > to the root.
> >
> > We were thinking of *not* allowing paths that walk up the directory
> > tree. So paths like "../foo", "..", "/foo/bar" or "foo/../bar" are not
> > allowed. This to keep things simple and avoid security issues for the
> > page. Attempting to use a path that contains a segment that is equal
> > to ".." or ".", or any path which starts with "/" will cause an error.
> > This way we can add support for this later if desired.
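
As a rough illustration of the path rules above, here is a minimal sketch in
plain JavaScript. The helper name validateRelativePath and its error messages
are made up for this example; nothing like it appears in the proposal, and a
real implementation may well check more than this.

```javascript
// Hypothetical helper, not part of the proposed API.
// Rejects paths that start with "/" or contain a "." or ".." segment,
// and otherwise returns the list of path segments.
function validateRelativePath(path) {
  if (path.charAt(0) === "/") {
    throw new Error("paths starting with '/' are not allowed: " + path);
  }
  var segments = path.split("/");
  for (var i = 0; i < segments.length; i++) {
    if (segments[i] === "." || segments[i] === "..") {
      throw new Error("'.' and '..' segments are not allowed: " + path);
    }
  }
  return segments;
}
```

With this, "path/to/file.txt" is accepted, while "../foo", "..", "/foo/bar"
and "foo/../bar" all throw.
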
> >
> > Likewise, passing a File object to an operation of Directory where the
> > File object isn't contained in that directory or its descendants also
> > results in an error.
> >
> > One thing that is probably not obvious is how the FileHandle.offset
> > attribute works. This attribute is used by the read/readText/write
> > functions to select where the read or write operation starts. When
> > .read() is called, it uses the current value of .offset to determine
> > where the reading starts. It then fires off an asynchronous read
> > operation. Finally, it synchronously increases .offset by the amount
> > of the 'size' argument before returning. The same goes for .write() and
> > .readText().
> >
> > This means that the caller can simply set .offset and then fire off
> > multiple read or write operations, which will automatically be
> > staggered in the file. It also means that the caller can set the
> > position for the next operation by simply setting .offset, or can check
> > the current position by simply getting .offset.
> >
> > Setting .offset to null means "go to the end". This is why there is no
> > openAppend function. Calling openWrite and then setting .offset to
> > null before writing results in an append.
> >
> > Note that getting or setting .offset does not need to synchronously
> > call seek, or do any IO operations, in the implementation. Instead the
> > implementation simply tracks .offset in the API implementation.
> > Whenever a read or write operation is scheduled, the current .offset
> > is sent along with the operation information to the IO thread and the
> > seek can happen there. Many times the implementation can optimize out
> > the seek entirely.
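
To make sure I understand the offset bookkeeping described above, here is a
small in-memory sketch. SketchHandle is a hypothetical stand-in, not the
proposed FileHandle: it keeps the "file" in a plain array, returns ordinary
Promises instead of AbortableProgressPromise, and it assumes that a null
offset is resolved to the current end of the data at the moment an operation
is scheduled (the proposal doesn't spell that detail out).

```javascript
// Hypothetical in-memory stand-in; only the offset handling mirrors the text.
function SketchHandle(bytes) {
  this._data = Array.prototype.slice.call(bytes);
  this.offset = 0;
}

// Assumption: null means "end of file" and is resolved when scheduling.
SketchHandle.prototype._start = function () {
  return this.offset === null ? this._data.length : this.offset;
};

SketchHandle.prototype.read = function (size) {
  var start = this._start();
  var data = this._data;
  // Advance synchronously, before the asynchronous work has run.
  this.offset = start + size;
  return Promise.resolve().then(function () {
    return new Uint8Array(data.slice(start, start + size)).buffer;
  });
};

SketchHandle.prototype.write = function (view) {
  var start = this._start();
  var data = this._data;
  var bytes = Array.prototype.slice.call(view);
  // Same bookkeeping as read(): the offset moves before any "IO" happens.
  this.offset = start + bytes.length;
  return Promise.resolve().then(function () {
    for (var i = 0; i < bytes.length; i++) {
      data[start + i] = bytes[i];
    }
  });
};

// Two staggered reads: the second starts where the first one ends.
var sketch = new SketchHandle([10, 20, 30, 40, 50, 60, 70, 80]);
sketch.offset = 2;
var firstRead = sketch.read(2);  // scheduled for bytes 2..3
var secondRead = sketch.read(2); // scheduled for bytes 4..5
```

The point is that sketch.offset has already moved past both reads by the time
the second call returns, even though neither Promise has resolved yet.
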
> >
> > The FileHandle class automatically closes itself as soon as the page
> > stops posting further calls to .read/.readText/.write to it. This
> > happens once the last Promise returned from one of those operations
> > has been resolved, without further calls to .read/.readText/.write
> > having happened. This is similar to IDB transactions, though obviously
> > there are no transactional semantics here. I.e. there is no way to
> > roll back any changes.
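
One way to picture the auto-close rule above is the sketch below. It is only
my reading of the description, not the proposed implementation: the names
AutoClosingHandle and schedule are invented, and using setTimeout to wait
until the caller's Promise callbacks have run is an assumption about how the
"no further calls" check could be timed.

```javascript
// Hypothetical sketch of the auto-close behaviour; names are invented.
function AutoClosingHandle() {
  this.active = true;
  this._pending = 0;
}

// Runs `operation` asynchronously and tracks it as an outstanding request.
AutoClosingHandle.prototype.schedule = function (operation) {
  if (!this.active) {
    return Promise.reject(new Error("handle is no longer active"));
  }
  var self = this;
  this._pending++;

  function settle() {
    self._pending--;
    // Check in a later task, so that callbacks the caller attached to the
    // returned Promise get a chance to schedule more work first.
    setTimeout(function () {
      if (self._pending === 0) {
        self.active = false;
      }
    }, 0);
  }

  var result = Promise.resolve().then(operation);
  result.then(settle, settle);
  return result;
};
```

A caller that keeps calling schedule() from inside its then() callbacks keeps
the handle active; once a callback returns without scheduling anything, the
handle goes inactive on the next task.
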
> >
> > Open Questions:
> >
> > There are a few things that we did have disagreements on and which
> > would be worth debating.
> >
> > Is the setup around the FileHandle.offset attribute a good idea? Some
> > people found it confusingly different from posix.
> >
> > Can we get rid of the non-recursive remove() function? The
> > removeDeep() function has the same capabilities, except that
> > removeDeep() doesn't produce an error if you attempt to delete a
> > non-empty directory.
> >
> > Can we get rid of the copy() function? Copy operations are certainly
> > common to expose in UIs, but they can be easily implemented
> > programmatically, so having it in the API isn't strictly needed.
> >
> > Should we add an openAppend function which always appends for all
> > writes? Note that since FileHandle always holds an exclusive lock on
> > the file, there is no risk that other actors will append to the file
> > as long as a FileHandle is being used.
> >
> > Finally, should we remove the Directory abstraction? It's not needed
> > given that you can directly interact with files in subdirectories. But
> > it does provide the ability to do some capability management. I.e.
> > holding a Directory object enables you to interact with the files in
> > that directory and its subdirectories, but there is no way to reach
> > out to a parent directory. Directory objects are also a familiar
> > concept in filesystem APIs, so it seems natural to have them even though
> > they're not strictly needed.
> >
> > [1] http://dev.w3.org/2006/webapi/FileAPI/
>
> After all that, of course I forgot to include examples of what the API
> looks like when used.
>
> // Save some downloaded data into a new file:
> navigator.getFilesystem().then(function(root) {
>   root.createFile("myfile.txt", { data: xhr.response });
> });
>
> // Append 5 bytes to the end of a large existing file:
> navigator.getFilesystem().then(function(root) {
>   return root.openWrite("largefile.dat");
> }).then(function(handle) {
>   handle.offset = null;
>   return handle.write(new Uint8Array([1, 1, 2, 3, 5]));
> });
>
> // Increment the byte at offset 100 in a large existing file:
> var fileHandle;
> navigator.getFilesystem().then(function(root) {
>   return root.openWrite("dir/highscores");
> }).then(function(handle) {
>   fileHandle = handle;
>   fileHandle.offset = 100;
>   return fileHandle.read(1);
> }).then(function(buffer) {
>   assert(buffer.byteLength === 1);
>   var view = new Uint8Array(buffer);
>   view[0]++;
>   fileHandle.offset--;
>   return fileHandle.write(buffer);
> });
>
> / Jonas
>
>


I didn't see any rationale for the decision to hang this off the navigator
object. Unless there is a definite reason, perhaps this should be its own
[[Global]] object? I apologize for sounding like a broken record, but
"navigator" has nothing to do with the file system.

partial interface Window {
  static FileSystem;
}

interface FileSystem {
  Promise<Directory> get(optional FilesystemParameters parameters);
}

Everything else could stay as-is. The examples then look like:

(I used "get" in the IDL, but will use "request" in the examples, because
I'm using my imagination and I think it looks nice)


// Save some downloaded data into a new file:
FileSystem.request().then(function(root) {
  root.createFile("myfile.txt", { data: xhr.response });
});

// Append 5 bytes to the end of a large existing file:
FileSystem.request().then(function(root) {
  return root.openWrite("largefile.dat");
}).then(function(handle) {
  handle.offset = null;
  return handle.write(new Uint8Array([1, 1, 2, 3, 5]));
});

// Increment the byte at offset 100 in a large existing file:
var fileHandle;
FileSystem.request().then(function(root) {
  return root.openWrite("dir/highscores");
}).then(function(handle) {
  fileHandle = handle;
  fileHandle.offset = 100;
  return fileHandle.read(1);
}).then(function(buffer) {
  assert(buffer.byteLength === 1);
  var view = new Uint8Array(buffer);
  view[0]++;
  fileHandle.offset--;
  return fileHandle.write(buffer);
});


Or better yet, FileSystem could be a constructor that produces FileSystem
objects with a "get" or "request" method, which makes it a reusable object:


var fs = new FileSystem();

// Save some downloaded data into a new file:
fs.request().then(function(root) {
  root.createFile("myfile.txt", { data: xhr.response });
});

// Append 5 bytes to the end of a large existing file:
fs.request().then(function(root) {
  return root.openWrite("largefile.dat");
}).then(function(handle) {
  handle.offset = null;
  return handle.write(new Uint8Array([1, 1, 2, 3, 5]));
});

// Increment the byte at offset 100 in a large existing file:
var fileHandle;
fs.request().then(function(root) {
  return root.openWrite("dir/highscores");
}).then(function(handle) {
  fileHandle = handle;
  fileHandle.offset = 100;
  return fileHandle.read(1);
}).then(function(buffer) {
  assert(buffer.byteLength === 1);
  var view = new Uint8Array(buffer);
  view[0]++;
  fileHandle.offset--;
  return fileHandle.write(buffer);
});
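
For what it's worth, the constructor form could be prototyped as a thin
wrapper over whatever ends up providing the root directory. This is only a
sketch under assumptions: FileSystem and request are the names I'm floating in
this message, the optional backend argument is invented so the wrapper can be
tried against a stub, and navigator.getFilesystem is the entry point from the
proposal, which no browser necessarily ships.

```javascript
// Hypothetical shim; none of these names exist in a shipping API.
// `backend` is any object with a getFilesystem(parameters) method that
// returns a Promise for the root directory. It defaults to `navigator`,
// as in the original proposal.
function FileSystem(backend) {
  this._backend = backend || navigator;
}

// Forwards the optional FilesystemParameters-style dictionary unchanged.
FileSystem.prototype.request = function (parameters) {
  return this._backend.getFilesystem(parameters);
};
```

The same fs object can then be handed around and reused, and swapping the
backend for a stub makes the calling code easy to exercise without a real
filesystem behind it.
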


I don't know how set in stone the naming is, but you might also consider
reviewing some prior art (http://nodejs.org/api/fs.html) for method names
and call signatures.


Rick

Received on Friday, 9 August 2013 23:03:24 UTC