Re: Request for feedback: Filesystem API from Rick Waldron on 2013-08-09 (public-script-coord@w3.org from July to September 2013)

From: Rick Waldron <waldron.rick@gmail.com>
Date: Fri, 9 Aug 2013 19:34:57 -0400
To: Jonas Sicking <jonas@sicking.cc>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CAHfnhfqNNKwaQJ7yg9b34Y1Tn_gG+48VSLG8-B9JZN-zR3D-pA@mail.gmail.com>
On Fri, Aug 9, 2013 at 7:22 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Fri, Aug 9, 2013 at 4:02 PM, Rick Waldron <waldron.rick@gmail.com>
> wrote:
> > below...
> >
> >
> > On Fri, Aug 9, 2013 at 6:15 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> On Fri, Aug 9, 2013 at 2:02 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >> > Over the past few months a few of us at Mozilla, with input from a lot
> >> > of other people, have been iterating on a filesystem API. The goal of
> >> > this filesystem API is first and foremost to expose a sandboxed
> >> > filesystem to webpages. This filesystem would be origin-specific and
> >> > would not allow accessing the user's OS filesystem. This avoids a lot
> >> > of the security concerns around filesystem APIs.
> >> >
> >> > However it is expected that this API will eventually also be used for
> >> > accessing real filesystems, but there are a lot of security
> >> > concerns that need to be solved before we can create a real standard
> >> > for that. Hence that is not the topic of this email.
> >> >
> >> > API summary:
> >> >
> >> > The proposed API introduces two new abstractions: A Directory object
> >> > which allows manipulating files and directories within it, and a
> >> > FileHandle object which allows holding an exclusive lock on a file
> >> > while performing multiple read/write operations on it.
> >> >
> >> > The API intentionally reuses the already existing File abstraction as
> >> > defined by [1] as we didn't want to have two different primitives for
> >> > "a file". The File object has already been shipping in browsers for a
> >> > while, so it's not an API that we expect to be able to make backwards
> >> > incompatible changes to, which somewhat limits the design of the
> >> > proposed filesystem API.
> >> >
> >> > Only adding two new abstractions was very intentional. We wanted to
> >> > keep the API as small and simple as possible. So for example there is
> >> > no abstraction for "a filesystem". Instead we simply let the root
> >> > directory represent the filesystem.
> >> >
> >> > The API is entirely asynchronous since we don't expect implementations
> >> > to be able to keep the whole filesystem in memory, and we don't want
> >> > to force synchronous IO. But we've still tried to keep the API as
> >> > friendly as possible.
> >> >
> >> > Detailed API:
> >> >
> >> > Apologies for using WebIDL here. I know it's not very popular with a
> >> > lot of people on this list. And it's especially unfortunate in this
> >> > API since the use of WebIDL to describe the API results in a lot of
> >> > extra syntax in the description which doesn't actually affect the
> >> > javascript that developers would write.
> >> >
> >> > Unfortunately I don't know of any other formal way of describing the
> >> > API without spending tons of time typing up long descriptions of each
> >> > function.
> >> >
> >> > partial interface Navigator {
> >> >   // This is what provides access to the sandboxed filesystem root.
> >> >   Promise<Directory> getFilesystem(
> >> >     optional FilesystemParameters parameters);
> >> > };
> >> >
> >> > interface Directory {
> >> >   readonly attribute DOMString name;
> >> >
> >> >   Promise<File> createFile(DOMString path,
> >> >                            CreateFileOptions options);
> >> >   Promise<Directory> createDirectory(DOMString path);
> >> >
> >> >   Promise<(File or Directory)> get(DOMString path);
> >> >
> >> >   AbortableProgressPromise<void>
> >> >     move((DOMString or File or Directory) path,
> >> >          (DOMString or Directory or DestinationDict) dest);
> >> >   AbortableProgressPromise<void>
> >> >     copy((DOMString or File or Directory) path,
> >> >          (DOMString or Directory or DestinationDict) dest);
> >> >   Promise<boolean> remove((DOMString or File or Directory) path);
> >> >   Promise<boolean> removeDeep((DOMString or File or Directory) path);
> >> >
> >> >   Promise<FileHandle> openRead((DOMString or File) path);
> >> >   Promise<FileHandleWritable> openWrite((DOMString or File) path,
> >> >         OpenWriteOptions options);
> >> >
> >> >   EventStream<(File or Directory)> enumerate(optional DOMString path);
> >> >   EventStream<File> enumerateDeep(optional DOMString path);
> >> > };
> >> >
> >> > interface FileHandle
> >> > {
> >> >   readonly attribute FileOpenMode mode;
> >> >   readonly attribute boolean active;
> >> >
> >> >   attribute long long? offset;
> >> >
> >> >   Promise<File> getFile();
> >> >   AbortableProgressPromise<ArrayBuffer> read(unsigned long long size);
> >> >   AbortableProgressPromise<DOMString> readText(
> >> >     unsigned long long size, optional DOMString encoding = "utf-8");
> >> >
> >> >   void abort();
> >> > };
> >> >
> >> > interface FileHandleWritable : FileHandle
> >> > {
> >> >   AbortableProgressPromise<void> write(
> >> >     (DOMString or ArrayBuffer or ArrayBufferView or Blob) value);
> >> >
> >> >   Promise<void> setSize(optional unsigned long long size);
> >> >
> >> >   Promise<void> flush();
> >> > };
> >> >
> >> > partial interface URL {
> >> >   static DOMString? getPersistentURL(File file);
> >> > };
> >> >
> >> >
> >> > // WebIDL cruft that's largely transparent
> >> > enum StorageType { "temporary", "persistent" };
> >> > dictionary FilesystemParameters {
> >> >   StorageType storage = "temporary";
> >> > };
> >> >
> >> > dictionary CreateFileOptions {
> >> >   CreateIfExistsMode ifExists = "fail";
> >> >   (DOMString or Blob or ArrayBuffer or ArrayBufferView) data;
> >> > };
> >> >
> >> > dictionary OpenWriteOptions {
> >> >   OpenIfNotExistsMode ifNotExists = "create";
> >> >   OpenIfExistsMode ifExists = "open";
> >> > };
> >> >
> >> > enum CreateIfExistsMode { "replace", "fail" };
> >> > enum OpenIfExistsMode { "open", "fail" };
> >> > enum OpenIfNotExistsMode { "create", "fail" };
> >> >
> >> > dictionary DestinationDict {
> >> >   Directory dir;
> >> >   DOMString name;
> >> > };
> >> >
> >> > enum FileOpenMode { "readonly", "readwrite" };
> >> >
> >> > API Description:
> >> >
> >> > I won't go into the details about each function as it's hopefully
> >> > mostly obvious. A few general comments:
> >> >
> >> > The functions on Directory that accept DOMString arguments for
> >> > filenames allow names like "path/to/file.txt". If the function
> >> > creates a file, then it creates the intermediate directories. Such
> >> > paths are always interpreted as relative to the directory itself,
> >> > never relative to the root.
> >> >
> >> > We were thinking of *not* allowing paths that walk up the directory
> >> > tree. So paths like "../foo", "..", "/foo/bar" or "foo/../bar" are not
> >> > allowed. This is to keep things simple and avoid security issues for the
> >> > page. Attempting to use a path that contains a segment that is equal
> >> > to ".." or ".", or any path which starts with "/" will cause an error.
> >> > This way we can add support for this later if desired.
> >> >
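A minimal sketch of the path rule described above, for illustration only. The name isAllowedPath is hypothetical and not part of the proposed API, and the check covers only the two stated conditions (a leading "/" and a "." or ".." segment). How empty segments such as "a//b" would be treated is not stated in the proposal, so the sketch leaves them alone.

```javascript
// Hypothetical helper, not part of the proposal. Rejects a path that
// starts with "/" or that contains a segment equal to "." or "..".
function isAllowedPath(path) {
  if (typeof path !== "string") {
    return false;
  }
  if (path.startsWith("/")) {
    return false;
  }
  return path.split("/").every(function (segment) {
    return segment !== "." && segment !== "..";
  });
}

// The examples from the text above:
var examples = ["path/to/file.txt", "../foo", "..", "/foo/bar", "foo/../bar"];
var results = examples.map(isAllowedPath);
```

Under these assumptions only the first example is accepted; the other four are rejected, which matches the list of disallowed paths given above.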
> >> > Likewise, passing a File object to an operation of Directory where the
> >> > File object isn't contained in that directory or its descendants also
> >> > results in an error.
> >> >
> >> > One thing that is probably not obvious is how the FileHandle.offset
> >> > attribute works. This attribute is used by the read/readText/write
> >> > functions to select where the read or write operation starts. When
> >> > .read is called, it uses the current value of .offset to determine
> >> > where the reading starts. It then fires off an asynchronous read
> >> > operation. It finally synchronously increases .offset by the amount
> >> > of the 'size' argument before returning. Same thing for .write() and
> >> > .readText().
> >> >
> >> > This means that the caller can simply set .offset and then fire off
> >> > multiple read or write operations which automatically will happen
> >> > staggered in the file. It also means that the caller can set the
> >> > offset for the next operation by simply setting .offset, or can check
> >> > the current offset by simply getting .offset.
> >> >
> >> > Setting .offset to null means "go to the end". This is why there is no
> >> > openAppend function. Calling openWrite and then setting .offset to
> >> > null before writing results in an append.
> >> >
> >> > Note that getting or setting .offset does not need to synchronously
> >> > call seek, or do any IO operations, in the implementation. Instead the
> >> > implementation simply tracks .offset in the API implementation.
> >> > Whenever a read or write operation is scheduled, the current .offset
> >> > is sent along with the operation information to the IO thread and the
> >> > seek can happen there. Many times the implementation can optimize out
> >> > the seek entirely.
> >> >
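The offset bookkeeping described above might be sketched roughly as follows. MockFileHandle is a hypothetical in-memory stand-in, not the proposed FileHandle: it models only the rule that read() captures the current offset, schedules the work asynchronously, and advances the offset synchronously before returning. The "null means end of file" case and write() are not modelled.

```javascript
// Hypothetical in-memory stand-in for FileHandle. Only the offset
// bookkeeping follows the description above; everything else is assumed.
function MockFileHandle(bytes) {
  this.bytes = bytes; // Uint8Array standing in for the file contents
  this.offset = 0;
}

MockFileHandle.prototype.read = function (size) {
  var start = this.offset; // captured at call time
  var bytes = this.bytes;
  this.offset += size;     // advanced synchronously, before returning
  return Promise.resolve().then(function () {
    // The "IO" runs later, using the captured start position.
    return bytes.slice(start, start + size).buffer;
  });
};

var handle = new MockFileHandle(new Uint8Array([10, 20, 30, 40, 50, 60]));
handle.offset = 2;
var first = handle.read(2);  // will cover bytes 2..3
var second = handle.read(2); // will cover bytes 4..5
// handle.offset has already advanced to 6 here, before either promise
// has resolved.
```

Because each call captures its own start position, the two reads do not overlap even though neither has completed when the second one is issued.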
> >> > The FileHandle class automatically closes itself as soon as the page
> >> > stops posting further calls to .read/.readText/.write to it. This
> >> > happens once the last Promise returned from one of those operations
> >> > has been resolved, without further calls to .read/.readText/.write
> >> > having happened. This is similar to IDB transactions, though obviously
> >> > there are no transactional semantics here. I.e. there is no way to
> >> > roll back any changes.
> >> >
> >> > Open Questions:
> >> >
> >> > There are a few things that we did have disagreements on and which
> >> > would be worth debating.
> >> >
> >> > Is the setup around the FileHandle.offset attribute a good idea? Some
> >> > people found it confusingly different from posix.
> >> >
> >> > Can we get rid of the non-recursive remove() function? The
> >> > removeDeep() function has the same capabilities, except that
> >> > removeDeep() doesn't produce an error if you attempt to delete a
> >> > non-empty directory.
> >> >
> >> > Can we get rid of the copy() function? Copy operations are certainly
> >> > common to expose in UIs, but they can be easily implemented
> >> > programmatically, so having it in the API isn't strictly needed.
> >> >
> >> > Should we add an openAppend function which always appends for all
> >> > writes? Note that since FileHandle always holds an exclusive lock on
> >> > the file, there is no risk that other actors will append to the file
> >> > as long as a FileHandle is being used.
> >> >
> >> > Finally, should we remove the Directory abstraction? It's not needed
> >> > given that you can directly interact with files in subdirectories. But
> >> > it does provide the ability to do some capability management. I.e.
> >> > holding a Directory object enables you to interact with the files in
> >> > that directory and its subdirectories, but there is no way to reach
> >> > out to a parent directory. Directory objects are also a familiar
> >> > concept in filesystem APIs, so it seems natural to have it even though
> >> > it's not strictly needed.
> >> >
> >> > [1] http://dev.w3.org/2006/webapi/FileAPI/
> >>
> >> After all that, of course I forgot to include examples of what the API
> >> looks like when used.
> >>
> >> // Save some downloaded data into a new file:
> >> navigator.getFilesystem().then(function(root) {
> >>   root.createFile("myfile.txt", { data: xhr.response });
> >> });
> >>
> >> // Append 5 bytes to the end of a large existing file:
> >> navigator.getFilesystem().then(function(root) {
> >>   return root.openWrite("largefile.dat");
> >> }).then(function(handle) {
> >>   handle.offset = null;
> >>   return handle.write(new Uint8Array([1, 1, 2, 3, 5]));
> >> });
> >>
> >> // Increase the 100th byte in large existing file:
> >> var fileHandle;
> >> navigator.getFilesystem().then(function(root) {
> >>   return root.openWrite("dir/highscores");
> >> }).then(function(handle) {
> >>   fileHandle = handle;
> >>   fileHandle.offset = 100;
> >>   return fileHandle.read(1);
> >> }).then(function(buffer) {
> >>   assert(buffer.byteLength === 1);
> >>   var view = new Uint8Array(buffer);
> >>   view[0]++;
> >>   fileHandle.offset--;
> >>   return fileHandle.write(buffer);
> >> });
> >>
> >> / Jonas
> >>
> >
> >
> >
> > I didn't see any rationale that explains the decision to hang this off
> > the navigator object. Unless there is a definite reason, perhaps this
> > should be its own [[Global]] object? I apologize for sounding like a
> > broken record, but the "navigator" has nothing to do with the File System.
> >
> > partial interface Window {
> >   static FileSystem;
> > }
> >
> > interface FileSystem {
> >   Promise<Directory> get(optional FilesystemParameters parameters);
> > }
> >
> > Everything else could stay as-is. The examples then look like:
> >
> > (I used "get" in the IDL, but will use "request" in the examples, because
> > I'm using my imagination and I think it looks nice)
>
>
Jonas, thanks for the quick response


>  o_O
>
:)


>
> Your colorful imagination makes me not understand if you are proposing
> that the function should be called "get", or "request". Or if you are
> proposing that either should work. Or if you are proposing that there
> is some other magic going on.
>

Nope, no magic. I was only trying to share "out loud" my desire to feel out
which expresses the program intent more clearly. Apologies that I made it
confusing :)


>
> Put another way, what does the last "it" refer to above?
>

I meant that, subjectively, I think the word "request" nicely expresses the
intention: I want to request the filesystem and then do something with it.
The problem I had, while writing that response, is that I kept thinking of
"get" as a synchronous operation, for example on a Map object, e.g. var value =
map.get(key); value; // 42. Of course this isn't a synchronous operation
that's being designed, so I wonder if "get" is inappropriately being
co-opted from a synchronous world into an asynchronous world. I'll also
gladly accept that I'm overthinking it.
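
To make the contrast concrete, here is a rough sketch. requestFilesystem is a hypothetical stand-in that resolves with a placeholder object; it is not the proposed API, only an illustration of a call that hands back a promise rather than a value.

```javascript
// Synchronous: the value is available as soon as get() returns.
var map = new Map([["answer", 42]]);
var value = map.get("answer");

// Asynchronous: the call returns a promise, and the result only arrives
// inside the callback. requestFilesystem is a hypothetical stand-in that
// resolves with a placeholder object, not a real Directory.
function requestFilesystem() {
  return Promise.resolve({ name: "placeholder-root" });
}

var pending = requestFilesystem();
pending.then(function (root) {
  // root is usable here, not at the call site above
  return root.name;
});
```

In the first case the name "get" describes exactly what the caller receives; in the second, the caller receives a promise and has to wait for the thing it asked for.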



>
> Other than that I don't feel strongly. I'm not terribly excited to
> introduce a FileSystem interface, even if it is one that only has a
> single static function. But if people generally feel that that is
> better then I can live with that.
>

Well, it was really meant as a stepping stone to the last example ;)


>
> The reason we tend to hang things off of Navigator these days is that
> adding things to the global scope always runs the risk of name
> collisions with existing content.
>
> > Or better yet, FileSystem is a constructor that produces FileSystem
> > objects that have a "get" or "request" method, now it's a reusable
> > object...
> >
> >
> > var fs = new FileSystem();
> >
> > // Save some downloaded data into a new file:
> > fs.request().then(function(root) {
> >   root.createFile("myfile.txt", { data: xhr.response });
> > });
>
> (new Filesystem()).request() looks less nice to me than
> Filesystem.request(). But again, I can live with either.
>

I completely agree.


>
> > I don't know how set in stone the naming is, but you might also consider
> > reviewing some prior art (http://nodejs.org/api/fs.html) for method
> > names and call signatures.
>
> Nothing is set in stone at this point.
>

Good to know!

Rick
Received on Friday, 9 August 2013 23:35:45 UTC