Re: FileSystem API Comments from Jonas Sicking on 2014-10-22 (public-webapps@w3.org from October to December 2014)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 21 Oct 2014 18:36:11 -0700
To: Ali Alabbas <alia@microsoft.com>
Cc: WebApps WG <public-webapps@w3.org>
Message-ID: <CA+c2ei8xNgFZETJNZH2HXK=WTeknWNysBZzkoOgoU+dsyiQ4xg@mail.gmail.com>
On Tue, Oct 21, 2014 at 1:36 PM, Ali Alabbas <alia@microsoft.com> wrote:
> Hello,
>
> I'm with the IE Platform team at Microsoft. We have a few comments on the
> latest editor's draft of the newly proposed FileSystem API [1].

Hi Ali! Thanks for looking at the spec. I'm glad you've been able to
understand the API as well as you have, despite it's "lacking"
descriptions of the API. To say the least :)

> 1.1 Use cases (3. Audio/Photo editor with offline access or local cache for
> speed)
>
>   * Edited files should be accessible by other client-side applications
>
>      - Having the sandboxed file system share its contents between all apps
> would allow apps to tamper with the files of another app. This could result
> in corrupted files and perhaps an invalid state for some apps that expect
> certain contents to exist in a file. This makes us wonder: should we warn
> users about files that are being opened and written to? If an app is just
> doing a read, can it open a file or directory without the user's permission,
> or could this pose a possible issue as well? Also, is the Quota Management
> API going to be a dependency? It's unclear what we would do with regards to
> requesting permission to access files. Will this spec be responsible for
> defining what questions/permission inquiries are presented and when they are
> presented to the user? For example, what happens when one file is locked for
> use by a different application? Is the user notified and given the option to
> open a read-only copy of that file?

Each origin has a separate sandboxed filesystem. There is no way for
websites to read each other's filesystems. This is no different from
IndexedDB or localStorage. This also means that we have the same
prompting behavior, the same Quota Management dependency and the same
security model as IndexedDB and localStorage.

Different browsers have used various different policies for IndexedDB
and localStorage in the past. Firefox originally prompted as soon as
you used IndexedDB, but we currently instead prompt once you store
more than 50MB. Our plan is to very soon remove all prompts. I think
Safari on iOS prompts once you reach a certain limit of stored data,
in WebSQL, not sure if this applies to localStorage and IndexedDB as
well. Chrome, Opera and IE has never prompted I think.

My hope is that we can agree on some policies for this stuff and
encode that in a Quota Management specification. Once we do that it
should hopefully apply across all storage APIs. I.e. hopefully what
will matter is how much data a website stores, not what API it uses to
store that data.

The "Edited files should be accessible by other client-side
applications" refers to the ability for non-web applications to access
these files. I.e. if the user has iTunes or WinAmp installed then it
might be good for those native applications to be able to scan files
stored by websites to try to find music files stored there. This
technically is no different in IndexedDB. I.e. it'd be cool if
iTunes/WinAmp could find music files stored as Blobs in IndexedDB.

I'm personally happy to remove this requirement as I don't think it
affects the API at all.

> 3. The Directory Interface
>   * Change events
>
>      - I would like to revisit the discussion on apps getting notifications
> of changes to files/directories. There are many scenarios where an
> application would want to react to renames/moves of a file/directory. There
> would also be value in being notified of a change to a directory's
> structure. If an app has a file browser that allows a user to select files
> and/or directories and another app makes changes to the sandboxed
> filesystem, then it would be expected that the first app should be notified
> and would be able to refresh its directory tree. Otherwise it would require
> the user to somehow force a refresh which would not be a good user
> experience since the user would expect the file browser to update on its
> own.

I agree. The lack of change notifications has been a problem for
IndexedDB. The second most common feedback that we get is that change
notifications are needed. Only exceeded by "Why no Promises you
jerkfaces!"

>   * removeDeep() & move()
>
>      - Do these support links or junctions? If not, what is the expected
> behavior?

Links and junctions aren't supported in the API at all right now. I.e.
there's no way to create a link/junction, and so the problem of what
happens if you delete/move them does not appear.

>   * enumerate()
>
>      - It would be useful to have pre-filtering support for the following:
> file/directory, ranges, wildcard search. Currently we would be forced to
> enumerate the entire set and manually filter out the items we want from the
> result set.

This is what native filesystems do too. I worry that it's hard to find
common patterns here. I.e. is filtering on name more important than
extension? What about file size? What about last-modified-date? What
about name and last-modified-date?

>          - Callers often know exactly whether or not they want to enumerate
> files or folders. For example, an image upload service may only be
> interested in the files present in a directory rather than all of its
> directories. Perhaps it would be useful to have enumerateFiles() and
> enumerateDirectories() for this purpose? Or we could have another argument
> for enumerate() that is an enum ("directory", "file").

Do you expect to be able to provide any significant performance
benefits from this? Or is it just to avoid writing more JS code? Note
that whatever "async enumeration" solution we use (there's discussions
about Observables vs. Streams in TC39 right now) will most likely
provide simple-to-use primitives for doing filtering. So I don't
expect that much code would need to be written for the website.

>          - Supporting optimized pagination of large directories. We could
> have arguments for a starting index and length we would be able to specify a
> range of items to retrieve within the result set.

How do we do this while also ensuring that you don't get
unpredictable, confusing or inconsistent results if a directory is
modified halfway through enumeration?

>          - Supporting the wildcard character to pre-filter a list of
> files/directories (e.g. *.jpg).

See above.

> 4. The FileHandle Interface
>
>   * FileHandles
>
>      - Is this basically going to be the first to get the handle gets to use
> it and all subsequent calls need to wait for the file handle to become
> available again?

Yes. The first one to open a FileHandle gets to use it as much as it
wants while other openers wait. Once the FileHandle is closed the next
opener will receive a new FileHandle instance and gets to use that
until it's closed.

> Are there more details about the locking model used here?

I believe Arun recently picked up this spec again to work on. So yes,
there will be more details in the spec.

>   * Auto-closing of FileHandles
>
>      - This may cause confusion as it does not match the common developer
> mental model of a file handle which is "opened" and then available for use
> until it's "closed". Perhaps it would be advantageous to have an explicit
> close function as part of the FileHandle interface? With the current
> behavior there can be overhead with the unintended closure of the FileHandle
> that would require a developer to continuously open/close a FileHandle. The
> currently defined behavior assumes that a developer is done with all their
> file manipulations when they have completed a promise chain. However, a
> developer may want to keep the FileHandle open to be used elsewhere at some
> other point in time that is not related to the current promise chain. An
> example of the usefulness of having an explicit close function is if you
> were to implement a word processor and wanted to lock down the file that it
> currently has open for the period of its editing. This way you are free to
> continue operating on that file for the duration that it is open, protecting
> the file from other processes, and not having to undergo the costly setup
> and teardown of a file handle.

I agree that the current model is a problem. FileHandle is currently
using the same "auto closing" model as IndexedDB transactions.

This absolutely have some big disadvantages as you point out. However
so does having an explicit .close() function. If we have an explicit
.close() function, it gets *very* easy to create websites that
deadlock. Simply having the following two code paths will do that on
occasion:

1. Open file A, open file B, copy some data from A, write the data in
B, close A, close B
2. Open file B, open file A, read some data from A, read some data
from B, close A, close B

The problems here is essentially the same as we had with IndexedDB
transaction. I wrote about it here:

http://lists.w3.org/Archives/Public/public-webapps/2013AprJun/0217.html

>   * AbortableProgressPromise
>
>      - It is not clear how a developer would define the abort callback of an
> AbortableProgressPromise. It seems that the user agent would be responsible
> for setting the abort callback since it instantiates and returns the
> AbortableProgressPromise.

Yes. Since the UA instantiates the AbortableProgressPromise, the UA
would register the callback.

I think AbortableProgressPromise as it's currently specced will go
away and be replaced with something else from TC39. But the general
design with remain. I.e. that you get back some form of Promise, and
that there is the ability to receive progress notifications as well as
stop the operation from continuing to execute.

> 5. The FileHandleWritable Interface
>   * write() & flush()
>
>      - It might be useful to have support for "transacted" streams where the
> caller can write to a copy of the file and then have it atomically replaced:
> swap the old file with the new one and then delete the old file. This
> reduces the opportunity for an app crash to lead to a corrupted file on
> disk; it would be corrupted without transacted streams if the app crashes
> during save. The transacted streams could be handled during the (automated)
> flush where the user agent would write to a temp file and atomically replace
> the intended file.

Yeah, I think adding something like the following would make it much
simpler for webpages to do atomic writes.

partial interface Directory {
  atomicWrite((DOMString or File) path, (DOMString or Blob or
ArrayBuffer or ArrayBufferView) data);
}

This would replace the contents of the file in the first argument with
the data in the second argument.

Implementation-wise this could be done by writing a new file next to
the existing one, and then do an atomic rename to overwrite the
existing file with the new file.

>   * flush()
>
>      - This is costly functionality to expose and is likely to be overused
> by callers. It would be beneficial to automatically flush changes to disk by
> allowing the default file write behavior by the OS. For example, on Windows,
> we would leave it up to the filesystem cache to determine the best time to
> flush to disk. This is non-deterministic from the app's point of view, but
> the only time it is a potential problem is when there's a hard power-off.
> Most apps should not be concerned with this; only apps that have very high
> data reliability requirements would need the granular control of flushing to
> disk. In those cases a developer should use IndexedDB. So we should consider
> obscuring this functionality since it's not a common requirement and has a
> performance impact if it's widely used.

I agree that by default this should be left up to the OS. That's the
intent of the current API.

But it also seems important to enable pages to implement atomic
semantics on top of the filesystem API. I don't think that we can
simply tell people to use IndexedDB if they need atomic semantics
since IndexedDB currently can't do partial writes at all.

I.e. I want people to be able to implement their own transactional
database APIs on top of the web platform. IndexedDB currently doesn't
enable that in a good way since it doesn't support modifying large
Blobs. A filesystem without flush() also wouldn't enable that.

However I'm happy to play with the syntax of flush() to make it less
likely to be abused. Such as rename it to something that makes it
clear that it's a very expensive operation.

> 6. FileSystem Configuration Parameters
>
>   * Dictionary DestinationDict
>
>      - The DestinationDict seems to exist to facilitate the renaming of a
> directory in conjunction with a move. However, the same operation is done
> differently for files which makes the functionality non-uniform. Perhaps we
> can add a rename() function to make it more intuitive?

I'm not sure I understand you here. The way you rename+move a directory is

sourceDir.move(subdir, { dir: destDir, name: "newname" });

The way you do the same for a file is

sourceDir.move(subfile, { dir: destDir, name: "newname" });

So the syntax is exactly the same for moving files and directories.

If you're renaming+moving within the same directory, you can also do

sourceDir.move(subdir, "destDir/subdir/newname" );
sourceDir.move(subfile, "destDir/subdir/newname" );

But again, the syntax for moving+renaming is the same for directories and files.

/ Jonas
Received on Wednesday, 22 October 2014 01:37:09 UTC