Re: file-system-api: filename restrictions from Eric Uhrhane on 2011-01-11 (public-webapps@w3.org from January to March 2011)

From: Eric Uhrhane <ericu@google.com>
Date: Tue, 11 Jan 2011 14:33:16 -0800
To: Glenn Maynard <glenn@zewt.org>
Cc: public-webapps@w3.org
Message-ID: <AANLkTinrB_npemRSW9+uVD0=jdAMKdQqD0ibXmRBd76T@mail.gmail.com>

Glenn:

Sorry about the slow response; I was on vacation, and am only now catching up.

We've discussed these issues before; see
http://lists.w3.org/Archives/Public/public-device-apis/2010Jan/0229.html
for much of the initial discussion.  However, you've brought up a new
point that I think is worth addressing.

On Sun, Dec 19, 2010 at 11:26 AM, Glenn Maynard <glenn@zewt.org> wrote:
> Section 8 "Uniformity of interface" will cause headaches for some use
> cases.  For example, an application may want to allow the user to fill
> a directory with images, then output a thumbnail of each image "x.jpg"
> into a subdirectory with the same filename, "thumbs/x.jpg".
>
> However, we're forbidden from creating a new file with "invalid"
> filenames, even if they exist elsewhere.  The operation will fail, and
> we'll have to tell our Linux users with images named "at the beach:
> moon rock?.jpg" that they have to obey Windows filename
> conventions--which will probably be upsetting.  It'd also be a
> difficult rule for users to follow; while it's easy in Windows since
> it's globally enforced in all applications, Linux users would have to
> memorize the rules themselves.

Actually, it's not just that Linux users now have to worry about
Windows rules; Windows users also have to worry about Linux rules, in
particular the filename length limitation, which is 255 bytes on Linux
but 255 UTF-16 code units on Windows.
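
To illustrate the mismatch, here is a minimal Python sketch that
measures one name both ways.  It assumes the Linux side stores names
as UTF-8 and that the Windows side counts UTF-16 code units; the
helper names are hypothetical and not part of any spec.

```python
# Hypothetical helpers: measure a single filename under both limits.
LIMIT = 255  # the per-name limit cited above for both platforms


def utf8_bytes(name: str) -> int:
    """Length of the name in bytes, assuming a UTF-8 filesystem."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Length of the name in UTF-16 code units (2 bytes each, no BOM)."""
    return len(name.encode("utf-16-le")) // 2


def fits_everywhere(name: str) -> bool:
    """True only if the name is within the limit under both measures."""
    return utf8_bytes(name) <= LIMIT and utf16_units(name) <= LIMIT


# 100 hiragana characters: 300 UTF-8 bytes but only 100 UTF-16 units,
# so the name is fine under the second measure and too long under the first.
kana_name = "あ" * 100
print(utf8_bytes(kana_name), utf16_units(kana_name), fits_everywhere(kana_name))
```

A name of 255 ASCII characters passes both checks, so the two limits
only diverge once non-ASCII characters are involved.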

> It's also a pain for backing up files, eg. copying "moon rock?.jpg" to
> "moon rock?.jpg~", and for "safe writes", writing to "moon
> rock?.jpg.new" and then renaming the finished file over the original.
>
> These seem like bigger problems than the one it's trying to solve.  Is
> it really insufficient for these rules to define what filenames must
> be supported, that any others may not be, and to suggest a UA log if
> nonportable filenames are created?  (Of all filename issues, the only
> one that I've ever found to be a serious real-world portability issue
> is case-insensitivity.)

Yes, I believe that's insufficient.  We've discussed this before, and
1) We really do want a fully-portable subset to be the standard; code
should work everywhere if it works anywhere.  You shouldn't have to
code to OSX any more than you should have to code to Opera--just code
to the web platform.
2) Developers often don't read UA logs.  We should fail early on the
dev box, rather than failing later on the user's machine.

> I guess there are other issues with reading data created outside of the API:
>
> - filenames that can't be decoded to a DOMString, eg. undecodable
> bytes in a UTF-8 filesystem.  This is common in Linux after eg.
> unzipping a ZIP containing SJIS filenames.  Should these simply be
> ignored with a log?

I'm looking into the encoding problems now, and will respond later.
In general we should be able to read any such file already, at the
very least by enumerating the directory to get the FileEntry, but
creating files with valid names may be tricky.
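
As a rough illustration of the case Glenn describes, the Python sketch
below detects a name whose bytes are not valid UTF-8.  It assumes the
filesystem is expected to hold UTF-8 names; `decode_name` is a
hypothetical helper, and what a UA should do with such a name is still
the open question here.

```python
from typing import Optional


def decode_name(raw: bytes) -> Optional[str]:
    """Return the name as text, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The same name as written by a UTF-8 tool and by a Shift_JIS tool,
# e.g. after unzipping an archive that carries SJIS filenames.
utf8_raw = "写真.jpg".encode("utf-8")
sjis_raw = "写真.jpg".encode("shift_jis")

print(decode_name(utf8_raw))  # decodes cleanly
print(decode_name(sjis_raw))  # None: no direct string form for this name
```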

> - existing filenames that differ only by case.  Similarly, should the
> UA just ignore all but one of them and make a log to the console?

There's no problem accessing those through directory enumeration, or
via a supplied path.  You just can't create this situation using the
API.
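
A minimal Python sketch of that creation-time check follows.  It
assumes names are compared with Unicode case folding, which is an
illustrative choice and not necessarily the comparison the spec
mandates; `would_collide` is a hypothetical helper.

```python
from typing import Iterable


def would_collide(new_name: str, existing: Iterable[str]) -> bool:
    """True if new_name differs from an existing entry only by case."""
    key = new_name.casefold()
    return any(
        other != new_name and other.casefold() == key
        for other in existing
    )


# Entries already present in the directory.
entries = ["photo.jpg", "notes.txt"]

print(would_collide("Photo.JPG", entries))  # differs only by case
print(would_collide("photo.jpg", entries))  # same entry, not a collision
```

Entries that already differ only by case stay readable; the check
applies only when a new name is about to be created.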

> Should "whitespace" in section 8.3 simply indicate space, U+0020?

It looks like it should; my mistake.  Thanks!

> Windows does allow creating filenames ending with NBSP and other
> Unicode whitespace characters, and it's not clear whether this should
> be allowed.  Other whitespace (\r, \n, \t) is covered by the control
> character rule.
>
> Sorry if this is a rehash of past topics.

The API is designed as it is to support a couple of different
situations, only one of which is currently specced, but both of which
have been discussed.  What's specced so far is a per-origin sandbox
that web apps can use for client-side storage.  Depending on the UA's
implementation of it, it's possible that the files stored there will
be exposed to the host machine and potentially shared with apps
outside of the browser, but we generally expect the browser to create
most or all of them.  Thus it makes sense to take a
least-common-denominator [LCD] approach, so that code that works on
any platform works on all platforms.  If other apps create files there
we should be able to access them no matter what, but things will go
more smoothly if said apps respect our restrictions.

However, an obvious expansion of this API which we've talked a lot
about is the ability to expose other "mount points" to the browser.
For example, a trusted app might be granted access to "My Photos" or
another similar directory.  There the majority of the files are
expected to be created by apps outside of the browser, and you run
into the thumbnail problem you describe above, where a
read-modify-write of a path or even a copy operation can inadvertently
create a file path that's banned by the API, but is legal on the host
system.

In a perfect world, I think we'd want all paths that came from the web
app to be LCD-safe, but all paths that came from the host machine to
be permitted.  Since that's not generally detectable by the UA, or
even well-defined in all cases, perhaps we can help developers to
solve the problem manually.  We could offer another API [or just a
flag in the existing APIs] that says "I'm using paths derived from the
local system.  Let me try this, even if it's not LCD-safe."  I don't
think we want to allow that in the per-origin sandbox defined in the
current spec, but I could see it being quite valuable for other mount
points.  If that sounds reasonable, we can put that in when we spec
this potential future API expansion.
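
One way such a flag might behave is sketched below in Python.
Everything here is hypothetical: the `allow_host_names` flag, the
mount names, and the forbidden-character set (a Windows-style list
used for illustration, not the spec's actual rules).

```python
# Assumed for illustration: Windows-style reserved characters.
FORBIDDEN = set('<>:"/\\|?*')
LIMIT = 255


def is_lcd_safe(name: str) -> bool:
    """Rough least-common-denominator check for one filename."""
    if not name:
        return False
    if len(name.encode("utf-8")) > LIMIT:
        return False
    if len(name.encode("utf-16-le")) // 2 > LIMIT:
        return False
    if any(ch in FORBIDDEN or ord(ch) < 0x20 for ch in name):
        return False
    return not name.endswith(" ")  # no trailing U+0020


def check_create(name: str, mount: str = "sandbox",
                 allow_host_names: bool = False) -> None:
    """Raise ValueError unless the name may be created on this mount."""
    if is_lcd_safe(name):
        return
    # The opt-in is honoured only outside the per-origin sandbox.
    if allow_host_names and mount != "sandbox":
        return
    raise ValueError("name is not LCD-safe: %r" % name)


# The thumbnail case: a name that came from the host system.
host_name = "at the beach: moon rock?.jpg"
check_create(host_name, mount="photos", allow_host_names=True)  # permitted
```

In this sketch the same call with `mount="sandbox"` still raises, which
matches the idea of keeping the sandbox strictly LCD-safe.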

      Eric
Received on Tuesday, 11 January 2011 22:34:04 UTC