Re: [File API: FileSystem] Path restrictions and case-sensitivity from Glenn Maynard on 2011-05-11 (public-webapps@w3.org from April to June 2011)

From: Glenn Maynard <glenn@zewt.org>
Date: Wed, 11 May 2011 19:52:38 -0400
To: Eric U <ericu@google.com>
Cc: Jonas Sicking <jonas@sicking.cc>, timeless <timeless@gmail.com>, Web Applications Working Group WG <public-webapps@w3.org>, Charles Pritchard <chuck@jumis.com>, Kinuko Yasuda <kinuko@google.com>
Message-ID: <BANLkTik8f7ODuy3YrdrZpgeCcg3KKCp59Q@mail.gmail.com>

On Wed, May 11, 2011 at 7:08 PM, Eric U <ericu@google.com> wrote:

> > *everywhere*, both on Turkish and on English systems. Things could
> > only be case sensitive when serialized to a real file system outside
> > of the API. I'm not proposing a case insensitive system which is
> > locale aware, i'm proposing one which always folds.
>
> > no, if the api is case insensitive, then it's case insensitive
> You're proposing not just a case-insensitive system, but one that forces e.g. an
> English locale on all users, even those in a Turkish locale.  I don't think
> that's an acceptable solution.
>
> I also don't think having code that works in one locale and not another
> [Glenn's "image.jpg" example] is fantastic.  It was what we were stuck with when
> I was trying to allow implementers the choice of a pass-through implementation,
> but given that that's fallen to the realities of path lengths on Windows, I feel
> like we should try to do better.
>

To clarify something which I wasn't aware of before digging into this
deeper: Unicode case folding is *not* locale-sensitive.  Unlike lowercasing,
it uses the same rules in all locales, except Turkish.  Turkish isn't just
an easy-to-explain example of one of many differences (as it is with Unicode
lowercasing); it is, as far as I see, the *only* exception.  Unicode's case
folding rules have a special flag to enable Turkish in case folding, which
we can safely ignore here--nobody uses it for filenames.  (Windows filenames
don't honor that special case on Turkish systems, so those users are already
accustomed to that.)
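
As a rough illustration of the difference, here's a sketch in Python, whose
str.casefold() applies Unicode full case folding without the Turkic special
case and without looking at the locale.  The helper name fold_name is made up
for this example; nothing here is part of the FileSystem API.

```python
def fold_name(name: str) -> str:
    """Locale-independent comparison key for a filename (hypothetical helper)."""
    # str.casefold() never consults the process locale, so the result is
    # the same on an English system and on a Turkish one.
    return name.casefold()


# Plain ASCII names fold together as expected.
print(fold_name("IMAGE.JPG") == fold_name("image.jpg"))

# Folding is not the same as lowercasing: sharp s folds to "ss",
# while lower() leaves it alone.
print("Straße".lower() == "strasse", fold_name("Straße") == "strasse")

# Without the Turkic flag, dotless i (U+0131) does not fold to "i",
# so these two names stay distinct everywhere.
print(fold_name("\u0131mage") == fold_name("IMAGE"))
```

The last comparison is the Turkish case: under these rules it gives the same
answer for every user, which is the behavior being argued for above.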

That said, it's still uncomfortable having a dependency on the Unicode
folding table here: if it ever changes, it'll cause both interop problems
and data consistency problems (two files whose names used to be distinct
ending up with the same name after a browser update changes its Unicode
data).  Granted, either case would probably be
vanishingly rare in practice at this point.

All that aside, I think a much stronger argument for case-sensitive
filenames is the ability to import files from essentially any environment;
this API's filename rules are almost entirely a superset of all other
filesystems and file containers.  For example, sites can allow importing
(once the needed APIs are in place) directories of data into the sandbox,
without having to modify any filenames to make it fit a more constrained
API.  Similarly, sites can extract tarballs directly into the sandbox.
(I've seen tars containing both "Makefile" and "makefile"; maybe people only
do that to confound Windows users, but they exist.)
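
To make the import problem concrete, here's a minimal sketch of the check a
case-insensitive sandbox would force on every importer.  The function name and
the member list are invented for illustration, and str.casefold() is only a
stand-in for whatever folding such an API would actually specify.

```python
from collections import defaultdict


def find_fold_collisions(names):
    """Group names that would become identical under case folding."""
    groups = defaultdict(list)
    for name in names:
        # Names sharing a folded key could not coexist in a
        # case-insensitive filesystem.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical tarball listing containing both "Makefile" and "makefile".
members = ["src/main.c", "Makefile", "makefile", "README"]
print(find_fold_collisions(members))  # [['Makefile', 'makefile']]
```

With case-sensitive names the importer never needs this step; with folding,
it has to rename or reject one of the two files.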

I'm not liking the backslash exception.  It's the only thing that prevents
this API from being a complete superset, as far as I can see, of all
production filesystems.  Can we drop that rule?  It might be a little
surprising to developers who have only worked in Windows, but they'll be
surprised anyway, and it shouldn't lead to latent bugs.

Glenn:
> > This can be solved at the application layer in applications that want
> > it, without baking it into the filesystem API.
>
> This is mostly true; you'd have to make sure that all alterations to the
> filesystem went through a single choke-point or you'd have the potential for
> race conditions [or you'd need to store the original-case filenames yourself,
> and send the folded case down to the filesystem API].
>

Yeah, it's not necessarily easy to get right, particularly if you have
multiple threads running...
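
For what it's worth, a sketch of the choke-point approach might look something
like the following.  Everything here is hypothetical: the class name, its
methods, and the in-memory dict standing in for the real filesystem calls.  The
idea is just that one lock guards every change, the folded name is what gets
sent down to storage, and the original-case name is kept alongside it.

```python
import threading


class CaseInsensitiveStore:
    """Hypothetical application-layer wrapper; not part of the FileSystem API."""

    def __init__(self):
        self._lock = threading.Lock()  # the single choke-point
        self._original = {}            # folded name -> name as first created
        self._data = {}                # folded name -> contents (stand-in for real storage)

    def create(self, name, contents):
        key = name.casefold()
        # Check and insert under one lock so two threads cannot both
        # create names that differ only in case.
        with self._lock:
            if key in self._original:
                raise FileExistsError(self._original[key])
            self._original[key] = name
            self._data[key] = contents

    def read(self, name):
        with self._lock:
            return self._data[name.casefold()]

    def listing(self):
        # Report names in the case the user originally gave.
        with self._lock:
            return sorted(self._original.values())


store = CaseInsensitiveStore()
store.create("Makefile", "all:")
try:
    store.create("makefile", "")
except FileExistsError as exc:
    print("already exists as", exc)
print(store.read("MAKEFILE"))
print(store.listing())
```

Anything that touches the underlying storage without going through this
object defeats the scheme, which is the "not necessarily easy" part.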

(The rest was Charles, by the way.)

> > A virtual FS as the backing for the filesystem API does not resolve that core
> > issue.  It makes sense to encourage authors to gracefully handle errors thrown
> > by creating files and directories.  Such a need has already been introduced
> > via Google Chrome's unfortunate limitation of a 255 byte max path length.
>

-- 
Glenn Maynard

Received on Wednesday, 11 May 2011 23:53:05 UTC