
Re: [File API: FileSystem] Path restrictions and case-sensitivity

From: Eric U <ericu@google.com>
Date: Wed, 11 May 2011 17:13:46 -0700
Message-ID: <BANLkTi=_6V4+=2FUfEbB61HmKfeORytp_Q@mail.gmail.com>
To: Glenn Maynard <glenn@zewt.org>
Cc: Jonas Sicking <jonas@sicking.cc>, timeless <timeless@gmail.com>, Web Applications Working Group WG <public-webapps@w3.org>, Charles Pritchard <chuck@jumis.com>, Kinuko Yasuda <kinuko@google.com>
On Wed, May 11, 2011 at 4:52 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Wed, May 11, 2011 at 7:08 PM, Eric U <ericu@google.com> wrote:
>>
>> > No, if the API is case insensitive, then it's case insensitive
>> > *everywhere*, both on Turkish and on English systems. Things could
>> > only be case sensitive when serialized to a real file system outside
>> > of the API. I'm not proposing a case insensitive system which is
>> > locale aware, I'm proposing one which always folds.
>>
>> You're proposing not just a case-insensitive system, but one that forces
>> e.g. an English locale on all users, even those in a Turkish locale.  I
>> don't think that's an acceptable solution.
>>
>> I also don't think having code that works in one locale and not another
>> [Glenn's "image.jpg" example] is fantastic.  It was what we were stuck
>> with when I was trying to allow implementers the choice of a pass-through
>> implementation, but given that that's fallen to the realities of path
>> lengths on Windows, I feel like we should try to do better.
>
> To clarify something which I wasn't aware of before digging into this
> deeper: Unicode case folding is *not* locale-sensitive.  Unlike lowercasing,
> it uses the same rules in all locales, except Turkish.  Turkish isn't just
> an easy-to-explain example of one of many differences (as it is with Unicode
> lowercasing); it is, as far as I see, the *only* exception.  Unicode's case
> folding rules have a special flag to enable Turkish in case folding, which
> we can safely ignore here--nobody uses it for filenames.  (Windows filenames
> don't honor that special case on Turkish systems, so those users are already
> accustomed to that.)

So it's not locale-sensitive unless it is, but nobody does that
anyway, so don't worry about it?  I'm a bit uneasy about that in
general, but Windows not supporting it is a good point.  Anyone know
about Mac or Linux systems?
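
A minimal sketch of the folding behaviour discussed above, using Python's
str.casefold(), which applies Unicode full case folding without the
Turkish-specific mappings and does not consult the process locale. The
fold() helper is hypothetical and is not part of any File API proposal.

```python
def fold(name: str) -> str:
    """Locale-independent fold of a filename (hypothetical helper)."""
    return name.casefold()

# Plain ASCII names collapse to the same key.
same_ascii = fold("IMAGE.JPG") == fold("image.jpg")

# Turkish rules would map "I" to dotless "ı" (U+0131); default folding
# maps it to "i" instead.
capital_i_folds_to_i = fold("I") == "i"

# Dotless "ı" has no default folding, so it stays distinct from "i".
dotless_stays_distinct = fold("\u0131") != fold("i")

print(same_ascii, capital_i_folds_to_i, dotless_stays_distinct)
```

With always-fold semantics, "IMAGE.JPG" and "image.jpg" name the same file
on every system, at the cost of ignoring what a Turkish user would expect.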

> That said, it's still uncomfortable having a dependency on the Unicode
> folding table here: if it ever changes, it'll cause both interop problems
> and data consistency problems (two files which used to be distinct filenames
> turning into two files with the same filenames due to a browser update
> updating its Unicode data).  Granted, either case would probably be
> vanishingly rare in practice at this point.

Agreed [both in the discomfort and the rarity], but I think it's a
very ugly dependency anyway.
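
To make the data-consistency worry concrete, here is a toy sketch. The two
tables below are invented for illustration and say nothing about any real
Unicode version; fold_with() is a hypothetical helper.

```python
import unicodedata
from typing import Mapping

# Version of the Unicode database bundled with this interpreter; a browser
# would carry its own copy, which can change between releases.
bundled_version = unicodedata.unidata_version

def fold_with(table: Mapping[str, str], name: str) -> str:
    """Fold a name using an explicit per-character table (toy model)."""
    return "".join(table.get(ch, ch) for ch in name)

# Invented tables: the "new" one gains a mapping the "old" one lacked.
OLD_TABLE = {"A": "a"}
NEW_TABLE = {"A": "a", "\u00c4": "\u00e4"}

names = ("\u00c4.txt", "\u00e4.txt")

distinct_before = fold_with(OLD_TABLE, names[0]) != fold_with(OLD_TABLE, names[1])
collide_after = fold_with(NEW_TABLE, names[0]) == fold_with(NEW_TABLE, names[1])
```

Two files that were distinct under the old table map to one key under the
new one, which is the update hazard described above.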

> All that aside, I think a much stronger argument for case-sensitive
> filenames is the ability to import files from essentially any environment;
> this API's filename rules are almost entirely a superset of all other
> filesystems and file containers.  For example, sites can allow importing
> (once the needed APIs are in place) directories of data into the sandbox,
> without having to modify any filenames to make it fit a more constrained
> API.  Similarly, sites can extract tarballs directly into the sandbox.
> (I've seen tars containing both "Makefile" and "makefile"; maybe people only
> do that to confound Windows users, but they exist.)

I've actually ended up in that situation on Linux, with tools that
autogenerated makefiles, but were run from Makefiles.  It's not a
situation I really wanted to be in, but it was nice that it actually
worked without me having to hack around it.
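
As a sketch of the import problem, the snippet below builds a small
in-memory tar containing both "Makefile" and "makefile" and reports which
member names would collide under case folding. The archive contents and the
fold_collisions() helper are made up for the example.

```python
import io
import tarfile
from collections import defaultdict

def build_sample_tar() -> bytes:
    """Create an in-memory tar with names that differ only by case."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("Makefile", "makefile", "README"):
            data = b"# sample\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def fold_collisions(names):
    """Group names by folded form and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

with tarfile.open(fileobj=io.BytesIO(build_sample_tar()), mode="r") as tar:
    collisions = fold_collisions(tar.getnames())

print(collisions)
```

A case-sensitive sandbox can extract such an archive unchanged; a
case-insensitive one has to rename or reject one of the colliding members.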

> I'm not liking the backslash exception.  It's the only thing that prevents
> this API from being a complete superset, as far as I can see, of all
> production filesystems.  Can we drop that rule?  It might be a little
> surprising to developers who have only worked in Windows, but they'll be
> surprised anyway, and it shouldn't lead to latent bugs.

It can't be a complete superset of all filesystems in that it doesn't
allow forward slash in filenames either.
However, I see your point.  You could certainly have a filename with a
backslash in it on a Linux/ext2 system.  Does anyone else have an
opinion on whether it's worth the confusion potential?
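
A sketch of the two options for a single path component. Only the forward
slash and backslash rules come from this thread; the empty-name check and
the is_valid_name() helper itself are assumptions added for the example.

```python
def is_valid_name(name: str, allow_backslash: bool = True) -> bool:
    """Check one path component (hypothetical helper, not spec text)."""
    if not name:
        return False  # assumption: empty names are rejected
    if "/" in name:
        return False  # forward slash is the path separator
    if not allow_backslash and "\\" in name:
        return False  # the backslash restriction under discussion
    return True

# A name that is legal on a Linux filesystem but contains a backslash.
linux_style = "notes\\2011.txt"

accepted_without_rule = is_valid_name(linux_style, allow_backslash=True)
accepted_with_rule = is_valid_name(linux_style, allow_backslash=False)
```

Dropping the rule lets such a name be imported as-is; keeping it means the
importer must rename the file or fail.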

>> Glenn:
>> > This can be solved at the application layer in applications that want
>> > it, without baking it into the filesystem API.
>>
>> This is mostly true; you'd have to make sure that all alterations to the
>> filesystem went through a single choke-point or you'd have the potential
>> for race conditions [or you'd need to store the original-case filenames
>> yourself, and send the folded case down to the filesystem API].
>
> Yeah, it's not necessarily easy to get right, particularly if you have
> multiple threads running...
>
>
>
> (The rest was Charles, by the way.)

Ah, sorry Glenn and Charles.
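
A minimal sketch of the application-layer approach discussed above:
case-insensitive lookups on top of a case-sensitive store, with every change
going through one locked choke-point and the original-case name kept
alongside the folded key. The FoldedDirectory class is hypothetical and uses
an in-memory dict in place of the real filesystem API.

```python
import threading

class FoldedDirectory:
    """Case-insensitive directory layered on a case-sensitive store (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # folded name -> (original-case name, data)

    def create(self, name: str, data: bytes = b"") -> None:
        key = name.casefold()
        # Check and insert under one lock so two threads cannot both
        # create names that differ only by case.
        with self._lock:
            if key in self._entries:
                raise FileExistsError(self._entries[key][0])
            self._entries[key] = (name, data)

    def original_name(self, name: str) -> str:
        """Return the name as first created, whatever case was asked for."""
        with self._lock:
            return self._entries[name.casefold()][0]

directory = FoldedDirectory()
directory.create("Image.JPG")

try:
    directory.create("IMAGE.jpg")
    second_create_failed = False
except FileExistsError:
    second_create_failed = True
```

If any code path writes to the underlying store without going through
create(), the check-then-insert guarantee is lost, which is the race
mentioned above.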

>> > A virtual FS as the backing for the filesystem API does not resolve that
>> > core issue.  It makes sense to encourage authors to gracefully handle
>> > errors thrown by creating files and directories.  Such a need has already
>> > been introduced via Google Chrome's unfortunate limitation of a 255 byte
>> > max path length.
>
>
> --
> Glenn Maynard
>
>
>
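
On Charles's point about handling creation errors gracefully, a small
sketch. The 255-byte figure is the one quoted above; measuring the path in
UTF-8 is an assumption, and both helpers are hypothetical.

```python
MAX_PATH_BYTES = 255  # figure quoted in the thread

def path_fits(path: str, limit: int = MAX_PATH_BYTES) -> bool:
    """Return True if the path fits the limit (UTF-8 size is an assumption)."""
    return len(path.encode("utf-8")) <= limit

def try_create(path: str, create) -> bool:
    """Call create(path) and report failure instead of propagating it."""
    if not path_fits(path):
        return False
    try:
        create(path)
    except OSError:
        return False
    return True

created = []
ok_short = try_create("a" * 255, created.append)
# 128 two-byte characters: 128 code points but 256 bytes in UTF-8.
ok_long = try_create("\u00e9" * 128, created.append)
```

The second call shows why a byte limit can surprise authors who count
characters: non-ASCII names hit the limit sooner.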
Received on Thursday, 12 May 2011 00:16:46 GMT
