Re: [File API: FileSystem] Path restrictions and case-sensitivity

On Wed, May 11, 2011 at 8:13 PM, Eric U wrote:

> So it's not locale-sensitive unless it is, but nobody does that
> anyway, so don't worry about it?  I'm a bit uneasy about that in
> general, but Windows not supporting it is a good point.

It's not locale-sensitive at all, unless the one special case, Turkish, is
enabled explicitly.  I think the norm is to ignore Turkish entirely for
purposes of case folding.  (I wasn't even able to find a way to do a
Turkish-enabled case folding with libicu, though the header constant
"U_FOLD_CASE_EXCLUDE_SPECIAL_I" suggests it's in there somewhere.)
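
To make the Turkish special case concrete, here is a rough Python sketch
(this is not ICU; fold_default and fold_turkic are hypothetical helper
names).  It assumes the two Turkic "T" mappings listed in Unicode's
CaseFolding.txt and applies them before Python's locale-independent
str.casefold():

```python
# Turkic "T" mappings from Unicode CaseFolding.txt (assumed here):
#   U+0049 (I)  -> U+0131 (dotless i)
#   U+0130 (capital I with dot above) -> U+0069 (i)
TURKIC_FOLD = str.maketrans({"I": "\u0131", "\u0130": "i"})


def fold_default(name: str) -> str:
    """Locale-independent case folding, ignoring Turkish entirely."""
    return name.casefold()


def fold_turkic(name: str) -> str:
    """Case folding with the Turkic special-casing of I enabled."""
    return name.translate(TURKIC_FOLD).casefold()


if __name__ == "__main__":
    # Default folding treats "I" and "i" as the same letter.
    print(fold_default("FILE.TXT") == fold_default("file.txt"))
    # With the Turkic mappings, "I" folds to dotless i instead.
    print(fold_turkic("FILE.TXT") == fold_turkic("file.txt"))
```

Under the default folding the two names collide; with the Turkic mappings
enabled they would be distinct files.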

> Anyone know about Mac or Linux systems?

Native Linux filesystems are case-sensitive, so I'm not sure there's
anything to compare against there.  (glibc itself doesn't have direct
support for case folding, as far as I know; you use a library like libicu
for that sort of thing, and libicu does consider "i" == "I" when case
folding, including in Turkish locales.)

> > I'm not liking the backslash exception.  It's the only thing that prevents
> > this API from being a complete superset, as far as I can see, of all
> > production filesystems.  Can we drop that rule?  It might be a little
> > surprising to developers who have only worked in Windows, but they'll be
> > surprised anyway, and it shouldn't lead to latent bugs.
> It can't be a complete superset of all filesystems in that it doesn't
> allow forward slash in filenames either.
> However, I see your point.  You could certainly have a filename with a
> backslash in it on a Linux/ext2 system.  Does anyone else have an
> opinion on whether it's worth the confusion potential?

Of all production end-user filesystems--on any systems where they're
allowed, users are going to be used to this being incompatible with the rest
of the world already.
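
As a small illustration of the backslash difference, Python's pathlib
pure-path classes model both conventions without touching the disk.  This
is only a sketch; the filename is made up:

```python
from pathlib import PurePosixPath, PureWindowsPath

# A hypothetical name containing a backslash.
name = "notes\\2011.txt"

# POSIX rules: the backslash is an ordinary character, so this is one
# path component.
posix_parts = PurePosixPath(name).parts

# Windows rules: the backslash is a separator, so this is a directory
# plus a file.
windows_parts = PureWindowsPath(name).parts

if __name__ == "__main__":
    print(posix_parts)
    print(windows_parts)
```

The same string is a single filename on a Linux-style filesystem and a
two-level path on Windows.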

I guess there's one other case where it's not necessarily a superset:
filenames containing invalid byte sequences which can't be represented in
UTF-16.  I do end up with these from time to time, eg. when extracting a ZIP
containing non-UTF-8 filenames.  I think I'm not very worried about this (at
least for the sandbox case)--this is an error recovery case, whereas
backslashes in filenames are legitimate, if uncommon.
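
For the invalid-byte-sequence case, a minimal sketch of the check, assuming
on-disk names are raw bytes that are expected to be UTF-8
(is_representable is a hypothetical helper, not part of any API under
discussion):

```python
def is_representable(raw_name: bytes) -> bool:
    """Return True if raw_name is valid UTF-8.

    Valid UTF-8 always converts cleanly to UTF-16; anything else has no
    faithful UTF-16 representation.
    """
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_representable(b"caf\xc3\xa9.txt"))   # UTF-8 encoded name
    print(is_representable(b"caf\xe9.txt"))       # Latin-1 bytes, e.g. from a ZIP
    print(is_representable(b"back\\slash.txt"))   # backslash is plain ASCII
```

The Latin-1 name is the error-recovery case; the backslash name is
perfectly representable and only excluded by the path rules.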

Glenn Maynard

Received on Thursday, 12 May 2011 01:19:14 UTC