Re: [File API: FileSystem] Path restrictions and case-sensitivity

On Wed, May 11, 2011 at 4:47 PM, timeless <timeless@gmail.com> wrote:
> On Thu, May 12, 2011 at 2:08 AM, Eric U <ericu@google.com> wrote:
>> Timeless replied:
>>> no, if the api is case insensitive, then it's case insensitive
>>> *everywhere*, both on Turkish and on English systems. Things could
>>> only be case sensitive when serialized to a real file system outside
>>> of the API. I'm not proposing a case insensitive system which is
>>> locale aware, i'm proposing one which always folds.
>>
>> You're proposing not just a case-insensitive system, but one that forces e.g. an
>> English locale on all users, even those in a Turkish locale.  I don't think
>> that's an acceptable solution.
>
> No, I proposed case preserving. If the file is first created with a
> dotless i, that hint is preserved and a user agent could and should
> retain this (e.g. for when it serializes to a real file system). I'm
> just suggesting not allowing an application to ask for distinct dotted
> and dotless instances of the same approximate file name. There's a
> reasonable chance that case collisions will be disastrous when
> serialized, thus it's better to prevent case collisions when an
> application tries to create the file - the application can accept a
> suggested filename or generate a new one.

There are a few things going on here:

1) Does the filesystem preserve case?  If it's case-sensitive, then
yes.  If it's case-insensitive, then maybe.
2) Is it case-sensitive?  If not, you have to decide how to do case
folding, and that's locale-specific.  As I understand it, Unicode
case-folding isn't locale-specific, except when you choose to use the
Turkish rules, which is exactly the problem we're talking about.
3) If you're case folding, are you going to go with a single locale
everywhere, or are you going to use the locale of the user?
4) [I think this is what you're talking about w.r.t. not allowing both
dotted and dotless i]: Should we attempt to detect filenames that are
/too similar/ for some definition of /too similar/, ostensibly to
avoid confusing the user.
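
To make point 2 concrete, here is a rough Python sketch of the two
folding behaviors.  It assumes Python's built-in str.casefold(), which
applies the default Unicode folding with no Turkic special cases;
turkic_fold is a hypothetical helper written only for illustration,
not part of any standard API.

```python
# Default Unicode folding vs. a hand-rolled Turkic folding.
# turkic_fold is a hypothetical helper, not a standard library function.

DOTLESS_I = "\u0131"   # ı  LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE


def default_fold(name: str) -> str:
    # Default Unicode case folding: 'I' folds to 'i', 'ı' stays 'ı'.
    return name.casefold()


def turkic_fold(name: str) -> str:
    # Turkic rules: 'I' folds to 'ı' and 'İ' folds to 'i';
    # everything else uses the default folding.
    return name.replace("I", DOTLESS_I).replace(DOTTED_CAP_I, "i").casefold()


# Under default folding these two names refer to the same file...
same_by_default = default_fold("FILE") == default_fold("file")

# ...but under Turkic folding they do not ("fıle" vs. "file").
same_by_turkic = turkic_fold("FILE") == turkic_fold("file")

# With the dotted capital the situation is reversed.
dotted_by_default = default_fold("F\u0130LE") == default_fold("file")
dotted_by_turkic = turkic_fold("F\u0130LE") == turkic_fold("file")
```

So whichever rule set is picked, some pair of names that one group of
users considers identical will be treated as distinct, or vice versa.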

As I read what you wrote, you wanted:
1) yes
2) no
3) a new locale in which I, ı, İ and i all fold to the same letter, everywhere
4) yes, possibly only for the case of I, ı, İ and i

4 is, in the general case, impossible.  It's not well-defined, and is
just as likely to cause problems as solve them.  If you *just* want to
check for ı vs. i, it's possible, but it's still not clear to me that
what you're doing will be the correct behavior in Turkish locales [are
there any Turkish words, names, abbreviations, etc. that only differ in
that character?] and it doesn't matter elsewhere.
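
For reference, the narrow version of the check (only the four i
variants) could look roughly like the Python sketch below.  This is my
reading of the proposal, not anything from the spec: collision_key and
Directory are hypothetical names, and collapsing all four letters to
"i" is an assumption about what "fold to the same letter" would mean.

```python
# Case-preserving store that rejects names colliding after folding.
# collision_key and Directory are hypothetical, for illustration only.

# Collapse I, İ (U+0130) and ı (U+0131) onto plain 'i' before folding.
I_VARIANTS = str.maketrans({"I": "i", "\u0130": "i", "\u0131": "i"})


def collision_key(name: str) -> str:
    # Key used only for comparison; the stored name is never altered.
    return name.translate(I_VARIANTS).casefold()


class Directory:
    def __init__(self) -> None:
        # Maps collision key -> name exactly as it was first created.
        self._entries: dict[str, str] = {}

    def create(self, name: str) -> str:
        key = collision_key(name)
        if key in self._entries:
            # The application must pick or accept another name.
            raise FileExistsError(
                f"{name!r} collides with existing {self._entries[key]!r}"
            )
        self._entries[key] = name
        return name

    def names(self) -> list[str]:
        # Names come back with their original case and dots preserved.
        return list(self._entries.values())
```

Creating "K\u0131rm\u0131z\u0131.txt" and then "KIRMIZI.TXT" in the same
Directory raises FileExistsError on the second call, while names()
still returns the first name with its dotless i intact.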

Received on Thursday, 12 May 2011 00:05:38 UTC