- From: Eric U <ericu@google.com>
- Date: Wed, 11 May 2011 17:02:35 -0700
- To: timeless <timeless@gmail.com>
- Cc: Jonas Sicking <jonas@sicking.cc>, Glenn Maynard <glenn@zewt.org>, Web Applications Working Group WG <public-webapps@w3.org>, Charles Pritchard <chuck@jumis.com>, Kinuko Yasuda <kinuko@google.com>
On Wed, May 11, 2011 at 4:47 PM, timeless <timeless@gmail.com> wrote:
> On Thu, May 12, 2011 at 2:08 AM, Eric U <ericu@google.com> wrote:
>> Timeless replied:
>>> no, if the api is case insensitive, then it's case insensitive
>>> *everywhere*, both on Turkish and on English systems. Things could
>>> only be case sensitive when serialized to a real file system outside
>>> of the API. I'm not proposing a case insensitive system which is
>>> locale aware, i'm proposing one which always folds.
>>
>> You're proposing not just a case-insensitive system, but one that forces e.g. an
>> English locale on all users, even those in a Turkish locale. I don't think
>> that's an acceptable solution.
>
> No, I proposed case preserving. If the file is first created with a
> dotless i, that hint is preserved and a user agent could and should
> retain this (e.g. for when it serializes to a real file system). I'm
> just suggesting not allowing an application to ask for distinct dotted
> and dotless instances of the same approximate file name. There's a
> reasonable chance that case collisions will be disastrous when
> serialized, thus it's better to prevent case collisions when an
> application tries to create the file - the application can accept a
> suggested filename or generate a new one.

There are a few things going on here:

1) Does the filesystem preserve case? If it's case-sensitive, then yes.
   If it's case-insensitive, then maybe.
2) Is it case-sensitive? If not, you have to decide how to do case
   folding, and that's locale-specific. As I understand it, Unicode
   case-folding isn't locale specific, except when you choose to use the
   Turkish rules, which is exactly the problem we're talking about.
3) If you're case folding, are you going to go with a single locale
   everywhere, or are you going to use the locale of the user?
4) [I think this is what you're talking about w.r.t. not allowing both
   dotted and dotless i]: Should we attempt to detect filenames that are
   /too similar/ for some definition of /too similar/, ostensibly to
   avoid confusing the user?

As I read what you wrote, you wanted:

1) yes
2) no
3) a new locale in which I, ı, İ and i all fold to the same letter,
   everywhere
4) yes, possibly only for the case of I, ı, İ and i

4 is, in the general case, impossible. It's not well-defined, and is
just as likely to cause problems as solve them. If you *just* want to
check for ı vs. i, it's possible, but it's still not clear to me that
what you're doing will be the correct behavior in Turkish locales [are
there any Turkish words, names, abbreviations, etc. that only differ in
that character?] and it doesn't matter elsewhere.
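To make the folding options in points 2 and 3 concrete, here is a rough Python sketch. It is illustration only and not part of any proposed API: the three function names are made up, the Turkish variant just applies the two Turkic mappings from Unicode's CaseFolding.txt by hand, and merged_fold is my reading of the "all four letters fold together" idea.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def default_fold(name: str) -> str:
    """Locale-independent Unicode case folding (Python's str.casefold)."""
    return name.casefold()


def turkish_fold(name: str) -> str:
    """Hypothetical helper: apply the Turkic 'T' mappings, then fold.

    CaseFolding.txt maps I -> ı and İ -> i under the Turkic rules.
    """
    turkic = name.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return turkic.casefold()


def merged_fold(name: str) -> str:
    """Hypothetical helper: fold I, ı, İ and i to one letter everywhere."""
    merged = name.replace(DOTTED_CAPITAL_I, "i").replace(DOTLESS_SMALL_I, "i")
    return merged.casefold()


if __name__ == "__main__":
    names = ["FILE", "file", "F\u0130LE", "f\u0131le"]
    for fold in (default_fold, turkish_fold, merged_fold):
        # Group the names by their folded key to see which ones collide.
        groups = {}
        for n in names:
            groups.setdefault(fold(n), []).append(n)
        print(fold.__name__, list(groups.values()))
```

With the default folding, FILE and file collide while the İ and ı spellings stay distinct. With the Turkic mappings, FILE pairs with fıle and FİLE pairs with file. With the merged rule, all four names collide, which is the behavior I'm questioning for Turkish users.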
Received on Thursday, 12 May 2011 00:05:38 UTC