Re: I18N issue: case-sensitivity of locale subdirectories

On Wed, Apr 29, 2009 at 7:16 AM, Robin Berjon <robin@berjon.com> wrote:
> Hi,
>
> the following issue has cropped up in the I18N model as described in the
> excellent I18N document from Marcos[0].
>
> Assume we have two localisation subdirectories:
>
>  locales/en/
>  locales/EN/
>
> What happens? BCP47 (which we reference) is defined to be case-insensitive
> so it doesn't help us much in this respect.
>
> There are multiple options:
>
>  a) we define a canonical casing and all others are ignored;
>  b) we select an order of priority and we only consider one (the first to
> match);
>  c) we select an order of priority and we merge them all (in that order,
> with a given precedence rule);
>  d) the device on which the user agent is running catches fire.
>
> I think that (a) should be ruled out because as BCP47 tells us, ISO639-1
> recommends lowercase (language codes), ISO3166-1 recommends uppercase
> (country codes), and ISO15924 recommends titlecase (script codes). These
> conventions differ from one another and are likely to be confusing, and I
> don't think that developers should have to worry about them.
>
> I'd like to reject (d) as out of line with our design preferences.
>
> I don't have a strong opinion on this, but I do have a preference for a
> rule based on (b): if multiple locale subdirectories have the same
> case-insensitive name, then the one that comes first in ASCII-code order
> (e.g. in order: EN, En, eN, en) is used and the others are ignored.
>
> The argument in favour of only using one is that we already have to merge
> multiple directories, and adding one merge operation for what is in all
> probability a user error seems like too much complexity for little value
> (I'm happy to be contradicted by implementers however). Picking ASCII-code
> order is based on the fact that the directory names must be ASCII here
> (any other names must be discarded), and picking the first is arbitrary.

I strongly agree. (c) would add a lot of code that would likely never
be used, which is bad because dead code is always bad, and also
because in the rare case that it is actually used, it's much more likely
to be buggy.

I don't have an opinion on which of the two directories should take
priority, as long as it's one of them. I'd probably argue for using
the first in ASCII-code order since that seems the simplest to
implement, but I'm open to other suggestions.
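To illustrate how little code the "first in ASCII-code order" rule needs, here is a minimal sketch. The helper name `select_locale_dirs` is hypothetical, and the input is assumed to be the list of subdirectory names found under `locales/`. Names containing non-ASCII characters are dropped, the rest are grouped by their ASCII-lowercased form, and within each group only the name that sorts first by code point is kept.

```python
from collections import defaultdict


def select_locale_dirs(names: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each case-insensitive locale name to the
    single subdirectory that is used; all other casings are ignored."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        if not name.isascii():
            continue  # directory names must be ASCII; others are discarded
        # lower() on an ASCII-only string is plain ASCII case-folding
        groups[name.lower()].append(name)
    # Python compares str by code point, which for ASCII is ASCII-code
    # order, so min() picks the first variant ("EN" < "En" < "eN" < "en").
    return {key: min(variants) for key, variants in groups.items()}


if __name__ == "__main__":
    print(select_locale_dirs(["en", "EN", "fr", "eN", "En"]))
```

Run as a script, this prints `{'en': 'EN', 'fr': 'fr'}`: of the four casings of "en", only `EN` survives. The whole rule is a group-by followed by a minimum, with no merging step, which is the simplicity argument made above.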

/ Jonas

Received on Friday, 1 May 2009 06:42:50 UTC