
Re: I18N issue: case-sensitivity of locale subdirectories

From: <Jere.Kapyaho@nokia.com>
Date: Thu, 7 May 2009 12:33:04 +0200
To: <marcosc@opera.com>, <robin@berjon.com>
CC: <public-webapps@w3.org>
Message-ID: <C6289390.6248%jere.kapyaho@nokia.com>

On 5.5.2009 13.16, "ext Marcos Caceres" <marcosc@opera.com> wrote:
> On Wed, Apr 29, 2009 at 4:16 PM, Robin Berjon <robin@berjon.com> wrote:
>> Assume we have two localisation subdirectories:
>> 
>>  locales/en/
>>  locales/EN/
>> 
>> What happens? BCP47 (which we reference) is defined to be case-insensitive
>> so it doesn't help us much in this respect.
>> 
>> There are multiple options:
>> 
>>  a) we define a canonical casing and all others are ignored;
>>  b) we select an order of priority and we only consider one (the first to
>> match);
>>  c) we select an order of priority and we merge them all (in that order,
>> with a given precedence rule);
>>  d) the device on which the user agent is catches fire.
>> 
>> I think that (a) should be ruled out because as BCP47 tells us, ISO639-1
>> recommends lowercase (language codes), ISO3166-1 recommends uppercase
>> (country codes), and ISO15924 recommends titlecase (script codes). These are
>> different, but likely to be confusing, and I don't think that developers
>> should have to worry about that.
> 
> Agreed.

Because BCP47 is indeed case-insensitive [1], both "en" and "EN" (and also
"eN" and "En") are considered equivalent. While it is probably an oversight
or error to provide several variants of the same language tag with different
character case anyway, they need to be considered somehow because they *are*
equivalent, unless it is made explicit that this is an error in the
packaging.

The path inside the widget's ZIP file is already defined as
case-insensitive, so it is actually already an error to have two or more
folders with names that differ only by character case. Even if some
implementation unzips the content of the widget to a local filesystem, we
have no control over whether filenames in that filesystem are
case-insensitive or case-sensitive.

>> I don't have a strong opinion on this, but I do have a preference for a
>> rule based on (b): if multiple locale subdirectories have the same
>> case-insensitive name, then the one that comes first in ASCII-code order
>> (e.g. in order: EN, En, eN, en) is used and the others are ignored.
> 
> This seems reasonable. I will add this.
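
For reference, the rule quoted above (first in ASCII-code order wins) could be
sketched roughly as follows. This is only an illustration: the function name
is made up here, and the folder names are assumed to be plain ASCII already.

```python
def first_in_ascii_order(folder_names):
    """Map each case-insensitive name to the variant that sorts first.

    Hypothetical helper, not spec text. Assumes ASCII-only names, for
    which Python's default string ordering equals ASCII-code order.
    """
    chosen = {}
    for name in sorted(folder_names):
        # setdefault keeps the first variant seen for each lowercased key
        chosen.setdefault(name.lower(), name)
    return chosen
```

Given the four variants of "en" from the quoted example, this keeps "EN" and
ignores the rest.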

I suggest that the widget packaging rules say that any localized folders
must be unique in terms of a case-insensitive match, otherwise the packaging
is invalid [2]. This also allows us to not talk about ASCII code ordering.
Furthermore, there is then no need to merge the contents of such folders.

For the degenerate (but unfortunately unavoidable) case where someone has
managed to slip in two or more such folders, define a canonical casing
(obvious suggestion: lowercase) and use it, then simply ignore any others.
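
A minimal sketch of what I mean, with made-up function names and assuming the
folder names are ASCII. What to do when folders collide but none of them is
all-lowercase is not settled above; the sketch simply selects nothing in that
case, which is my assumption only.

```python
from collections import defaultdict


def find_case_collisions(folder_names):
    """Return groups of names that differ only by character case.

    Hypothetical validity check: a non-empty result would mean the
    package breaks the proposed uniqueness rule.
    """
    groups = defaultdict(list)
    for name in folder_names:
        groups[name.lower()].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def select_folders(folder_names):
    """Fallback for packages that slipped through with collisions.

    A folder with no collision is used as-is. On collision, the
    all-lowercase variant is used and the others are ignored.
    Assumption: if no lowercase variant exists, nothing is selected.
    """
    groups = defaultdict(list)
    for name in folder_names:
        groups[name.lower()].append(name)
    selected = {}
    for key, names in groups.items():
        if len(names) == 1:
            selected[key] = names[0]
        else:
            selected[key] = key if key in names else None
    return selected
```

With "en", "EN" and "fr" in the package, the check reports the "en" group as a
collision, and the fallback uses "en" and "fr".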

>> The argument in favour of only using one is that we already have to merge
>> multiple directories, and adding one merge operation for what is in all
>> probability a user error seems like too much complexity for little value
>> (I'm happy to be contradicted by implementers however). Picking ASCII-code
>> order is based on the fact that the directory names must be ASCII here (the
>> others must be discarded), and picking the first is arbitrary.
>> 
>> Thoughts?
> 
> I support b. Added some of your text above to the spec.

I guess none of a)-d) really fit my observations as such. It's more like
additional packaging rules + shades of a).

Note that for comparisons with the widget locale value you still need to
case-fold [3] everything anyway. There is no guarantee that the widget
locale matches any localized subfolder name as such, because the widget
locale itself could use capitalization that really carries no meaning, but
fails to match any localized folder unless you do a case-insensitive
comparison. In this case the comparison can also be language-insensitive
(no locale-specific case mappings are needed), because BCP47 language tags
consist of US-ASCII characters.
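
As a rough sketch of such a comparison (the helper names are hypothetical, and
only A-Z are folded so that no locale-dependent case mapping can interfere):

```python
def ascii_lower(text):
    """Lowercase only the letters A-Z; leave every other character alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def find_locale_folder(widget_locale, folder_names):
    """Return the folder whose name matches the widget locale, ignoring
    ASCII case, or None if there is no match.

    Hypothetical helper for illustration; assumes at most one folder
    matches, as the uniqueness rule suggested earlier would guarantee.
    """
    wanted = ascii_lower(widget_locale)
    for name in folder_names:
        if ascii_lower(name) == wanted:
            return name
    return None
```

A widget locale of "EN-us" would then match a folder named "en-US", even
though neither string equals the other as written.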

Hope this helps,
Jere

[1] http://tools.ietf.org/html/bcp47#section-2.1
[2] http://dev.w3.org/2006/waf/widgets/#invalid-widgets
[3] http://www.w3.org/International/wiki/Case_folding

>> [0] http://dev.w3.org/cvsweb/~checkout~/2006/waf/widgets/i18n.html?rev=1.29&content-type=text/html;%20charset=utf-8
>> 
>> --
>> Robin Berjon - http://berjon.com/
>>    Feel like hiring me? Go to http://robineko.com/
>> 

> --
> Marcos Caceres
> http://datadriven.com.au

-- 
Jere Käpyaho (jere.kapyaho@nokia.com)
Specialist, Developer Platforms Standardization
Devices R&D, Nokia Corporation
Received on Thursday, 7 May 2009 10:33:55 GMT
