Re: [IndexedDB] Spec changes for international language support

On Fri, Feb 18, 2011 at 2:34 AM, Bjoern Hoehrmann wrote:

> * Pablo Castro wrote:
> >We discussed international language support last time at the TPAC and I
> >said I'd propose spec text for it. Please find the patch below, the
> >changes mirror exactly the proposal described in the bug we have for
> >tracking this:
> You should anticipate objections to that; collation is not a property of
> language, for instance, for de-de you typically have dictionary sorting
> and phone book sorting (and of course you have "de-de", "de-ch", and so
> on, so "de" alone would be rather meaningless). So far the W3C and the
> IETF have used resource identifiers to specify collations (see XPath 2.0
> and RFC 4790) where the IETF allows shorthands like "i;ascii-casemap".

I agree that simply specifying that 'language' be used without saying what
it means is not sufficient. However, your examples (German phonebook vs.
dictionary) can be covered by the language identifier framework laid out in
BCP 47 (with the 'u' extension).

> I do understand that Microsoft uses an extension of language tags for
> the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
> used to refer to german phone book sorting, but BCP 47 does not allow
> for that,

There's a way to specify alternate sorting orders (e.g. German phonebook,
Chinese pinyin, stroke count, radical-stroke count order, etc) under the BCP
47 framework, because it has a mechanism for defining and registering
extensions. The Unicode Consortium uses that mechanism to define the 'u'
extension and a set of subtags that can be used with it.
For instance, German phonebook sorting can be identified with
'de-DE-u-co-phonebk'. See

Also, see Bug 9903 comment 6 by Mark Davis for more examples. I'm just
copying his comment directly here:

To add to what Jungshik said, BCP47 defines standard extensions. The extension
defined by the Unicode consortium provides for fine-grained
specifications of collation behavior.
Examples for German:
de-u-co-phonebk // phonebook order
de-u-kn-true // numeric sorting, eg Tom2 comes before Tom12
de-u-ks-level1 // ignore accents, case differences
de-u-ks-level2 // ignore case differences
de-u-ks-level1-kc-true // ignore accents, but not case
These can be combined, such as:

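To make the structure of these tags concrete, here is a minimal sketch that splits the 'u' extension of a tag into its key/type pairs (co = collation, kn = numeric, ks = strength, kc = case level, as in the examples above). The helper name `parse_u_extension` is hypothetical, and the parsing is deliberately simplified: it assumes a well-formed tag, ignores attributes, and does not validate against the registry.

```python
def parse_u_extension(tag):
    """Return the key/type pairs in the 'u' extension of a language tag.

    Simplified sketch: assumes a well-formed BCP 47 tag and performs
    no validation of keys or types.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}

    types = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            # Another singleton starts a different extension.
            break
        if len(subtag) == 2:
            # Two-character subtags are keys (co, kn, ks, kc, ...).
            key = subtag
            types[key] = []
        elif key is not None:
            # Longer subtags are the type of the most recent key.
            types[key].append(subtag)
        # Subtags before the first key are attributes; ignored here.

    # A key with no type is treated as "true".
    return {k: "-".join(v) or "true" for k, v in types.items()}


print(parse_u_extension("de-DE-u-co-phonebk"))
print(parse_u_extension("de-u-ks-level1-kc-true"))
```

A tag without a 'u' extension, such as "de-DE", yields an empty mapping, which is the case where a default collation would have to apply.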
> neither could you devise a language tag to define something
> like "i;ascii-casemap" (which simply defines A-Z = a-z).

> I would expect that if browsers offer collations, there would be an
> interface for that so you can use them in other places, as such it might
> be wiser to accept something other than a language identifier string.

There's an ongoing effort to expose a 'rich' set of I18N APIs to client-side
development using JavaScript. (The API used to be much more extensive than it
is now, but has been scaled down significantly to get more browsers on board
in its first iteration.) There we're likely to use BCP 47 with the 'u'
extension (see above). So I think it'd be better if IndexedDB matches what
ECMAScript plans to do.


> As
> above, URIs, or RFC 4790 values plus URIs, or, in anticipation of some
> such interface, some other object, might be a better choice. And the
> method and attribute should probably not use "language" in their names.
> I also note that collation often involves equivalence testing, but it
> is not clear from your proposal whether that is the case here. It might
> also be a good idea to clearly spell out interoperability expectations;
> if two implementations support some collation, will they behave the same
> for any and all inputs as far as collation is concerned, or should one
> be prepared for slight differences among implementations?
> --
> Björn Höhrmann
> Am Badedeich 7 · Telefon: +49(0)160/4415681
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78

Received on Tuesday, 22 February 2011 22:08:33 UTC