RE: Language ranges with more than two sub-tag

Hello Marcos,

First, the question (Martin has already provided some answers, but I want to reiterate them): language tags (and thus language ranges) with more than two subtags are not uncommon. BCP 47 was designed to allow for a variety of different uses for subtags, not the least of which are script subtags such as the ones Martin cited in his reply. Chinese is not the only case: several other languages are written in more than one script. For example, Serbian can be written in either Latin or Cyrillic script ("sr-Latn" and "sr-Cyrl").

If Firefox OS works as described, it would create problems with organizing localized materials for languages that use multiple scripts. For example, only one set of resources can inhabit "zh": it can be either Traditional Chinese or Simplified Chinese, but cannot effectively be both. A range such as "zh-Hant-TW" that falls back directly to "zh" skips over "zh-Hant", which is exactly where the script distinction lives.
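
To make the difference concrete, here is a rough sketch in TypeScript of a fallback that removes one subtag at a time from the right, in the style of the "Lookup" scheme in RFC 4647, rather than jumping straight to the primary language subtag. The function names (fallbackChain, lookup) and the resource lists are my own illustrations; they are not taken from Firefox OS or from the SysApps draft, and the sketch assumes the input is a well-formed tag.

```typescript
// Build the list of tags to try, most specific first.
// Illustrative helper; not part of any existing API.
function fallbackChain(range: string): string[] {
  const subtags = range.split("-");
  const chain: string[] = [];
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    subtags.pop();
    // A single-character subtag ("u", "t", "x", ...) only introduces an
    // extension or private-use sequence, so it is dropped together with
    // the subtag that followed it.
    while (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags.pop();
    }
  }
  return chain;
}

// Return the first available tag that matches a step of the chain,
// comparing case-insensitively, or undefined if nothing matches.
function lookup(range: string, available: string[]): string | undefined {
  const byLowerCase = new Map<string, string>();
  for (const tag of available) {
    byLowerCase.set(tag.toLowerCase(), tag);
  }
  for (const candidate of fallbackChain(range)) {
    const hit = byLowerCase.get(candidate.toLowerCase());
    if (hit !== undefined) {
      return hit;
    }
  }
  return undefined;
}

// "zh-Hant-TW" is tried as "zh-Hant-TW", then "zh-Hant", then "zh".
console.log(fallbackChain("zh-Hant-TW"));
// With both scripts available, the Traditional Chinese resources win.
console.log(lookup("zh-Hant-TW", ["zh-Hans", "zh-Hant", "en"]));
```

With a scheme like this, a package can ship "zh-Hans" and "zh-Hant" side by side, and a request for "zh-Hant-TW" lands on the right one. A real implementation would also need to decide what to do when nothing matches (a default language, for instance).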

Another use of additional subtags is for variants, and a number of variants have been registered for specific purposes in the last few years. There are other articles about language tag choice on the W3C I18N site. See [1][2] (and I'm sure you can find more).

But the biggest contributors to additional subtags are the two extensions that have been created. One is for transliterations and transformations (which may be of interest to an application). The other is for locale identifiers (which is obviously of interest to an application!). In addition, JavaScript itself now has a locale model (which includes locale negotiation) and I would most definitely recommend that you look closely at it. It incorporates the locale extension. See Norbert's note on this list for a link to the most recent version: [3]. It would make the most sense for locale-selection and language-negotiation to work in lockstep, especially as browser vendors are working on implementations.
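
As an illustration of why the extensions matter for matching, here is a small TypeScript sketch that separates the extension sequences from the rest of a tag, so that an application can match resources on the language/script/region part and still see, for example, the locale ("u") extension. The names SplitTag and splitExtensions are hypothetical, the code assumes a well-formed tag, and it does not special-case grandfathered tags such as "i-klingon".

```typescript
// Result of splitting a tag; illustrative type, not from any API.
interface SplitTag {
  base: string;                       // language, script, region, variants
  extensions: Map<string, string[]>;  // singleton (e.g. "u", "t") -> its subtags
  privateUse: string[];               // everything after "x", if present
}

function splitExtensions(tag: string): SplitTag {
  const subtags = tag.split("-");
  const base: string[] = [];
  const extensions = new Map<string, string[]>();
  let privateUse: string[] = [];
  let i = 0;

  // Everything before the first single-character subtag is the base tag.
  while (i < subtags.length && subtags[i].length > 1) {
    base.push(subtags[i]);
    i++;
  }

  // Each single-character subtag starts a new sequence.
  while (i < subtags.length) {
    const singleton = subtags[i].toLowerCase();
    i++;
    if (singleton === "x") {
      // Private use runs to the end of the tag.
      privateUse = subtags.slice(i);
      break;
    }
    const values: string[] = [];
    while (i < subtags.length && subtags[i].length > 1) {
      values.push(subtags[i]);
      i++;
    }
    extensions.set(singleton, values);
  }

  return { base: base.join("-"), extensions, privateUse };
}

// German for Germany, with phonebook collation requested via the "u" extension.
const parts = splitExtensions("de-DE-u-co-phonebk");
console.log(parts.base);                 // de-DE
console.log(parts.extensions.get("u"));  // [ 'co', 'phonebk' ]
```

This is only meant to show the shape of the data. In an engine that implements the ECMAScript Internationalization API, it is probably better to lean on the built-in negotiation (for example, Intl.Collator.supportedLocalesOf) than to hand-roll the parsing.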

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

[1] http://www.w3.org/International/questions/qa-choosing-language-tags
[2] http://www.w3.org/International/articles/language-tags/
[3] http://lists.w3.org/Archives/Public/www-international/2013JanMar/0304.html

> -----Original Message-----
> From: Marcos Caceres [mailto:w3c@marcosc.com]
> Sent: Friday, March 01, 2013 12:47 AM
> To: www-international@w3.org
> Subject: Language ranges with more than two sub-tag
> 
> Hi Internationalization WG,
> 
> Quick question: how common are language ranges with more than two sub-
> tags (as used in user agents on the Web)? I'm wondering what the particular
> locales (language, countries, regions) are where these ranges are commonly
> used (if any)?
> 
> I've been through "Setting language preferences in a browser" [1], which only
> speaks of language ranges that contain two sub tags. I've also tried doing my
> own testing on various user agents and system settings and can only find the
> "language-COUNTRY" convention, but not any with three sub tags.
> 
> The reason for the question is that the SysApp's working group is currently
> working on a manifest format for web applications (based on the upcoming
> Firefox OS), and it needs to define an internationalization model. Firefox OS
> currently checks for localised content based on a complete language range
> (e.g., "en-US") and, if it can't find any matching content, it simply takes the
> language part of the language range (i.e., "en") and uses that to try to find
> matching content. This means that if there are any commonly used language
> ranges with three or more sub tags, matching could potentially be done
> incorrectly.
> 
> See [2] for a list of use cases/examples.
> 
> Kind regards,
> Marcos
> 
> [1] http://www.w3.org/International/questions/qa-lang-priorities.en.php
> [2] https://gist.github.com/marcoscaceres/5055717
>
> --
> Marcos Caceres
> http://datadriven.com.au

Received on Friday, 1 March 2013 17:00:41 UTC