RE: Article for wide review: Choosing a language tag

From: Phillips, Addison <addison@amazon.com>
Date: Fri, 9 Oct 2009 18:27:20 -0700
To: John Cowan <cowan@ccil.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <C7A5719F1E562149BA9171F58BEE2CA412980D66F8@EX-IAD6-B.ant.amazon.com>
> In Decision 1, the section on macrolanguages should explain that
> macrolanguage subtags may be used in the same way, and with the
> same considerations, as collection subtags.  

This is correct, as far as it goes. The one exception is the set of macrolanguages that also serve as prefixes for extended language subtags. In those cases, the macrolanguage subtag (for example, 'zh' for Chinese) is usually preferable to the subtag of the predominant encompassed language (in this case, 'cmn' for Mandarin Chinese), unless the distinction between the encompassed languages actually matters for your use.

An example is probably useful. 

When setting the language preferences in your web browser, you probably want to use ranges that start with 'zh' if you prefer content in Chinese, rather than ranges starting with the subtag 'cmn', even though Mandarin Chinese is the most common form. You might add ranges beginning with 'yue' if you also want to receive Cantonese content. Similarly, you would tag your blog with tags like "zh-Hans-CN" or "zh-Hant-TW".
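
To make the browser-preference advice concrete, here is a minimal sketch of prefix-based matching in the style of RFC 4647 "basic filtering". The function names and the sample tags are invented for illustration, and real user agents and servers may well match differently. It shows why a range of 'zh' picks up content tagged "zh-Hans-CN" but not content tagged "cmn-Hans-CN":

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Return True if the range matches the tag under basic filtering.

    A range matches when it equals the tag, or when it is a prefix of the
    tag that ends at a '-' boundary. Comparison is case-insensitive, and
    the wildcard range '*' matches every tag.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def filter_tags(ranges, tags):
    """Keep the tags matched by at least one of the user's ranges."""
    return [t for t in tags if any(basic_filter_match(r, t) for r in ranges)]


# Hypothetical content tags and user preferences, for illustration only.
content = ["zh-Hans-CN", "zh-Hant-TW", "cmn-Hans-CN", "yue-HK", "en-US"]
prefs = ["zh", "yue"]
matched = filter_tags(prefs, content)
print(matched)
```

Under this kind of matching, content tagged with 'cmn' as the primary language subtag is invisible to a reader whose only Chinese range is 'zh', which is one practical reason to prefer 'zh' for general-purpose tagging.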

You might, by contrast, use 'cmn' as your primary language subtag if your task is cataloging various passages of text in different Sinitic languages to share with linguistic scholars who will expect and appreciate the distinction. Whether you use the 'zh' subtag in front of 'cmn' is a matter of personal preference and utility.

> a) It's a mistake to conflate "[p]rimary language subtags that can
> be used with extended language subtags" with macrolanguages.  Only six
> ('ar', 'kok', 'ms', 'sw', 'uz', 'zh') of the 58 current
> macrolanguages,
> plus 'sgn', fall into this category.  Find or invent another term
> to avoid confusion.

I agree. I think some of us have gotten into the habit of talking about the macrolanguage-flavored extlangs as if they described the whole category. They do not. Probably the best name for these is: "extended language subtags" :-).

Note that the RFC's introduction to this topic (section 2.2.2) is pretty clear:

Extended language subtags are used to identify certain specially selected languages that, for various historical and compatibility reasons, are closely identified with or tagged using an existing primary language subtag. Extended language subtags are always used with their enclosing primary language subtag (indicated with a 'Prefix' field in the registry) when used to form the language tag. All languages that have an extended language subtag in the registry also have an identical primary language subtag record in the registry. This primary language subtag is RECOMMENDED for forming the language tag.

Note in particular that these are "specially selected languages". That is quite literally the qualification for being an extlang---that the IETF LTRU WG chose them (and no others).

The sign languages in particular are illustrative: they are treated as if 'sgn' were a macrolanguage, in that they appear both independently as primary language subtags and as extlangs of 'sgn'---and 'sgn' is emphatically not a macrolanguage.
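
The relationship between an extlang and its 'Prefix' can be sketched in a few lines. The table below is a tiny hand-copied excerpt, not the full registry, and the function name is made up for this example. It rewrites a tag in extlang form into the form that uses the identical primary language subtag, which is the form the RFC text quoted above recommends:

```python
# Small illustrative excerpt of extlang -> Prefix pairs.
# The real registry contains many more records; this is NOT complete.
EXTLANG_PREFIX = {
    "cmn": "zh",   # Mandarin Chinese
    "yue": "zh",   # Cantonese
    "arb": "ar",   # Standard Arabic
    "ase": "sgn",  # American Sign Language
}


def prefer_primary_subtag(tag: str) -> str:
    """Rewrite 'prefix-extlang-...' as 'extlang-...' when the pair is known.

    Tags that do not start with a known prefix/extlang pair are returned
    unchanged. Only the excerpt table above is consulted.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        primary = subtags[0].lower()
        second = subtags[1].lower()
        if EXTLANG_PREFIX.get(second) == primary:
            # Drop the prefix and promote the extlang to primary subtag.
            return "-".join([second] + subtags[2:])
    return tag


print(prefer_primary_subtag("zh-cmn-Hans-CN"))
print(prefer_primary_subtag("sgn-ase"))
print(prefer_primary_subtag("zh-Hans-CN"))
```

Note that 'sgn' appears in the table as a prefix exactly like 'zh' and 'ar' do, even though it is not a macrolanguage. The mechanism depends only on the registry's 'Prefix' field.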

> In Decision 4, for "regional areas", which is vague, read "regions
> containing more than one country" or "multinational regions".



Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Saturday, 10 October 2009 01:27:53 UTC
