- From: Addison Phillips <addison@yahoo-inc.com>
- Date: Wed, 16 Jan 2008 21:18:32 -0800
- To: I18N <www-international@w3.org>
- CC: member-i18n-core@w3.org
In this week's Internationalization Core WG teleconference, I drew an action item [1] to provide more information about a proposed change to the language tag ABNF (the grammar or formal syntax) in the proposed successor to RFC 4646. The reason is that the W3C published documents [2] and [3] describing language tags at about the time RFC 4646 came into being. Parts of these documents speculate about a potential future feature of language tags that is now being removed and will not be used. The I18N Core WG is now preparing to revise these documents to keep them current, and, as co-editor of the proposed replacement for RFC 4646, I've been following the details closely.

As many of you know, RFC 4646 was created as a successor to RFC 3066 as the document defining "BCP 47", the language tagging standard for Internet (and other) technologies. You may know "BCP 47" as the values used in "xml:lang" or in the HTTP Accept-Language header, for example.

RFC 4646 provided a more complex syntax that defined several new "flavors" of subtag in addition to the language and region subtags that had been formally defined previously. Most of these new types were fully defined in 4646. However, one type of subtag was reserved for future use: the "extended language" subtags, or, colloquially, "extlangs".

Extended language subtags were intended to accommodate a feature of ISO 639-3, whereby some languages are considered to be encompassed by other, broader languages, which are called "macrolanguages". For example, Mandarin Chinese and Cantonese are distinct languages that have their own codes in ISO 639-3 (these are 'cmn' and 'yue' respectively). Both of these languages (with several others) are encompassed by the macrolanguage called "Chinese", which is represented by the code 'zh' in language tags.

At the time 4646 was created, the IETF working group theorized that language tags for these languages would use both the macrolanguage and encompassed language codes together.
For example, a Cantonese (yue) document written in the Traditional script (Hant) for Hong Kong (HK) would use a tag like "zh-yue-Hant-HK".

However, after a great deal of debate and consideration, it was decided that this extlang feature would NOT be used. The encompassed and macrolanguage codes would both appear as potential primary language subtags and the extended language subtag would not be used. Thus, for example, the document described above would use the tag "yue-Hant-HK".

It should be noted that the IETF working group for language tags has also decided to remove the extlang production from the language tag syntax. This production was explicitly reserved for future use and no valid tag has ever used it. A few tags were registered during the RFC 3066 era that appear to use these subtags, but these were handled separately by the "grandfathered" productions in the grammar. Removing extlang altogether will simplify writing language tag processors and relax some of the minimum length requirements previously imposed.

Finally, this move was not taken without considerable debate and discussion. Some of the macrolanguages are obscure, but Chinese and Arabic languages are among those affected. Those interested in the macrolanguage mapping list can refer to the ISO 639-3 Registration Authority's page showing the current mappings [4].

The proposed successor is now nearing completion. A link to the current draft of the document can be found on my page [5], along with links to the IETF LTRU WG responsible for this document, the mail archive, and so forth.

Best Regards,

Addison

[1] http://www.w3.org/2008/01/16-core-minutes.html#action04
[2] http://www.w3.org/International/articles/bcp47/
[3] http://www.w3.org/International/articles/language-tags/#iana
[4] http://www.sil.org/iso639%2D3/macrolanguages.asp
[5] http://www.inter-locale.com

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.
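
As an illustration of the change described in the message above, here is a minimal Python sketch that rewrites a tag from the speculative extlang form ("zh-yue-Hant-HK") to the form in which the encompassed language is the primary language subtag ("yue-Hant-HK"). The function name and the ENCOMPASSED table are hypothetical and are not taken from any draft or library. The table lists only the two codes named in the message; a real processor would presumably draw on the full ISO 639-3 macrolanguage mappings [4].

```python
# Hypothetical, deliberately partial table: macrolanguage -> encompassed languages.
# Only the codes mentioned in the message are listed (assumption for illustration).
ENCOMPASSED = {"zh": {"cmn", "yue"}}


def drop_macrolanguage_prefix(tag: str) -> str:
    """Rewrite 'macro-encompassed-...' to 'encompassed-...' when the pair is known.

    Tags that do not start with a known macrolanguage/encompassed pair are
    returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        macro = subtags[0].lower()
        candidate = subtags[1].lower()
        if candidate in ENCOMPASSED.get(macro, set()):
            # The encompassed language becomes the primary language subtag;
            # the remaining subtags (script, region, ...) are kept as written.
            return "-".join([candidate] + subtags[2:])
    return tag


if __name__ == "__main__":
    print(drop_macrolanguage_prefix("zh-yue-Hant-HK"))  # yue-Hant-HK
```

A tag such as "zh-Hant-HK" passes through unchanged, since "Hant" is a script subtag rather than an encompassed language in the table.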
Received on Thursday, 17 January 2008 05:18:58 UTC