
RE: Upcoming changes to BCP47 (language tag) syntax

From: Richard Ishida <ishida@w3.org>
Date: Mon, 21 Jan 2008 18:55:52 -0000
To: "'Addison Phillips'" <addison@yahoo-inc.com>, "'I18N'" <www-international@w3.org>
Cc: <member-i18n-core@w3.org>
Message-ID: <006501c85c5f$3f844260$be8cc720$@org>

Thanks for this Addison.

I propose the following change to http://www.w3.org/International/articles/language-tags/#issues

Delete the text currently in the section and replace with...


<p>Work is nearing completion on the next version of BCP 47.  It will incorporate a large number of additional three-letter language subtags from ISO 639-3. Since this was foreseen at the time of publication of RFC 4646, it will involve only small editorial changes.</p>

<p>This article used to mention a plan to introduce an extended-language subtag in the next version of BCP 47.  In the end a decision was taken by the IETF Working Group to not include this feature.  This means, for example, that a language tag for Mandarin should use cmn as a primary tag, rather than using the 'grandfathered' zh-cmn.</p>


It would be good to have the bulk of the text below available in some publicly accessible form so that we can point to it.

Also, I'm wondering what the implications are with respect to the use of zh and the cmn, yue, etc. subtags.  Are we expecting zh to be used only for things like zh-Hans and zh-Hant?  Can we still use it in a vague way to mean 'some kind of Chinese'?  Will the zh-cmn etc. tags be deprecated? etc.

Cheers,
RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/

 

> -----Original Message-----
> From: member-i18n-core-request@w3.org [mailto:member-i18n-core-request@w3.org]
> On Behalf Of Addison Phillips
> Sent: 17 January 2008 05:19
> To: I18N
> Cc: member-i18n-core@w3.org
> Subject: Upcoming changes to BCP47 (language tag) syntax
> 
> 
> In this week's Internationalization Core WG teleconference, I drew an
> action item [1] to provide more information about a proposed change to
> the language tag ABNF (the grammar or formal syntax) in the proposed
> successor to RFC 4646. That's because, at about the time RFC 4646 came
> into being, the W3C created several documents ([2] and [3]) describing
> language tags. Parts of these documents speculate about a potential
> future feature of language tags that is now being removed or will not be
> used. The I18N Core WG is now preparing to revise this document to keep
> it current, and, as co-editor of the proposed replacement, I've been
> following the details closely.
> 
> As many of you know, RFC 4646 was created as a successor to RFC 3066 as
> the document defining "BCP 47", the language tagging standard for
> Internet (and other) technologies. You may know "BCP 47" as "xml:lang"
> or as the values in the HTTP Accept-Language header, for example.
> 
> RFC 4646 provided a more complex syntax that defined several new
> "flavors" of subtag in addition to the language and region subtags that
> had been formally defined previously. Most of these new types were fully
> defined in 4646. However, one type of subtag was reserved for future
> use: the "extended language" subtags, or, colloquially, "extlangs".
> 
> Extended language subtags were intended to accommodate a feature of ISO
> 639-3, whereby some languages are considered to be encompassed by
> existing languages, which are called "macrolanguages". For example,
> Mandarin Chinese and Cantonese are distinct languages that have
> their own codes in ISO 639-3 ('cmn' and 'yue' respectively).
> Both of these languages (with several others) are encompassed by the
> macrolanguage called "Chinese", which is represented by the code 'zh' in
> language tags.
> 
> At the time 4646 was created, the IETF working group theorized that
> language tags for these languages would use both the macro- and
> encompassed language codes together. For example, a Cantonese (yue)
> document written in the Traditional script (Hant) for Hong Kong (HK)
> would use a tag like "zh-yue-Hant-HK".
> 
> However, after a great deal of debate and consideration, it was decided
> that this extlang feature would NOT be used. The encompassed and
> macrolanguage codes would both appear as potential primary language
> subtags and the extended language subtag would not be used. Thus, for
> example, the document described above would use the tag "yue-Hant-HK".
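
The difference between the two forms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the ENCOMPASSED table are hypothetical, the table holds just the two codes mentioned in this message (the real mapping list is the one referenced at [4]), and the code does not validate tags against the BCP 47 grammar.

```python
# Illustrative sample only: the two encompassed codes named in this thread,
# mapped to their macrolanguage. The full list is maintained elsewhere.
ENCOMPASSED = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Cantonese
}


def drop_macrolanguage_prefix(tag: str) -> str:
    """Rewrite an extlang-style tag such as 'zh-yue-Hant-HK' so that the
    encompassed language code becomes the primary language subtag.

    Tags that do not start with a known macrolanguage/encompassed pair
    are returned unchanged.
    """
    subtags = tag.split("-")
    if (
        len(subtags) >= 2
        and ENCOMPASSED.get(subtags[1].lower()) == subtags[0].lower()
    ):
        # Drop the leading macrolanguage subtag; keep everything else as is.
        return "-".join(subtags[1:])
    return tag


if __name__ == "__main__":
    for example in ("zh-yue-Hant-HK", "zh-cmn", "zh-Hant", "yue-Hant-HK"):
        print(example, "->", drop_macrolanguage_prefix(example))
```

Tags that use zh on its own, such as zh-Hant, pass through untouched in this sketch, since only the combined macrolanguage-plus-encompassed form is rewritten.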
> 
> It should be noted that the IETF working group for language tags has
> also decided to remove the extlang production from the language tag
> syntax. This production was explicitly reserved for future use and no
> tags have ever been valid that used it. A few tags were registered
> during the RFC 3066 era that appear to use these subtags, but these were
> separately handled by the "grandfathered" productions in the grammar.
> 
> Removing extlang altogether will simplify writing language tag
> processors and relax some of the minimum length requirements previously
> imposed.
> 
> Finally, this move was not taken without considerable debate and
> discussion. Some of the macrolanguages are obscure, but Chinese and
> Arabic languages are among those affected. Those interested in the
> macrolanguage mapping list can refer to the ISO639-3RA's page showing
> the current mappings [4].
> 
> The proposed successor is now nearing completion. A link to the current
> draft of the document can be found on my page [5], along with links to
> the IETF LTRU WG responsible for this document, the mail archive, and so
> forth.
> 
> Best Regards,
> 
> Addison
> 
> [1] http://www.w3.org/2008/01/16-core-minutes.html#action04
> [2] http://www.w3.org/International/articles/bcp47/
> [3] http://www.w3.org/International/articles/language-tags/#iana
> [4] http://www.sil.org/iso639%2D3/macrolanguages.asp
> [5] http://www.inter-locale.com
> 
> --
> Addison Phillips
> Globalization Architect -- Yahoo! Inc.
> Chair -- W3C Internationalization Core WG
> 
> Internationalization is an architecture.
> It is not a feature.
Received on Monday, 21 January 2008 18:52:49 GMT
