
RE: Traditional Chinese in RFC3066 bis

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Wed, 7 Apr 2004 09:31:52 -0700
To: "Richard Ishida" <ishida@w3.org>, <www-international@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHIEICHNAA.aphillips@webmethods.com>

Hi Richard,

This debate about where to position script tags took place some months ago on the ietf-languages list. You might want to review the archives of that list.

Basically, the subtags of a language tag modify the base language and not each other (insofar as that is possible). They do form an implied hierarchy, so 'zh-Hant-TW' is more specific than 'zh-Hant', although RFC 3066bis, like RFC 3066 and RFC 1766 before it, makes clear that this hierarchy may have no intrinsic meaning (the 'next step up' may not be mutually intelligible).
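The implied hierarchy can be illustrated by dropping subtags from the right, one at a time. This is a minimal sketch only (the function name `fallback_chain` is hypothetical, not part of any RFC); it mirrors the truncation idea later used by the "Lookup" scheme in RFC 4647, and it captures only the syntactic hierarchy, not mutual intelligibility:

```python
from typing import List


def fallback_chain(tag: str) -> List[str]:
    """Return the tag followed by its progressively more general prefixes.

    Each step drops the last subtag, e.g. 'zh-Hant-TW' -> 'zh-Hant' -> 'zh'.
    This reflects only the syntactic hierarchy; it does not imply that
    neighbouring tags are mutually intelligible.
    """
    subtags = tag.split("-")
    return ["-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]


print(fallback_chain("zh-Hant-TW"))  # ['zh-Hant-TW', 'zh-Hant', 'zh']
```

Note that under this ordering 'zh-Hant-TW' generalizes to 'zh-Hant', not to 'zh-TW'.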

So the basic question about where to put the script tag in the language tag revolved around whether script or region was more general as a classification. That is, are "texts written in Traditional Chinese" a superset of "texts written in/for Taiwan", or the other way around?

Ultimately, language tags and their limited hierarchy are an approximation of language. They cannot possibly capture all of the nuance implied by something as organic as human language. But the question is whether they can capture the details closely enough to suit the vast majority of applications.

In other words, the addition of script codes allows us to identify languages in the corner case where writing-system variations enter into the equation, long a problem in identifying Chinese, among other languages. It was clear early on that, in order to have deterministic parsing, the position and length of each subtag must be fixed. Furthermore, we didn't want to create a whole messy nimbus of tags with nearly identical semantics ('zh-TW-Hant' vs. 'zh-Hant-TW') and no way to choose between them.
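To show why fixed positions and lengths make parsing deterministic, here is a hedged sketch of a parser for a simplified subset of the syntax: a 2-3 letter language, an optional 4-letter script, then an optional 2-letter or 3-digit region. The subtag lengths follow the core of the syntax later published as RFC 4646 and may differ in detail from the 2004 draft; variants, extensions, private-use and grandfathered tags are deliberately ignored, and the names `_TAG` and `parse_tag` are hypothetical:

```python
import re
from typing import Dict, Optional

# Simplified grammar: each slot is recognized purely by its position and by
# the length/character class of the subtag, so no lookahead guessing is needed.
_TAG = re.compile(
    r"""(?P<language>[A-Za-z]{2,3})              # 2-3 letter language subtag
        (?:-(?P<script>[A-Za-z]{4}))?            # optional 4-letter script subtag
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?   # optional 2-letter or 3-digit region
    """,
    re.VERBOSE,
)


def parse_tag(tag: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a tag into language/script/region, or return None if it does not fit."""
    m = _TAG.fullmatch(tag)
    return m.groupdict() if m else None


print(parse_tag("zh-Hant-TW"))  # {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
print(parse_tag("zh-TW-Hant"))  # None: the script slot comes before the region slot
```

Because the script slot can only appear directly after the language, 'zh-TW-Hant' is simply not a well-formed tag, so there is exactly one way to write "Chinese, Traditional script, Taiwan".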

RFC 3066bis adds two additional slots to the existing ontology, but does so in the most conservative way possible.

Best Regards,


Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
> Sent: Wednesday, 7 April 2004 07:48
> To: www-international@w3.org
> Subject: Traditional Chinese in RFC3066 bis
> Tex raised a question on IRC about how to represent Traditional 
> Chinese in RFC3066 bis[1]: 
>  	"I thought zh-TW would become zh-hant-TW, not just zh-hant"
> My understanding is that if you want to say just Traditional 
> Chinese (without specifying which specific language you are 
> representing) you would continue to use zh-Hant.  This would 
> probably apply for most web pages or translations.
> The meaning of zh-Hant-TW is not clear to me.  Does it mean a 
> Taiwanese version of the script (as opposed, say, to a Hong Kong 
> version of Traditional Chinese which may include different 
> characters); or does it mean the language of Chinese as spoken in 
> Taiwan and written with Traditional Chinese characters?  If so, 
> how would that differ from zh-TW?
> I could imagine the latter scenario being represented more 
> intuitively as zh-TW-Hant, although I think that is not legal 
> according to RFC 3066 bis.
> Addison, Mark, help!
> RI
> ============
> Richard Ishida
> W3C
> contact info:
> http://www.w3.org/People/Ishida/ 
> W3C Internationalization:
> http://www.w3.org/International/ 
Received on Wednesday, 7 April 2004 12:37:30 UTC
