- From: A. Vine <andrea.vine@Sun.COM>
- Date: Wed, 07 Apr 2004 10:35:09 -0700
- To: aphillips@webmethods.com
- Cc: www-international@w3.org
So, what would be the proper way to indicate Mandarin as written in Taiwan using Traditional Chinese characters?

Addison Phillips [wM] wrote:
> Hi Richard,
>
> This debate about where to position script tags took place some
> months ago on the ietf-languages list. You might want to review the
> archives of that list.
>
> Basically the subtags of a language tag modify the base language and
> not each other (insofar as that is possible). It is the case that
> they form an implied hierarchy, so 'zh-Hant-TW' is more specific than
> 'zh-Hant', although RFC3066bis, like RFCs 3066 and 1766 before it,
> makes clear that this hierarchy may not have intrinsic meaning (that
> the 'next step up' may not be mutually intelligible).
>
> So the basic question about where to put the script tag in the
> language tag revolved around whether script or region was more
> general as a classification. That is, are "texts written in
> Traditional Chinese" a superset of "texts written in/for Taiwan", or
> the other way around?
>
> Ultimately, language tags and their limited hierarchy are an
> approximation of language. They cannot possibly capture all of the
> nuance implied by something as organic as human language. But the
> question is whether they can capture the details closely enough to
> suit the vast majority of applications.
>
> In other words, the addition of script codes allows us to identify
> languages in the corner case where writing system variations enter
> into the equation, long a problem in identifying Chinese among other
> languages. It was clear early on that in order to have deterministic
> parsing, the position and length of each subtag must be fixed.
> Furthermore we didn't want to create a whole messy nimbus of tags
> with very, very similar semantics ('zh-TW-Hant' vs. 'zh-Hant-TW') and
> no way to choose between them.
>
> RFC3066bis adds additional slots (two of them) to the existing
> ontology, but does so in the most conservative way possible.
>
> Best Regards,
>
> Addison
>
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods | Delivering Global Business Visibility
> http://www.webMethods.com
> Chair, W3C Internationalization (I18N) Working Group
> Chair, W3C-I18N-WG, Web Services Task Force
> http://www.w3.org/International
>
> Internationalization is an architecture.
> It is not a feature.
>
>
>>-----Original Message-----
>>From: www-international-request@w3.org
>>[mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
>>Sent: Wednesday, 7 April 2004 07:48
>>To: www-international@w3.org
>>Subject: Traditional Chinese in RFC3066 bis
>>
>>
>>
>>Tex raised a question on IRC about how to represent Traditional
>>Chinese in RFC3066 bis[1]:
>>
>> "I thought zh-TW would become zh-hant-TW, not just zh-hant"
>>
>>My understanding is that if you want to say just Traditional
>>Chinese (without specifying which specific language you are
>>representing) you would continue to use zh-Hant. This would
>>probably apply for most web pages or translations.
>>
>>The meaning of zh-Hant-TW is not clear to me. Does it mean a
>>Taiwanese version of the script (as opposed, say, to a Hong Kong
>>version of Traditional Chinese which may include different
>>characters); or does it mean the language of Chinese as spoken in
>>Taiwan and written with Traditional Chinese characters? If so,
>>how would that differ from zh-TW?
>>
>>I could imagine the latter scenario being represented more
>>intuitively as zh-TW-Hant, although I think that is not legal
>>according to RFC 3066 bis.
>>
>>Addison, Mark, help!
>>
>>RI
>>
>>
>>============
>>Richard Ishida
>>W3C
>>
>>contact info:
>>http://www.w3.org/People/Ishida/
>>
>>W3C Internationalization:
>>http://www.w3.org/International/
>>
>>
>
>

--
I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone.
-Bjarne Stroustrup, designer of C++ programming language (1950- )
Received on Wednesday, 7 April 2004 13:10:41 UTC
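
The fixed-position rule discussed in the thread (language, then an optional script, then an optional region) can be illustrated with a rough Python sketch. It is not taken from RFC3066bis itself: the helper names `parse_tag` and `fallback_chain` are hypothetical, and the subtag lengths (2-3 letter language, 4-letter script, 2-letter region) are assumptions based on the usual ISO 639, ISO 15924 and ISO 3166 code lengths. Variants, extensions and private-use subtags are ignored.

```python
import re
from typing import List, NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Assumed shape: language[-script][-region], each slot with a fixed
# position and length so that parsing is deterministic.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}))?$"
)


def parse_tag(tag: str) -> LangTag:
    """Split a tag into its slots, normalising case (zh, Hant, TW)."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a language[-script][-region] tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return LangTag(
        match.group("language").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


def fallback_chain(tag: str) -> List[str]:
    """List the tag and its less specific prefixes, most specific first."""
    parts = [part for part in parse_tag(tag) if part is not None]
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
```

Under these assumptions, `fallback_chain("zh-Hant-TW")` yields `zh-Hant-TW`, `zh-Hant`, `zh`, which mirrors the implied hierarchy Addison describes, while `parse_tag("zh-TW-Hant")` raises `ValueError` because the script subtag appears after the region.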