
RE: Traditional Chinese in RFC3066 bis

From: Richard Ishida <ishida@w3.org>
Date: Wed, 7 Apr 2004 19:14:41 +0100
To: <aphillips@webmethods.com>, <www-international@w3.org>
Message-ID: <E1BBHZO-0006OZ-Qg@dr-nick.w3.org>

> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com] 
> 
> This debate about where to position script tags took place 
> some months ago on the ietf-languages list. You might want to 
> review the archives of that list.

I'd much prefer to get a summary of the conclusions, for which many thanks ;-)

> 
> Basically the subtags of a language tag modify the base 
> language and not each other (insofar as that is possible). It 
> is the case that they form an implied hierarchy, so 
> 'zh-Hant-TW' is more specific than 'zh-Hant', although 
> RFC3066bis, like RFCs 3066 and 1766 before it, makes clear 
> that this hierarchy may not have intrinsic meaning (that the 
> 'next step up' may not be mutually intelligible).
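A minimal sketch of what that implied hierarchy means in practice, assuming a simple lookup-style fallback that strips one subtag at a time from the right (similar in spirit to the "Lookup" scheme later standardized in RFC 4647, but without its special handling of single-character subtags). The function name fallback_chain is hypothetical. Per Addison's caveat, each shorter tag is only a candidate fallback, not a promise that text tagged with it is intelligible to the same reader.

```python
def fallback_chain(tag: str) -> list[str]:
    """Return the tag followed by each successively shorter prefix.

    Hypothetical helper: it drops one trailing subtag at a time and
    treats every subtag alike (no special case for singletons).
    """
    subtags = tag.split("-")
    # Build prefixes from longest (the full tag) down to the bare language.
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


print(fallback_chain("zh-Hant-TW"))  # ['zh-Hant-TW', 'zh-Hant', 'zh']
```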
> 
> So the basic question about where to put the script tag in 
> the language tag revolved around whether script or region was 
> more general as a classification. That is, are "texts written 
> in Traditional Chinese" a superset of "texts written in/for 
> Taiwan", or the other way around?

Hmm. I tend to see the two as somewhat orthogonal. That is why, in the format I proposed a while back, I separated them out, for example as "zh-TW/Hant", i.e. "<lang+dialect>/<script>". That format would also let you write just "/Hant" (i.e. 'I know it uses the Traditional Chinese script, but I don't know which language') while still matching easily against existing zh-TW tags.

This also has the merit of not claiming that script is an aspect of language.
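To make the proposed notation concrete, here is a minimal sketch of how a processor might split it. It assumes the "<lang+dialect>/<script>" syntax described above, which is a proposal and not RFC 3066bis syntax; SlashTag and parse_slash_tag are illustrative names only. An empty part before the slash is read as "language unknown".

```python
from typing import NamedTuple, Optional


class SlashTag(NamedTuple):
    language: Optional[str]  # e.g. "zh-TW"; None when only a script is given
    script: Optional[str]    # e.g. "Hant"; None when there is no "/script" part


def parse_slash_tag(value: str) -> SlashTag:
    """Split a hypothetical "<lang+dialect>/<script>" value at the first '/'."""
    lang, _sep, script = value.partition("/")
    # partition() returns empty strings for missing parts; map them to None.
    return SlashTag(language=lang or None, script=script or None)


print(parse_slash_tag("zh-TW/Hant"))  # SlashTag(language='zh-TW', script='Hant')
print(parse_slash_tag("/Hant"))       # SlashTag(language=None, script='Hant')
```

Because the part before the slash is an ordinary RFC 3066 tag, it can be compared directly with existing tags such as zh-TW.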

> 
> Ultimately, language tags and their limited hierarchy are an 
> approximation of language. They cannot possibly capture all 
> of the nuance implied by something as organic as human 
> language. But the question is whether they can capture the 
> details closely enough to suit the vast majority of applications.
> 
> In other words, the addition of script codes allows us to 
> identify languages in the corner case where writing system 
> variations enter into the equation, long a problem in 
> identifying Chinese among other languages. It was clear 
> early on that in order to have deterministic parsing, the 
> position and length of each subtag must be fixed. Furthermore 
> we didn't want to create a whole messy nimbus of tags with 
> very very similar semantics ('zh-TW-Hant' vs. 'zh-Hant-TW') 
> and no way to choose between them. 
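A minimal sketch of why fixed positions and lengths give deterministic parsing: each subtag can be classified by where it sits and how long it is, with no lookup table. It assumes a simplified grammar (a language of 2 or 3 letters, an optional 4-letter script, an optional region of 2 letters or 3 digits, in that order) modelled on the rules later published in RFC 4646, and it deliberately ignores variants, extensions and private-use subtags. LanguageTag and parse_tag are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed, simplified grammar: language, then optional script, then optional
# region. Order is fixed, and each slot is recognised by its length alone.
TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> Optional[LanguageTag]:
    """Classify subtags by position and length; None if the order is wrong."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    return LanguageTag(m.group("language"), m.group("script"), m.group("region"))


print(parse_tag("zh-Hant-TW"))  # LanguageTag(language='zh', script='Hant', region='TW')
print(parse_tag("zh-TW-Hant"))  # None
```

Under this grammar 'zh-Hant-TW' parses as language zh, script Hant, region TW, while 'zh-TW-Hant' is rejected, so only one of the two spellings can exist.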
> 
> RFC3066bis adds two additional slots to the 
> existing ontology, but does so in the most conservative way possible.
> 
> Best Regards,
> 
> Addison
> 
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods | Delivering Global Business Visibility
> http://www.webMethods.com
> Chair, W3C Internationalization (I18N) Working Group
> Chair, W3C-I18N-WG, Web Services Task Force
> http://www.w3.org/International
> 
> Internationalization is an architecture. 
> It is not a feature.
> 
> > -----Original Message-----
> > From: www-international-request@w3.org 
> > [mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
> > Sent: Wednesday, 7 April 2004 07:48
> > To: www-international@w3.org
> > Subject: Traditional Chinese in RFC3066 bis
> > 
> > 
> > 
> > Tex raised a question on IRC about how to represent Traditional 
> > Chinese in RFC3066 bis[1]:
> > 
> >  	"I thought zh-TW would become zh-hant-TW, not just zh-hant"
> > 
> > My understanding is that if you want to say just Traditional Chinese
> > (without specifying which specific language you are
> > representing) you would continue to use zh-Hant.  This would probably
> > apply to most web pages or translations.
> > 
> > The meaning of zh-Hant-TW is not clear to me.  Does it mean a 
> > Taiwanese version of the script (as opposed, say, to a Hong Kong 
> > version of Traditional Chinese which may include different 
> > characters); or does it mean the language of Chinese as spoken in 
> > Taiwan and written with Traditional Chinese characters?  If so, how 
> > would that differ from zh-TW?
> > 
> > I could imagine the latter scenario being represented more intuitively
> > as zh-TW-Hant, although I think that is not legal according to RFC
> > 3066 bis.
> > 
> > Addison, Mark, help!
> > 
> > RI
> > 
> > 
> > ============
> > Richard Ishida
> > W3C
> > 
> > contact info:
> > http://www.w3.org/People/Ishida/
> > 
> > W3C Internationalization:
> > http://www.w3.org/International/
> >  
> > 
> 
Received on Wednesday, 7 April 2004 14:14:39 GMT
