RE: Traditional Chinese in RFC3066 bis

From: McDonald, Ira <imcdonald@sharplabs.com>
Date: Wed, 7 Apr 2004 10:01:18 -0700
Message-ID: <CFEE79A465B35C4385389BA5866BEDF00C7660@mailsrvnt02.enet.sharplabs.com>
To: "'aphillips@webmethods.com'" <aphillips@webmethods.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org

Hi folks,

The 'RFC3066bis' that Addison refers to below is available at:

ftp://ftp.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
(13 February 2004)

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald@sharplabs.com

-----Original Message-----
From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
Sent: Wednesday, April 07, 2004 12:38 PM
To: Richard Ishida; www-international@w3.org
Subject: RE: Traditional Chinese in RFC3066 bis



Hi Richard,

This debate about where to position script tags took place some months ago
on the ietf-languages list. You might want to review the archives of that
list.

Basically the subtags of a language tag modify the base language and not
each other (insofar as that is possible). It is the case that they form an
implied hierarchy, so 'zh-Hant-TW' is more specific than 'zh-Hant', although
RFC3066bis, like RFCs 3066 and 1766 before it, makes clear that this
hierarchy may not have intrinsic meaning (that the 'next step up' may not be
mutually intelligible).
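
As a rough illustration of that implied hierarchy (a sketch only, not
something taken from the draft), each "next step up" can be produced by
dropping the last subtag. The function name fallback_chain is hypothetical,
and the sketch says nothing about whether a shorter tag is actually an
acceptable substitute for a longer one:

```python
def fallback_chain(tag):
    """Return the tag followed by each shorter prefix, most specific first."""
    subtags = tag.split("-")
    # Drop one trailing subtag at a time: zh-Hant-TW, then zh-Hant, then zh.
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


print(fallback_chain("zh-Hant-TW"))  # ['zh-Hant-TW', 'zh-Hant', 'zh']
```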

So the basic question about where to put the script tag in the language tag
revolved around whether script or region was more general as a
classification. That is, are "texts written in Traditional Chinese" a
superset of "texts written in/for Taiwan", or the other way around?

Ultimately, language tags and their limited hierarchy are an approximation
of language. They cannot possibly capture all of the nuance implied by
something as organic as human language. But the question is whether they can
capture the details closely enough to suit the vast majority of
applications.

In other words, the addition of script codes allows us to identify languages
in the corner case where writing system variations enter into the equation,
long a problem in identifying Chinese among other languages. It was clear
early on that in order to have deterministic parsing, the position and
length of each subtag must be fixed. Furthermore, we didn't want to create a
whole messy nimbus of tags with very, very similar semantics ('zh-TW-Hant'
vs. 'zh-Hant-TW') and no way to choose between them.
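
To show why fixed position and length make parsing deterministic, here is a
minimal sketch. It rests on simplifying assumptions that are mine rather than
the draft's: a language subtag of 2-3 letters, a script subtag of exactly 4
letters, a region subtag of exactly 2 letters, and no other subtag kinds. The
name parse_tag is hypothetical.

```python
def parse_tag(tag):
    """Split a tag into language, script, and region by position and length.

    Assumed for illustration only: language is 2-3 letters, script is
    exactly 4 letters, region is exactly 2 letters, in that order.
    Any other subtag is rejected.
    """
    subtags = tag.split("-")
    language = subtags[0]
    if not (language.isalpha() and 2 <= len(language) <= 3):
        raise ValueError(f"bad language subtag in {tag!r}")

    result = {"language": language.lower(), "script": None, "region": None}
    rest = subtags[1:]

    # A 4-letter subtag directly after the language can only be a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0).title()
    # A 2-letter subtag in the next slot can only be a region.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["region"] = rest.pop(0).upper()
    # Anything left over is out of order or of an unsupported kind.
    if rest:
        raise ValueError(f"unexpected subtag {rest[0]!r} in {tag!r}")
    return result
```

Under these assumptions 'zh-Hant-TW' and 'zh-TW' both parse, while
'zh-TW-Hant' is rejected because the script subtag arrives after the region
slot, so there is only one spelling to choose from.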

RFC3066bis adds additional slots (two of them) to the existing ontology, but
does so in the most conservative way possible.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
> Sent: Wednesday, 7 April 2004 07:48
> To: www-international@w3.org
> Subject: Traditional Chinese in RFC3066 bis
> 
> 
> 
> Tex raised a question on IRC about how to represent Traditional 
> Chinese in RFC3066 bis[1]: 
> 
>  	"I thought zh-TW would become zh-hant-TW, not just zh-hant"
> 
> My understanding is that if you want to say just Traditional 
> Chinese (without specifying which specific language you are 
> representing) you would continue to use zh-Hant.  This would 
> probably apply for most web pages or translations.
> 
> The meaning of zh-Hant-TW is not clear to me.  Does it mean a 
> Taiwanese version of the script (as opposed, say, to a Hong Kong 
> version of Traditional Chinese which may include different 
> characters); or does it mean the language of Chinese as spoken in 
> Taiwan and written with Traditional Chinese characters?  If so, 
> how would that differ from zh-TW?
> 
> I could imagine the latter scenario being represented more 
> intuitively as zh-TW-Hant, although I think that is not legal 
> according to RFC 3066 bis.
> 
> Addison, Mark, help!
> 
> RI
> 
> 
> ============
> Richard Ishida
> W3C
> 
> contact info:
> http://www.w3.org/People/Ishida/ 
> 
> W3C Internationalization:
> http://www.w3.org/International/ 
>  
> 
Received on Wednesday, 7 April 2004 13:01:55 GMT
