- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 7 Apr 2004 19:14:41 +0100
- To: <aphillips@webmethods.com>, <www-international@w3.org>
> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
>
> This debate about where to position script tags took place
> some months ago on the ietf-languages list. You might want to
> review the archives of that list.
I'd much prefer to get a summary of the conclusions, for which many thanks ;-)
>
> Basically the subtags of a language tag modify the base
> language and not each other (insofar as that is possible). It
> is the case that they form an implied hierarchy, so
> 'zh-Hant-TW' is more specific than 'zh-Hant', although
> RFC3066bis, like RFCs 3066 and 1766 before it, makes clear
> that this hierarchy may not have intrinsic meaning (that the
> 'next step up' may not be mutually intelligible).
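To make the implied hierarchy concrete, here is a minimal sketch in Python of walking from a tag to its less specific parents by dropping subtags from the right. The function name `fallback_chain` is hypothetical and is not taken from RFC3066bis; as noted above, a parent in the chain is not guaranteed to be mutually intelligible with its child.

```python
def fallback_chain(tag):
    """Return the tag followed by its progressively less specific parents.

    Illustrative assumption: the hierarchy is obtained purely by
    truncating subtags from the right, e.g. zh-Hant-TW -> zh-Hant -> zh.
    """
    subtags = tag.split("-")
    # Keep the first n subtags, for n counting down to 1.
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


if __name__ == "__main__":
    for candidate in fallback_chain("zh-Hant-TW"):
        print(candidate)
```

The chain only describes specificity; whether falling back from one step to the next is acceptable is left to the application.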
>
> So the basic question about where to put the script tag in
> the language tag revolved around whether script or region was
> more general as a classification. That is, are "texts written
> in Traditional Chinese" a superset of "texts written in/for
> Taiwan", or the other way around?
Hmm. I tend to see it as somewhat orthogonal, which is why in the format I proposed way back I separated it out as, for example, "zh-TW/Hant" - i.e. "<lang+dialect>/<script>" - which btw would also allow you to say "/Hant" (i.e. 'I know it uses the Traditional Chinese script, but I don't know what language') as well as match easily with existing zh-TW.
This also has the merit of not claiming that script is an aspect of language.
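As a rough illustration of that "<lang+dialect>/<script>" idea, the following Python sketch splits such a tag on the slash and compares the language part against an existing RFC 3066 tag. The function names `parse_slash_tag` and `matches_legacy` are made up for this example, and the format itself is only the earlier proposal mentioned above, not part of RFC3066bis.

```python
def parse_slash_tag(tag):
    """Split a hypothetical '<lang+dialect>/<script>' tag into two parts.

    Either part may be missing: '/Hant' has no language part and
    'zh-TW' has no script part. Missing parts are returned as None.
    """
    lang_part, _, script = tag.partition("/")
    return (lang_part or None, script or None)


def matches_legacy(tag, legacy_tag):
    """True if the language part equals an existing tag such as 'zh-TW'."""
    lang_part, _ = parse_slash_tag(tag)
    # Language tags are compared case-insensitively.
    return lang_part is not None and lang_part.lower() == legacy_tag.lower()


if __name__ == "__main__":
    print(parse_slash_tag("zh-TW/Hant"))
    print(parse_slash_tag("/Hant"))
    print(matches_legacy("zh-TW/Hant", "zh-TW"))
```

Because the script sits after a separate delimiter, matching against existing zh-TW content needs no knowledge of script codes at all.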
>
> Ultimately, language tags and their limited hierarchy are an
> approximation of language. They cannot possibly capture all
> of the nuance implied by something as organic as human
> language. But the question is whether they can capture the
> details closely enough to suit the vast majority of applications.
>
> In other words, the addition of script codes allows us to
> identify languages in the corner case where writing system
> variations enter into the equation, long a problem in
> identifying Chinese among other languages. It was clear
> early on that in order to have deterministic parsing, the
> position and length of each subtag must be fixed. Furthermore
> we didn't want to create a whole messy nimbus of tags with
> very very similar semantics ('zh-TW-Hant' vs. 'zh-Hant-TW')
> and no way to choose between them.
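The point about fixed position and length can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the grammar from the draft: it assumes the language comes first, an optional four-letter script second, and an optional two-letter region after that, and it ignores variants, extensions and any other subtag types. The function name `parse_tag` is hypothetical.

```python
def parse_tag(tag):
    """Parse 'language[-Script][-REGION]' using subtag position and length.

    Simplifying assumptions for illustration: script is exactly four
    letters, region is exactly two letters, and nothing else may follow.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None, "region": None}
    rest = subtags[1:]

    # Slot 2: a four-letter subtag can only be a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0).title()

    # Slot 3: a two-letter subtag can only be a region.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["region"] = rest.pop(0).upper()

    # Anything left over is out of order or unsupported in this sketch.
    if rest:
        raise ValueError("unexpected subtag(s): " + "-".join(rest))
    return result


if __name__ == "__main__":
    print(parse_tag("zh-Hant-TW"))
    print(parse_tag("zh-TW"))
```

With the order fixed this way, 'zh-TW-Hant' is rejected rather than treated as a second spelling of 'zh-Hant-TW', so there is only one tag to choose.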
>
> RFC3066bis adds additional slots (two of them) to the
> existing ontology, but does so in the most conservative way possible.
>
> Best Regards,
>
> Addison
>
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods | Delivering Global Business Visibility
> http://www.webMethods.com
> Chair, W3C Internationalization (I18N) Working Group
> Chair, W3C-I18N-WG, Web Services Task Force
> http://www.w3.org/International
>
> Internationalization is an architecture.
> It is not a feature.
>
> > -----Original Message-----
> > From: www-international-request@w3.org
> > [mailto:www-international-request@w3.org]On Behalf Of Richard Ishida
> > Sent: Wednesday, 7 April 2004 07:48
> > To: www-international@w3.org
> > Subject: Traditional Chinese in RFC3066 bis
> >
> >
> >
> > Tex raised a question on IRC about how to represent Traditional
> > Chinese in RFC3066 bis[1]:
> >
> > "I thought zh-TW would become zh-hant-TW, not just zh-hant"
> >
> > My understanding is that if you want to say just Traditional Chinese
> > (without specifying which specific language you are
> > representing) you would continue to use zh-Hant. This would probably
> > apply for most web pages or translations.
> >
> > The meaning of zh-Hant-TW is not clear to me. Does it mean a
> > Taiwanese version of the script (as opposed, say, to a Hong Kong
> > version of Traditional Chinese which may include different
> > characters); or does it mean the language of Chinese as spoken in
> > Taiwan and written with Traditional Chinese characters? If so, how
> > would that differ from zh-TW?
> >
> > I could imagine the latter scenario being represented more intuitively
> > as zh-TW-Hant, although I think that is not legal according to RFC
> > 3066 bis.
> >
> > Addison, Mark, help !
> >
> > RI
> >
> >
> > ============
> > Richard Ishida
> > W3C
> >
> > contact info:
> > http://www.w3.org/People/Ishida/
> >
> > W3C Internationalization:
> > http://www.w3.org/International/
> >
> >
>
Received on Wednesday, 7 April 2004 14:14:39 UTC