- From: A. Vine <andrea.vine@Sun.COM>
- Date: Wed, 15 Dec 2004 15:20:26 -0800
- To: www-international@w3.org
Sure, Mark, and how many software packages that people are using process
this today? You and I know what the standards are and what their status
is, but getting them to the people out there in a functional way is
another animal.

Mark Davis wrote:
>>It's fine if it's there, but software interpretation of script subtags
>>is a future concept, not a current one.
>
> It is a present issue. There are already a number of language tags with
> script subtags; look at the 3066 registry.
>
> Mark
>
> ----- Original Message -----
> From: "A. Vine" <andrea.vine@sun.com>
> To: <www-international@w3.org>
> Sent: Wednesday, December 15, 2004 11:55
> Subject: Re: Language Identifier List up for comments
>
>>
>>Elizabeth J. Pyatt wrote:
>>
>>>A. Vine wrote
>>>
>>>>>But now you are talking about differences in a script, not
>>>>>differences in a language.
>>>>
>>>>Um, when you're talking about the written word, they are somewhat
>>>>inseparable.
>>>
>>>I disagree on this point. There are Central Asian languages (e.g. Uzbek)
>>>which can be written in three scripts (Roman, Cyrillic, Arabic), yet
>>>they are not called different languages.
>>
>>You are misinterpreting my point. When a language is written, it has a
>>script (or writing system, some might prefer). What that script _is_ is
>>another matter. _That_ it is, is my point. Which is one reason why
>>"zh" alone is unhelpful for actual, practical application. If I am a
>>browser and I get a page that says it's "zh", I don't know what to do.
>>I don't know what to match it to, I don't know what font to load, I
>>don't know what voice synthesizer to load. I have to guess or make
>>assumptions or run some additional heuristics.
>>
>>What most software does right now is make assumptions due to legacy use
>>of "zh" meaning "Simplified Chinese, Mandarin in the PRC". It doesn't
>>matter what we do from now on, as long as that legacy tag is out there
>>(and it is).
>>
>>>I realize that there are cases
>>>of similar spoken forms being labelled as different languages because
>>>they are written in different scripts, but that is more a matter of
>>>politics than of linguistics.
>>>
>>>I concede that the encoding tag is not enough to specify the script, but
>>>I would consider script to be a third meta tag. (i.e. ISO-15924 -
>>>http://www.unicode.org/iso15924/iso15924-codes.html)
>>
>>It's fine if it's there, but software interpretation of script subtags
>>is a future concept, not a current one.
>>
>>>I see that using Chinese-TW is NOT recommended, and I am glad to see
>>>that. I also see why "zh" would not be helpful in and of itself as it is
>>>currently defined. I was assuming a definition of "zh" as the written
>>>form used in Chinese dialect communities, but that does not appear to be
>>>the correct definition. It would not be Mandarin Chinese because it can
>>>be read all over the country by speakers of the different dialects.
>>
>>I have heard this, but I have also heard from some of our Chinese l10n
>>folks that there are some differences in the way things would be written
>>in some dialects. In other words, it may be understood but it's not
>>"native". But I leave this to the Chinese scholars.
>>
>>>It's almost like a data set of numeric text which could be read in
>>>almost any language.
>>>
>>>1 2 3
>>>=uno,dos,tres?
>>>=one,two,three?
>>>
>>>What kind of language tag would a set of numbers be? "Math"? Would it
>>>have no tag and assume a user agent will use the default language
>>>(whatever it is)? I assume that a speech synthesizer agent would treat
>>>Chinese characters as if it were Mandarin Chinese and pronounce it
>>>accordingly, but you could build several agents that could read them in
>>>the other forms (Hakka, Cantonese)
>>>
>>>I would argue that if you're speaking of pin yin Romanization, it might
>>>be important to specify that it is the Mandarin form because now
>>>phonetic form is represented. The Romanized form of Cantonese would be
>>>different.
>>
>>Again, I leave it to the Chinese scholars.
>>
>>Andrea
>>
>>>Elizabeth Pyatt
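
The browser's dilemma described in the thread (a bare "zh" names a language but no script, so software has to guess) could be sketched roughly as follows. This is only an illustration, not how any particular browser works: the names `split_tag`, `resolve_script` and `LEGACY_SCRIPT_GUESS` are hypothetical, the parsing is deliberately simplified, and the region-based guesses in the table are assumptions added for the example. Only the bare-"zh" entry reflects the legacy behaviour mentioned above.

```python
import re

# Hypothetical legacy defaults. The thread says bare "zh" is widely treated
# as Simplified Chinese; the region-based entries are illustrative guesses.
LEGACY_SCRIPT_GUESS = {
    ("zh", None): "Hans",
    ("zh", "CN"): "Hans",
    ("zh", "TW"): "Hant",
}


def split_tag(tag):
    """Split a tag such as 'zh-Hant-TW' into (language, script, region).

    Simplified: a 4-letter subtag is read as a script code and a 2-letter
    subtag as a region code; anything else is ignored.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if script is None and region is None and re.fullmatch(r"[A-Za-z]{4}", part):
            script = part.title()
        elif region is None and re.fullmatch(r"[A-Za-z]{2}", part):
            region = part.upper()
    return language, script, region


def resolve_script(tag):
    """Return (script, how): how is 'declared', 'guessed' or 'unknown'."""
    language, script, region = split_tag(tag)
    if script is not None:
        return script, "declared"   # the tag itself says which script
    guess = LEGACY_SCRIPT_GUESS.get((language, region))
    if guess is not None:
        return guess, "guessed"     # legacy assumption, may be wrong
    return None, "unknown"          # needs heuristics on the content


if __name__ == "__main__":
    for tag in ("zh", "zh-Hant", "zh-TW", "uz", "uz-Cyrl"):
        print(tag, resolve_script(tag))
```

The point of returning "guessed" separately from "declared" is that the software can tell when it is relying on a legacy assumption rather than on information carried by the tag.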
Received on Wednesday, 15 December 2004 23:15:33 UTC