- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Wed, 15 Dec 2004 14:02:37 -0800
- To: <andrea.vine@sun.com>, <www-international@w3.org>
> It's fine if it's there, but software interpretation of script subtags
> is a future concept, not a current one.

It is a present issue. There are already a number of language tags with
script subtags; look at the 3066 registry.

Mark

----- Original Message -----
From: "A. Vine" <andrea.vine@sun.com>
To: <www-international@w3.org>
Sent: Wednesday, December 15, 2004 11:55
Subject: Re: Language Identifier List up for comments

> Elizabeth J. Pyatt wrote:
>
> > A. Vine wrote
> >
> >>> But now you are talking about differences in a script, not
> >>> differences in a language.
> >>
> >> Um, when you're talking about the written word, they are somewhat
> >> inseparable.
> >
> > I disagree on this point. There are Central Asian languages (e.g. Uzbek)
> > which can be written in three scripts (Roman, Cyrillic, Arabic), yet
> > they are not called different languages.
>
> You are misinterpreting my point. When a language is written, it has a
> script (or writing system, some might prefer). What that script _is_ is
> another matter. _That_ it is, is my point. Which is one reason why
> "zh" alone is unhelpful for actual, practical application. If I am a
> browser and I get a page that says it's "zh", I don't know what to do.
> I don't know what to match it to, I don't know what font to load, I
> don't know what voice synthesizer to load. I have to guess or make
> assumptions or run some additional heuristics.
>
> What most software does right now is make assumptions due to legacy use
> of "zh" meaning "Simplified Chinese, Mandarin in the PRC". It doesn't
> matter what we do from now on, as long as that legacy tag is out there
> (and it is).
>
> > I realize that there are cases
> > of similar spoken forms being labelled as different languages because
> > they are written in different scripts, but that is more a matter of
> > politics than of linguistics.
> >
> > I concede that the encoding tag is not enough to specify the script, but
> > I would consider script to be a third meta tag. (i.e. ISO-15924 -
> > http://www.unicode.org/iso15924/iso15924-codes.html)
>
> It's fine if it's there, but software interpretation of script subtags
> is a future concept, not a current one.
>
> > I see that using Chinese-TW is NOT recommended, and I am glad to see
> > that. I also see why "zh" would not be helpful in and of itself as it is
> > currently defined. I was assuming a definition of "zh" as the written
> > form used in Chinese dialect communities, but that does not appear to be
> > the correct definition. It would not be Mandarin Chinese because it can
> > be read all over the country by speakers of the different dialects.
>
> I have heard this, but I have also heard from some of our Chinese l10n
> folks that there are some differences in the way things would be written
> in some dialects. In other words, it may be understood but it's not
> "native". But I leave this to the Chinese scholars.
>
> > It's almost like a data set of numeric text which could be read in
> > almost any language.
> >
> > 1 2 3
> > =uno,dos,tres?
> > =one,two,three?
> >
> > What kind of language tag would a set of numbers be? "Math"? Would it
> > have no tag and assume a user agent will use the default language
> > (whatever it is). I assume that a speech synthesizer agent would treat
> > Chinese characters as if it were Mandarin Chinese and pronounce it
> > accordingly, but you could build several agents that could read them in
> > the other forms (Hakka, Cantonese)
> >
> > I would argue that if you're speaking of pin yin Romanization, it might
> > be important to specify that it is the Mandarin form because now
> > phonetic form is represented. The Romanized form of Cantonese would be
> > different.
>
> Again, I leave it to the Chinese scholars.
>
> Andrea
>
> > Elizabeth Pyatt
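
For illustration only, here is a minimal Python sketch of the behaviour discussed in this thread: a client that uses an explicit script subtag when the tag carries one (for example tags of the form zh-Hans or uz-Cyrl) and otherwise falls back to a legacy guess for bare "zh". The names split_tag, resolve_script and LEGACY_DEFAULT_SCRIPT are hypothetical, the parsing is deliberately simplified rather than a full RFC 3066 parser, and the fallback table only mirrors the legacy assumption described above; it is not taken from any real browser.

```python
import re

# Hypothetical table: the script a client silently assumes when the tag
# carries no script subtag. It mirrors the legacy reading of "zh" as
# Simplified Chinese described in the thread; it is not from any real product.
LEGACY_DEFAULT_SCRIPT = {"zh": "Hans"}


def split_tag(tag):
    """Split a language tag into (language, script, region).

    Simplified on purpose: a 4-letter subtag is treated as an ISO 15924
    script code and a 2-letter subtag as a region. Missing parts are None.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if script is None and region is None and re.fullmatch(r"[A-Za-z]{4}", part):
            script = part.title()   # conventional casing, e.g. "Hans"
        elif region is None and re.fullmatch(r"[A-Za-z]{2}", part):
            region = part.upper()   # conventional casing, e.g. "TW"
    return language, script, region


def resolve_script(tag):
    """Return (script, how), where how records how the script was obtained."""
    language, script, _region = split_tag(tag)
    if script is not None:
        return script, "explicit"          # the tag itself names the script
    if language in LEGACY_DEFAULT_SCRIPT:
        return LEGACY_DEFAULT_SCRIPT[language], "legacy-guess"
    return None, "unknown"                 # caller must run other heuristics


if __name__ == "__main__":
    for tag in ("zh", "zh-Hans", "zh-Hant", "uz-Cyrl", "uz"):
        print(tag, resolve_script(tag))
```

The second element of the result keeps the guess visible, so a caller choosing a font or a voice synthesizer can tell a script stated by the author apart from one it merely assumed.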
Received on Wednesday, 15 December 2004 22:02:45 UTC