- From: Elizabeth J. Pyatt <ejp10@psu.edu>
- Date: Wed, 15 Dec 2004 16:53:16 -0500
- To: "A. Vine" <andrea.vine@sun.com>
- Cc: www-international@w3.org
I do see your point, but I'm not sure what language tag will help in
your scenarios.

>>I disagree on this point. There are Central Asian languages (e.g.
>>Uzbek) which can be written in three scripts (Roman, Cyrillic,
>>Arabic), yet they are not called different languages.
>
>You are misinterpreting my point. When a language is written, it
>has a script (or writing system, some might prefer). What that
>script _is_ is another matter. _That_ it is, is my point. Which is
>one reason why "zh" alone is unhelpful for actual, practical
>application. If I am a browser and I get a page that says it's
>"zh", I don't know what to do. I don't know what to match it to, I
>don't know what font to load,

Currently, I believe that the encoding tag (e.g. "gb5") plus the
actual characters tell the browser what to display, with or without
a language tag. If you have a page tagged correctly for language BUT
forget the encoding tag, you are generally in trouble. But if you
forget the language tag and include the encoding tag, you will
usually get good results visually.

> I don't know what voice synthesizer to load. I have to guess or
>make assumptions or run some additional heuristics.

And this is where I think the language tag is most valid. Because of
the way the Chinese script (Simplified/Traditional) is designed, it
may be a bit "language blind" in some cases. A speaker in Hong Kong
seeing the characters may read it in Cantonese (zh-, but a speaker in
Beijing may read it in Mandarin Chinese. You could design a speech
synthesizer which reads aloud in either, depending on user preferences.

This is why I claim that you may be stuck without a specific language
code. It's theoretically many languages in one script - which IS a
major advantage of the script. However, it's hard to say what
language it is phonetically. The world tends to assume it's zh-han
because of political matters, but apparently there's a bit of fudge
factor involved.
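To make the synthesizer idea concrete, here is a minimal sketch of a voice picker that falls back on the reader's preference when a page says only "zh". Everything in it is an assumption for illustration: the function name pick_voice, the voice names, and the use of "zh-yue" and "zh-cmn" as labels for Cantonese and Mandarin are hypothetical and do not describe any real synthesizer or browser.

```python
# Hypothetical mapping from language tags to voice names.
# None of these identifiers come from a real synthesizer API.
VOICES = {
    "zh-yue": "cantonese-voice",
    "zh-cmn": "mandarin-voice",
}


def pick_voice(page_lang, user_pref=None, default="zh-cmn"):
    """Return a voice name for a page's language tag, or None.

    A bare "zh" tag (or an unrecognized "zh-..." tag) does not say which
    spoken language is meant, so the reader's stated preference decides.
    If no usable preference is given, the default is used.
    """
    tag = (page_lang or "").strip().lower()
    if tag in VOICES:
        # The page is specific enough: honor its tag directly.
        return VOICES[tag]
    if tag == "zh" or tag.startswith("zh-"):
        # Ambiguous Chinese: defer to the user, then to the default.
        pref = (user_pref or default).lower()
        return VOICES.get(pref, VOICES[default])
    # Not a Chinese page: this sketch has nothing to offer.
    return None


if __name__ == "__main__":
    print(pick_voice("zh", user_pref="zh-yue"))
    print(pick_voice("zh"))
```

The point of the sketch is only that a specific tag decides the voice outright, while a bare "zh" leaves the choice to a preference or a guess.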
>I have heard this, but I have also heard from some of our Chinese
>l10n folks that there are some differences in the way things would
>be written in some dialects. In other words, it may be understood
>but it's not "native". But I leave this to the Chinese scholars.

If there are genuine script differences, then I agree you would have
to specify zh-han vs. the other zh's. Maybe I am misunderstanding
the situation. It would be nice if the Chinese scholars could tell
the group.

I definitely agree you would have to do it for the Roman forms.

>What most software does right now is make assumptions due to legacy
>use of "zh" meaning "Simplified Chinese, Mandarin in the PRC". It
>doesn't matter what we do from now on, as long as that legacy tag is
>out there (and it is).

That sounds about right. That's pretty much what the world assumes
when you say "Chinese".

Cheers

Elizabeth

>
>
>Andrea
>
>>
>>Elizabeth Pyatt

--
=-=-=-=-=-=-=-=-=-=-=-=-=
Elizabeth J. Pyatt, Ph.D.
Instructional Designer
Education Technology Services, TLT/ITS
Penn State University
ejp10@psu.edu, (814) 865-0805 or (814) 865-2030 (Main Office)

210 Rider Building II
227 W. Beaver Avenue
State College, PA 16801-4819
http://www.personal.psu.edu/ejp10/psu
http://tlt.psu.edu
Received on Wednesday, 15 December 2004 21:57:50 UTC