Re: Language Identifier List up for comments

From: Elizabeth J. Pyatt <ejp10@psu.edu>
Date: Wed, 15 Dec 2004 09:08:29 -0500
Message-Id: <p06100501bde5f180a984@[128.118.8.31]>
To: "A. Vine" <andrea.vine@sun.com>
Cc: www-international@w3.org

A. Vine wrote
>
>>But now you are talking about differences in a script, not 
>>differences in a language.
>
>Um, when you're talking about the written word, they are somewhat inseparable.

I disagree on this point. There are Central Asian languages (e.g. 
Uzbek) which can be written in three scripts (Roman, Cyrillic, 
Arabic), yet they are not called different languages. I realize that 
there are cases of similar spoken forms being labelled as different 
languages because they are written in different scripts, but that is 
more a matter of politics than of linguistics.

I concede that the encoding tag is not enough to specify the script, 
but I would consider script to be a third meta tag (i.e. ISO 15924, 
http://www.unicode.org/iso15924/iso15924-codes.html).
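To make the idea of script as a separate piece of metadata concrete, here is a minimal sketch. It assumes tags shaped like language[-Script][-REGION], with a four-letter ISO 15924 code in the script position (the shape later adopted by BCP 47). The helper name split_tag is hypothetical, and the pattern covers only this simple shape, not a full tag grammar.

```python
import re

# Hypothetical helper, not part of any standard library API.
# Assumed shape: language[-Script][-REGION], e.g. "uz-Cyrl" or "zh-TW".
TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"          # ISO 15924 code, e.g. Latn, Cyrl, Arab
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",
    re.IGNORECASE,
)

def split_tag(tag):
    """Split a simple tag into language, script, and region parts."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError("unsupported tag shape: %r" % tag)
    script = m.group("script")
    region = m.group("region")
    return {
        "language": m.group("language").lower(),
        # Conventional casing: script is title case, region is upper case.
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }

# Uzbek in three scripts: the language part stays the same, only the script differs.
for tag in ("uz-Latn", "uz-Cyrl", "uz-Arab"):
    print(tag, split_tag(tag))
```

Under this assumption, the three Uzbek tags share one language value and differ only in script, which is the separation argued for above.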

I see that using Chinese-TW is NOT recommended, and I am glad to see 
that. I also see why "zh" would not be helpful in and of itself as it 
is currently defined. I was assuming a definition of "zh" as the 
written form used in Chinese dialect communities, but that does not 
appear to be the correct definition. It would not be Mandarin Chinese 
because it can be read all over the country by speakers of the 
different dialects.

It's almost like a data set of numeric text which could be read in 
almost any language.

1 2 3
=uno,dos,tres?
=one,two,three?

What kind of language tag would a set of numbers be? "Math"? Would it 
have no tag and assume a user agent will use the default language 
(whatever it is)? I assume that a speech synthesizer agent would 
treat Chinese characters as if they were Mandarin Chinese and 
pronounce them accordingly, but you could build several agents that 
could read them in the other forms (Hakka, Cantonese).
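The fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any real user agent works: the function read_numbers and the small NUMBER_WORDS table are hypothetical. The only outside facts used are that ISO 639-2 defines "zxx" (no linguistic content) and "und" (undetermined), either of which could mark data like this.

```python
# Hypothetical word table; a real agent would have far more coverage.
NUMBER_WORDS = {
    "en": ["one", "two", "three"],
    "es": ["uno", "dos", "tres"],
}

# ISO 639-2 codes that carry no usable reading language.
NO_LANGUAGE = {"zxx", "und"}

def read_numbers(digits, lang_tag=None, default="en"):
    """Read digits 1-3 aloud in the tagged language, else the agent's default."""
    primary = lang_tag.split("-")[0].lower() if lang_tag else None
    if primary is None or primary in NO_LANGUAGE or primary not in NUMBER_WORDS:
        primary = default  # untagged or language-neutral data: the agent decides
    return [NUMBER_WORDS[primary][d - 1] for d in digits]

print(read_numbers([1, 2, 3], "es"))                 # tagged as Spanish
print(read_numbers([1, 2, 3], "zxx", default="es"))  # neutral data, Spanish agent
print(read_numbers([1, 2, 3]))                       # no tag, English default
```

The same data yields different readings depending only on the agent's default, which is the situation with Chinese characters read by a Mandarin, Hakka, or Cantonese agent.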

I would argue that if you're speaking of pinyin Romanization, it 
might be important to specify that it is the Mandarin form because 
now the phonetic form is represented. The Romanized form of Cantonese 
would be different.

Elizabeth Pyatt


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=
Elizabeth J. Pyatt, Ph.D.
Instructional Designer
Education Technology Services, TLT/ITS
Penn State University
ejp10@psu.edu, (814) 865-0805 or (814) 865-2030 (Main Office)

210 Rider Building II
227 W. Beaver Avenue
State College, PA   16801-4819
http://www.personal.psu.edu/ejp10/psu
http://tlt.psu.edu
Received on Wednesday, 15 December 2004 14:29:58 GMT