Re: Language Identifier List up for comments from A. Vine on 2004-12-15 (www-international@w3.org from October to December 2004)

From: A. Vine <andrea.vine@Sun.COM>
Date: Wed, 15 Dec 2004 15:20:26 -0800
To: www-international@w3.org
Message-id: <41C0C6BA.7030403@sun.com>
Sure, Mark, and how many software packages that people are using process 
this today?

You and I know what the standards are and what their status is, but 
getting them to the people out there in a functional way is another animal.
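
To make the gap concrete, here is a minimal sketch (not taken from any shipping product) of what software would have to do with a tag that may or may not carry a script subtag. The names parse_tag, resolve_script, ASSUMED_REGION_SCRIPT and LEGACY_DEFAULT_SCRIPT are hypothetical, the region-to-script table is an illustrative assumption rather than registry data, and the parsing is deliberately simplified.

```python
from typing import NamedTuple, Optional, Tuple


class ParsedTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag such as 'zh-Hant-TW' into language, script, and region.

    Simplified: a 4-letter subtag before any region is read as a script,
    and a 2-letter or 3-digit subtag as a region. Other subtags are ignored.
    """
    subtags = tag.replace("_", "-").split("-")
    language = subtags[0].lower()
    script = None
    region = None
    for sub in subtags[1:]:
        is_ascii_alpha = sub.isascii() and sub.isalpha()
        if script is None and region is None and len(sub) == 4 and is_ascii_alpha:
            script = sub.title()
        elif region is None and (
            (len(sub) == 2 and is_ascii_alpha) or (len(sub) == 3 and sub.isdigit())
        ):
            region = sub.upper()
    return ParsedTag(language, script, region)


# Assumption for illustration only: a guess at the usual script per region.
ASSUMED_REGION_SCRIPT = {
    "CN": "Hans", "SG": "Hans",
    "TW": "Hant", "HK": "Hant", "MO": "Hant",
}

# Assumption: the legacy reading of bare "zh" described in this thread.
LEGACY_DEFAULT_SCRIPT = {"zh": "Hans"}


def resolve_script(tag: str) -> Tuple[Optional[str], str]:
    """Return (script, how_it_was_determined) for a language tag."""
    parsed = parse_tag(tag)
    if parsed.script is not None:
        return parsed.script, "explicit"
    if parsed.language == "zh" and parsed.region in ASSUMED_REGION_SCRIPT:
        return ASSUMED_REGION_SCRIPT[parsed.region], "guessed-from-region"
    if parsed.language in LEGACY_DEFAULT_SCRIPT:
        return LEGACY_DEFAULT_SCRIPT[parsed.language], "legacy-assumption"
    return None, "unknown"
```

Only the "explicit" case uses information the author actually supplied. Every other branch is the guessing that bare "zh" forces on a browser today.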

Mark Davis wrote:

>>It's fine if it's there, but software interpretation of script subtags
>>is a future concept, not a current one.
> 
> 
> It is a present issue. There are already a number of language tags with
> script subtags; look at the 3066 registry.
> 
> Mark
> 
> ----- Original Message ----- 
> From: "A. Vine" <andrea.vine@sun.com>
> To: <www-international@w3.org>
> Sent: Wednesday, December 15, 2004 11:55
> Subject: Re: Language Identifier List up for comments
> 
> 
> 
>>
>>
>>Elizabeth J. Pyatt wrote:
>>
>>
>>>A. Vine wrote
>>>
>>>
>>>>>But now you are talking about differences in a script, not
>>>>>differences in a language.
>>>>
>>>>
>>>>Um, when you're talking about the written word, they are somewhat
>>>>inseparable.
>>>
>>>
>>>I disagree on this point. There are Central Asian languages (e.g. Uzbek)
>>>which can be written in three scripts (Roman, Cyrillic, Arabic), yet
>>>they are not called different languages.
>>
>>You are misinterpreting my point.  When a language is written, it has a
>>script (or writing system, some might prefer).  What that script _is_ is
>>another matter.  _That_ it is, is my point.  Which is one reason why
>>"zh" alone is unhelpful for actual, practical application.  If I am a
>>browser and I get a page that says it's "zh", I don't know what to do.
>>I don't know what to match it to, I don't know what font to load, I
>>don't know what voice synthesizer to load.  I have to guess or make
>>assumptions or run some additional heuristics.
>>
>>What most software does right now is make assumptions due to legacy use
>>of "zh" meaning "Simplified Chinese, Mandarin in the PRC".  It doesn't
>>matter what we do from now on, as long as that legacy tag is out there
>>(and it is).
>>
>>
>>>I realize that there are cases
>>>of similar spoken forms being labelled as different languages because
>>>they are written in different scripts, but that is more a matter of
>>>politics than of linguistics.
>>>
>>>I concede that the encoding tag is not enough to specify the script, but
>>>I would consider script to be a third meta tag. (i.e. ISO-15924 -
>>>http://www.unicode.org/iso15924/iso15924-codes.html)
>>
>>It's fine if it's there, but software interpretation of script subtags
>>is a future concept, not a current one.
>>
>>
>>>I see that using Chinese-TW is NOT recommended, and I am glad to see
>>>that. I also see why "zh" would not be helpful in and of itself as it is
>>>currently defined. I was assuming a definition of "zh" as the written
>>>form used in Chinese dialect communities, but that does not appear to be
>>>the correct definition. It would not be Mandarin Chinese because it can
>>>be read all over the country by speakers of the different dialects.
>>
>>I have heard this, but I have also heard from some of our Chinese l10n
>>folks that there are some differences in the way things would be written
>>in some dialects.  In other words, it may be understood but it's not
>>"native".  But I leave this to the Chinese scholars.
>>
>>
>>>It's almost like a data set of numeric text which could be read in
>>>almost any language.
>>>
>>>1 2 3
>>>=uno,dos,tres?
>>>=one,two,three?
>>>
>>>What kind of language tag would a set of numbers be? "Math"? Would it
>>>have no tag and assume a user agent will use the default language
>>>(whatever it is)? I assume that a speech synthesizer agent would treat
>>>Chinese characters as if they were Mandarin Chinese and pronounce them
>>>accordingly, but you could build several agents that could read them in
>>>the other forms (Hakka, Cantonese)
>>>
>>>I would argue that if you're speaking of Pinyin Romanization, it might
>>>be important to specify that it is the Mandarin form because now a
>>>phonetic form is represented. The Romanized form of Cantonese would be
>>>different.
>>
>>Again, I leave it to the Chinese scholars.
>>
>>Andrea
>>
>>
>>>Elizabeth Pyatt
>>>
>>>
>>
>>
>
Received on Wednesday, 15 December 2004 23:15:33 UTC