Re: Language Identifier List up for comments

> It's fine if it's there, but software interpretation of script subtags
> is a future concept, not a current one.

It is a present issue. There are already a number of language tags with
script subtags; look at the 3066 registry.
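
As a rough illustration of what "software interpretation" could amount to, here is a minimal Python sketch that pulls a script subtag out of a tag. It is only a sketch under stated assumptions: the function name script_subtag is hypothetical, and it assumes a script subtag is a four-letter ISO 15924 code placed after the primary language subtag (as in tags like zh-Hant or sr-Latn). It does not consult the registry or validate the tag in any other way.

```python
import re

# Assumption: a script subtag is exactly four ASCII letters (ISO 15924 style).
FOUR_LETTERS = re.compile(r"[A-Za-z]{4}")

def script_subtag(tag):
    """Return the script subtag in title case (e.g. 'Hant'), or None."""
    subtags = tag.split("-")
    # Private-use and grandfathered "i-" tags are not interpreted here.
    if subtags[0].lower() in ("x", "i"):
        return None
    for sub in subtags[1:]:
        if sub.lower() == "x":
            # Everything after a private-use singleton is opaque.
            break
        if FOUR_LETTERS.fullmatch(sub):
            return sub.title()
    return None
```

With this, a plain "zh" or "zh-TW" yields None, which is exactly the case where the software has to fall back on guesses, while "zh-Hant" or "uz-Cyrl" yields a script it can act on.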

Mark

----- Original Message ----- 
From: "A. Vine" <andrea.vine@sun.com>
To: <www-international@w3.org>
Sent: Wednesday, December 15, 2004 11:55
Subject: Re: Language Identifier List up for comments


>
>
>
> Elizabeth J. Pyatt wrote:
>
> > A. Vine wrote
> >
> >>
> >>> But now you are talking about differences in a script, not
> >>> differences in a language.
> >>
> >>
> >> Um, when you're talking about the written word, they are somewhat
> >> inseparable.
> >
> >
> > I disagree on this point. There are Central Asian languages (e.g. Uzbek)
> > which can be written in three scripts (Roman, Cyrillic, Arabic), yet
> > they are not called different languages.
>
> You are misinterpreting my point.  When a language is written, it has a
> script (or writing system, some might prefer).  What that script _is_ is
> another matter.  _That_ it is, is my point.  Which is one reason why
> "zh" alone is unhelpful for actual, practical application.  If I am a
> browser and I get a page that says it's "zh", I don't know what to do.
> I don't know what to match it to, I don't know what font to load, I
> don't know what voice synthesizer to load.  I have to guess or make
> assumptions or run some additional heuristics.
>
> What most software does right now is make assumptions due to the legacy use
> of "zh" meaning "Simplified Chinese, Mandarin in the PRC".  It doesn't
> matter what we do from now on, as long as that legacy tag is out there
> (and it is).
>
> > I realize that there are cases
> > of similar spoken forms being labelled as different languages because
> > they are written in different scripts, but that is more a matter of
> > politics than of linguistics.
> >
> > I concede that the encoding tag is not enough to specify the script, but
> > I would consider script to be a third meta tag (i.e. ISO 15924:
> > http://www.unicode.org/iso15924/iso15924-codes.html).
>
> It's fine if it's there, but software interpretation of script subtags
> is a future concept, not a current one.
>
> >
> > I see that using Chinese-TW is NOT recommended, and I am glad to see
> > that. I also see why "zh" would not be helpful in and of itself as it is
> > currently defined. I was assuming a definition of "zh" as the written
> > form used in Chinese dialect communities, but that does not appear to be
> > the correct definition. It would not be Mandarin Chinese because it can
> > be read all over the country by speakers of the different dialects.
>
> I have heard this, but I have also heard from some of our Chinese l10n
> folks that there are some differences in the way things would be written
> in some dialects.  In other words, it may be understood but it's not
> "native".  But I leave this to the Chinese scholars.
>
> >
> > It's almost like a data set of numeric text which could be read in
> > almost any language.
> >
> > 1 2 3
> > =uno,dos,tres?
> > =one,two,three?
> >
> > What kind of language tag would a set of numbers be? "Math"? Would it
> > have no tag and assume a user agent will use the default language
> > (whatever it is)? I assume that a speech synthesizer agent would treat
> > Chinese characters as if they were Mandarin Chinese and pronounce them
> > accordingly, but you could build several agents that could read them in
> > the other forms (Hakka, Cantonese).
> >
> > I would argue that if you're speaking of pin yin Romanization, it might
> > be important to specify that it is the Mandarin form because now the
> > phonetic form is represented. The Romanized form of Cantonese would be
> > different.
>
> Again, I leave it to the Chinese scholars.
>
> Andrea
>
> >
> > Elizabeth Pyatt
> >
> >
>
>

Received on Wednesday, 15 December 2004 22:02:45 UTC