RE: Language Identifier List up for comments from Richard Ishida on 2004-12-14 (www-international@w3.org from October to December 2004)

From: Richard Ishida <ishida@w3.org>
Date: Tue, 14 Dec 2004 12:59:12 -0000
To: "'Tex Texin'" <tex@xencraft.com>, <www-international@w3.org>
Cc: <www-international@w3.org>, <ietf-languages@alvestrand.no>
Message-Id: <20041214125907.0158C4F43A@homer.w3.org>

Comments:

[1] For Chinese: What about zh-Hans and zh-Hant?  What about the IANA stuff
like zh-hakka, etc.?

[2] What if I just want to say "This is Turkish - but I don't know which
dialect"?  The list makes it seem like I *need* to choose one of the country
variants.

[3] Is there a big enough difference between en-GB and, say, en-FK that I
should need to distinguish between the two?

[4] I'm not clear about the value of the list.  A list like this suggests to
me that things can be looked up here without a great deal of thought.  I'm
not convinced that that is true.  And once one applies a little thought
about the most appropriate label to use, it is hardly difficult to come up
with the appropriate country code.  Perhaps there would be a minimal value
in helping find some of the country codes you might need, but then I would
organise the information slightly differently.

[5] I think the choice of language code also depends on the intended usage.
That is very hard to predict, of course.  If one is simply applying a
different font to English text embedded in an Arabic document, then I think
labelling with subcodes is overkill.  If labelling English text for use with
a spell checker, a distinction between en-US and en-GB is typically useful
because spell checkers for English tend to take that distinction into
account - whether that applies for all variants of other languages is not
clear to me.  If dealing with a text-to-speech application that can
distinguish accents such as en-GB-scouse, then a higher level of detail is
needed than that given in the table.  If dealing with Accept-Language
declarations, then you must declare both en and en-GB/en-US in a browser,
otherwise you won't always get the results you expect.  I think the table
over-simplifies the question.  I'll concede that the answer to the question
is very difficult to produce, but my concern is that the table seems to be
offering a solution, by fiat, that is not always correct, and doesn't say
that clearly enough.
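
To illustrate the Accept-Language point, here is a rough sketch of the
prefix matching described in RFC 2616 section 14.4, where a language-range
matches a tag if it equals the tag or is a prefix of it followed by "-".
The function names are my own invention for illustration, q-values are
ignored, and real servers may well behave differently, so treat this as an
assumption-laden sketch rather than a description of any particular
implementation.

```python
from typing import List, Optional


def range_matches(language_range: str, tag: str) -> bool:
    """Prefix match in the style of RFC 2616 section 14.4 (case-insensitive).

    Hypothetical helper: a range matches a tag when the two are equal, or
    when the range is a prefix of the tag and the next character is "-".
    """
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        return True
    return tag == language_range or tag.startswith(language_range + "-")


def pick_variant(accept_language: List[str], available: List[str]) -> Optional[str]:
    """Return the first available tag matched by the ranges, in preference order.

    Hypothetical helper: q-values are ignored; list order stands in for them.
    """
    for language_range in accept_language:
        for tag in available:
            if range_matches(language_range, tag):
                return tag
    return None


# A browser that sends only "en-GB" does not match a page labelled "en",
# because "en-GB" is not a prefix of "en".
only_specific = pick_variant(["en-GB"], ["en", "fr"])

# Declaring both the specific and the bare code lets the "en" page match.
both_declared = pick_variant(["en-GB", "en"], ["en", "fr"])
```

With these assumptions, only_specific comes back as None and both_declared
as "en", which is why I say both codes need to be declared.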


[6] typo: Lingala uses an upper case 'I'


RI

============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 

> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of Tex Texin
> Sent: 14 December 2004 10:43
> To: www-international@w3.org
> Cc: www-international@w3.org; ietf-languages@alvestrand.no
> Subject: Language Identifier List up for comments
> 
> 
> http://www.i18nguy.com/unicode/language-identifiers.html
> 
> I will add caveats and expand the list to be both one level 
> and two level as we go along.
> 
> I am in a busy patch, so comment now, but I won't make many 
> updates until the weekend.
> 
> tex
> 
>

Received on Tuesday, 14 December 2004 12:59:13 UTC