Re: Language Identifier List up for comments

From: Mark Davis <mark.davis@jtcsv.com>
Date: Fri, 17 Dec 2004 08:09:05 -0800
Message-ID: <009701c4e452$bd5d0e20$6501a8c0@sanjose.ibm.com>
To: "John Cowan" <jcowan@reutershealth.com>, "Tex Texin" <tex@xencraft.com>
Cc: <www-international@w3.org>, <ietf-languages@alvestrand.no>

I agree that there has been some useful dialog about this topic; it always
helps to center it when people are faced with a real list. The language on
the page is still extremely misleading, however. Here are my
recommendations:

First, it is best to always use 'region' instead of 'country'. Many of the
regions are not countries, and some people get miffed about it.

Language identifiers as specified by RFC 3066, can have the form language,
language-country, language-country-variant and some other specialized forms.
The guidelines for choosing between language and language-country are
ambiguous.

[The guidelines are clear; what is not clear is when there is a physical
difference. Talk of "ambiguity" is very misleading. The tags aren't
ambiguous; the most you can say is that the languages that they denote are
not materially different, for some definition of "materially". Moreover,
"language identifier" is used 'ambiguously': you use "language identifier"
to mean both a language tag and a language tag formed from one lang subtag.]
=>
Language identifiers (tags), as specified by RFC 3066, can have the form
lang, lang-region, and some other specialized forms, where lang and region
are subtags using ISO codes. (There is a
[http://www.inter-locale.com/ID/draft-phillips-langtags-08.html proposed
successor] to RFC 3066 that extends this further.) However, the RFC does not
identify which lang-region identifiers do not distinguish a written form that
is, for most localization purposes, materially different from that
distinguished by the corresponding lang identifiers.
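To make the lang and lang-region forms concrete, here is a minimal Python
sketch that splits a tag according to the RFC 3066 syntax: a primary subtag
of 1 to 8 letters, followed by optional subtags of 1 to 8 letters or digits,
separated by hyphens. The function name split_tag is hypothetical, and the
sketch only treats a two-letter second subtag as a region; it does not check
subtags against the ISO 639 or ISO 3166 code lists.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 ASCII letters, then zero or
# more subtags of 1-8 ASCII letters or digits, each preceded by "-".
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def split_tag(tag):
    """Return (lang, region, rest) for an RFC 3066 language tag.

    region is the second subtag when it is two letters (an ISO 3166
    code), otherwise None. Tags are case-insensitive, so lang is
    returned lower-case and region upper-case; rest is left as given.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed RFC 3066 tag: {tag!r}")
    subtags = tag.split("-")
    lang = subtags[0].lower()
    rest = subtags[1:]
    region = None
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        region = rest[0].upper()
        rest = rest[1:]
    return lang, region, rest
```

For example, split_tag("de-LI") returns ("de", "LI", []), and split_tag("en")
returns ("en", None, []).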


This table lists the languages which have no other significant variations,
and therefore can be adequately represented by a language subtag alone, as
opposed to a language subtag and country subtag. In this table, where the
identifiers show the country tag, but it can be removed without causing
ambiguity.

[This is way too definitive, and the last sentence is just plain wrong: the
list owners can never say that there are no significant variations.
Moreover, "the languages" makes it out to be a complete list, which it
will never be. Even if the data were true and well-known, a complete list
would be *anything* starting with af, am, as, ...]
=>
This table lists some lang-region identifiers which, for most localization
purposes, do not need to have the region subtag included.
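A tool consuming such a table might apply it roughly as follows. This is a
minimal sketch, assuming the table is available as a set of lang-region
tags; the set below is a hypothetical sample of three entries from the
English row of the page's table, and reduce_tag is an illustrative name, not
part of any existing library.

```python
# Hypothetical sample: a few entries from the English row of the table.
REGION_NOT_NEEDED = {"en-AG", "en-BB", "en-BM"}


def reduce_tag(tag, region_not_needed=REGION_NOT_NEEDED):
    """Drop the region subtag from a lang-region tag listed in the table.

    Tags are compared case-insensitively, as RFC 3066 requires. Tags not
    in the table (including bare lang tags) are returned unchanged.
    """
    listed = {t.lower() for t in region_not_needed}
    if tag.lower() in listed:
        return tag.split("-", 1)[0]
    return tag
```

Because the table only claims that the region is unnecessary "for most
localization purposes", a caller that does need the region should keep the
original tag rather than the reduced one.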

Some languages are spoken in many countries, and the language is not
distinctive in each country. I have started to accept suggestions as to
which language-region codes do not represent a distinct language variation,
and therefore are not recommended as tags, without good reason.

[looks like old, redundant text. nuke.]

The tags which are not recommended will look like this sentence.

[Whoa - not recommended? The one example given is way off the mark!

de-AT, de-CH, de-DE, (de-BE, de-DK, de-LI, de-LU)

I use (...) for the 'not recommended' since the color distinction will not
show here.

de-LI absolutely has a meaning. de-LI is certainly as different from de-DE
as de-CH is! The recommendation by the text that de-LI should just be
replaced by "de" is *way* off. The most you could do is say, for example,
the following:

de-AT, de-CH (de-LI), de-DE (de-BE, de-DK, de-LU)

and say that the identifiers in (...) are ones that do not materially differ
in denotation from the one listed before them, for most localization
purposes. Even that is pretty dicey.]
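The parenthesized notation can be read mechanically: each tag in parentheses
is treated as not materially different from the tag written just before it.
A minimal Python sketch of that reading, assuming the list is written exactly
in the form above with ASCII hyphens (comma-separated entries, each
optionally followed by one parenthesized group); the function name
parse_groups is hypothetical.

```python
import re

# One entry: a tag, optionally followed by a parenthesized,
# comma-separated list of tags that do not materially differ from it.
_ENTRY_RE = re.compile(r"\s*([A-Za-z0-9-]+)\s*(?:\(([^)]*)\))?\s*(?:,|$)")


def parse_groups(text):
    """Map each tag in the list to the tag it should be treated as.

    A listed tag maps to itself; each tag in parentheses maps to the
    tag written immediately before the parentheses.
    """
    mapping = {}
    pos = 0
    while pos < len(text):
        m = _ENTRY_RE.match(text, pos)
        if not m:
            raise ValueError(f"cannot parse list at offset {pos}: {text[pos:]!r}")
        head, group = m.group(1), m.group(2)
        mapping[head] = head
        if group:
            for tag in group.split(","):
                mapping[tag.strip()] = head
        pos = m.end()
    return mapping
```

Under this reading, parse_groups("de-AT, de-CH (de-LI), de-DE (de-BE, de-DK,
de-LU)") maps de-LI to de-CH and de-BE, de-DK and de-LU to de-DE, rather than
collapsing any of them to plain "de".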

Table...

en-AG, en-AI, en-AS, en-AU, en-IN, en-BB, en-BE, en-BM, en-BN, en-BS, en-BW,
en-BZ, en-CA, en-CK, en-CM, en-DM, en-ER, en-ET, en-FJ, en-FK, en-FM, en-GB,
en-GD, en-GH, en-GI, en-GM, en-GU, en-GY, en-HK, en-IE, en-IL, en-IO, en-JM,
en-KE, en-KI, en-KN, en-KY, en-LC, en-LR, en-LS, en-MH, en-MP, en-MS, en-MT,
en-MU, en-MW, en-NA, en-NF, en-NG, en-NR, en-NU, en-NZ, en-PG, en-PH, en-PK,
en-PN, en-PR, en-PW, en-RW, en-SB, en-SC, en-SG, en-SH, en-SL, en-SO, en-SZ,
en-TC, en-TK, en-TO, en-TT, en-UG, en-UM, en-US, en-VC, en-VG, en-VI, en-VU,
en-WS, en-ZA, en-ZM, en-ZW

English

If you want feedback on the table from those who have not memorized country
codes, and to make it more comprehensible to people, I suggest you include a
more descriptive name. Even better would be to have an alternate table or
column, but that might be more maintenance for you. I'd also suggest having
the language on the left.
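If the table is generated from data, the descriptive form costs little extra
maintenance. A minimal sketch, assuming a code-to-name table is available;
REGION_NAMES and LANGUAGE_NAMES below are small hypothetical excerpts (using
names as they appear in this message), not full ISO 3166 or ISO 639 lists,
and describe and describe_all are illustrative names.

```python
# Hypothetical excerpts of ISO 3166 and ISO 639 code-to-name tables.
REGION_NAMES = {
    "AG": "Antigua and Barbuda",
    "AU": "Australia",
    "GB": "United Kingdom",
}
LANGUAGE_NAMES = {"en": "English"}


def describe(tag):
    """Return 'tag (Name)' for a lang or lang-region tag.

    Falls back to the bare tag when no name is known.
    """
    lang, _, region = tag.partition("-")
    if region:
        name = REGION_NAMES.get(region.upper())
    else:
        name = LANGUAGE_NAMES.get(lang.lower())
    return f"{tag} ({name})" if name else tag


def describe_all(tags):
    """Format a list of tags as one comma-separated descriptive line."""
    return ", ".join(describe(t) for t in tags)
```

For example, describe_all(["en-AU", "en-GB"]) produces
"en-AU (Australia), en-GB (United Kingdom)".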

Included descriptive name
en (English)

en-AG (Antigua and Barbuda), en-AI (Anguilla), en-AS (American Samoa), en-AU
(Australia), en-IN (India), en-BB (Barbados), en-BE (Belgium), en-BM
(Bermuda), en-BN (Brunei), en-BS (Bahamas), en-BW (Botswana), en-BZ
(Belize), en-CA (Canada), en-CK (Cook Islands), en-CM (Cameroon), en-DM
(Dominica), en-ER (Eritrea), en-ET (Ethiopia), en-FJ (Fiji), en-FK (Falkland
Islands), en-FM (Micronesia), en-GB (United Kingdom), en-GD (Grenada), en-GH
(Ghana), en-GI (Gibraltar), en-GM (Gambia), en-GU (Guam), en-GY (Guyana),
en-HK (Hong Kong S.A.R., China), en-IE (Ireland), en-IL (Israel), en-IO
(British Indian Ocean Territory), en-JM (Jamaica), en-KE (Kenya), en-KI
(Kiribati), en-KN (Saint Kitts and Nevis), en-KY (Cayman Islands), en-LC
(Saint Lucia), en-LR (Liberia), en-LS (Lesotho), en-MH (Marshall Islands),
en-MP (Northern Mariana Islands), en-MS (Montserrat), en-MT (Malta), en-MU
(Mauritius), en-MW (Malawi), en-NA (Namibia), en-NF (Norfolk Island), en-NG
(Nigeria), en-NR (Nauru), en-NU (Niue), en-NZ (New Zealand), en-PG (Papua
New Guinea), en-PH (Philippines), en-PK (Pakistan), en-PN (Pitcairn), en-PR
(Puerto Rico), en-PW (Palau), en-RW (Rwanda), en-SB (Solomon Islands), en-SC
(Seychelles), en-SG (Singapore), en-SH (Saint Helena), en-SL (Sierra Leone),
en-SO (Somalia), en-SZ (Swaziland), en-TC (Turks and Caicos Islands), en-TK
(Tokelau), en-TO (Tonga), en-TT (Trinidad and Tobago), en-UG (Uganda), en-UM
(United States Minor Outlying Islands), en-US (United States), en-VC (Saint
Vincent and the Grenadines), en-VG (British Virgin Islands), en-VI (U.S.
Virgin Islands), en-VU (Vanuatu), en-WS (Samoa), en-ZA (South Africa), en-ZM
(Zambia), en-ZW (Zimbabwe)

Alternate Table/Column

English
Antigua and Barbuda, Anguilla, American Samoa, Australia, India, Barbados,
Belgium, Bermuda, Brunei, Bahamas, Botswana, Belize, Canada, Cook Islands,
Cameroon, Dominica, Eritrea, Ethiopia, Fiji, Falkland Islands, Micronesia,
United Kingdom, Grenada, Ghana, Gibraltar, Gambia, Guam, Guyana, Hong Kong
S.A.R., China, Ireland, Israel, British Indian Ocean Territory, Jamaica,
Kenya, Kiribati, Saint Kitts and Nevis, Cayman Islands, Saint Lucia,
Liberia, Lesotho, Marshall Islands, Northern Mariana Islands, Montserrat,
Malta, Mauritius, Malawi, Namibia, Norfolk Island, Nigeria, Nauru, Niue, New
Zealand, Papua New Guinea, Philippines, Pakistan, Pitcairn, Puerto Rico,
Palau, Rwanda, Solomon Islands, Seychelles, Singapore, Saint Helena, Sierra
Leone, Somalia, Swaziland, Turks and Caicos Islands, Tokelau, Tonga,
Trinidad and Tobago, Uganda, United States Minor Outlying Islands, United
States, Saint Vincent and the Grenadines, British Virgin Islands, U.S.
Virgin Islands, Vanuatu, Samoa, South Africa, Zambia, Zimbabwe



And given such a list, some items stand out. It is unclear why you should
have variants for English as in China or Israel, but not English as in
Russia or Egypt, for example.


Mark

----- Original Message ----- 
From: "John Cowan" <jcowan@reutershealth.com>
To: "Martin Duerst" <duerst@w3.org>
Cc: "Tex Texin" <tex@xencraft.com>; <www-international@w3.org>;
<ietf-languages@alvestrand.no>
Sent: Friday, December 17, 2004 04:55
Subject: Re: Language Identifier List up for comments


> Martin Duerst scripsit:
>
> > - I think there has been enough cross-posting. I suggest we all
> >   limit further posts to ietf-languages@alvestrand.no.
> >   Please direct followups only to that list.
>
> If anything, I think the interest and expertise exist mainly on
> www-international.  From the point of view of ietf-languages, these tags
> are all valid, period; "best practice" is not as central a concern
> there.  (I know this because my attempts to get the list reviewed by
> ietf-languages have always gone nowhere, whereas this attempt is getting
> lots of review.)
>
> > - "Proposed List of 1-level Language Identifiers": Why on earth
> >   are two-level codes given when it says that one-level codes
> >   are the right thing to use? Please, please, don't confuse
> >   the readers with such stuff, and remove the country codes
> >   from the identifiers as quickly as possible.
>
> I agree completely.  In addition, I think the entire third list should
> be migrated to the first list.  These are simply the codes for which
> regional variation on the national level is *not known* to exist (as
> opposed to codes for which r.v. on the n.l. is *known not* to exist).
>
> In pursuit of that, the introductions to the two lists should be changed
> from "languages which have no other significant variations" to "languages
> which are not known to vary significantly in different countries", and
> likewise "languages which differ by region" should be "languages which
> vary significantly in different countries".
>
> -- 
> Go, and never darken my towels again!           John Cowan
>         --Rufus T. Firefly                      www.ccil.org/~cowan
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages@alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
Received on Friday, 17 December 2004 16:09:11 GMT
