Re: Language Identifier List up for comments

From: John Cowan <jcowan@reutershealth.com>
Date: Tue, 14 Dec 2004 23:59:13 -0500
To: Mark Davis <mark.davis@jtcsv.com>
Cc: Tex Texin <tex@xencraft.com>, Richard Ishida <ishida@w3.org>, www-international@w3.org, ietf-languages@alvestrand.no
Message-ID: <20041215045913.GA4175@skunk.reutershealth.com>

Mark Davis scripsit:

> I don't know what this list is intended for, nor how it would be used (or
> misused), nor precisely what it is supposed to measure, nor the criteria for
> being on or off the list. Do the authors think that someone is supposed to
> reject a language tag containing a region that is not on the list? Or that
> localizations be limited to the list? Or include all of the list?

No, no, and maybe respectively.  The idea is to construct a list of
plausible xx-yy language tags.  We know en-gb is plausible and nv-dk
is not, and obviously "plausible" is a fuzzy category.  I therefore
attempted to create a seed list which can be edited into a more useful one.

Which languages have national variants which can be usefully distinguished,
and what are those relevant national variants?
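
As a rough illustration of how such a list might be consulted, here is a minimal Python sketch. The names SEED_TAGS, normalize and is_plausible are hypothetical, and the seed entries are placeholders rather than the actual list; it only handles simple xx-yy tags, not tags with script subtags. The check is advisory: a tag missing from the list is merely unlisted, not invalid.

```python
# Hypothetical seed list of plausible xx-YY tags (placeholder entries,
# not the real list under discussion).
SEED_TAGS = {"en-GB", "en-US", "fr-FR", "fr-CA"}


def normalize(tag: str) -> str:
    """Lowercase the language subtag and uppercase the region subtag.

    Only simple two-part tags (language-region) are handled here.
    """
    lang, _, region = tag.partition("-")
    if region:
        return f"{lang.lower()}-{region.upper()}"
    return lang.lower()


def is_plausible(tag: str, seed=SEED_TAGS) -> bool:
    """Return True if the tag appears on the seed list.

    This is advisory only: a False result means "not on the list",
    not "reject this tag".
    """
    return normalize(tag) in seed


print(is_plausible("en-gb"))
print(is_plausible("nv-dk"))
```

Matching is done case-insensitively because language tags are not case-sensitive, so "en-gb" and "en-GB" name the same thing.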

> B. a set of language->region mappings that include every region where there
> is a significant population base of native speakers or the language is an
> official language of that country. E.g. something like:
> 
> EN => AS AU BM BW BZ CA CM GB GH HK IE IN JM MT NG NZ PH PK PU RH SG TT UG
> UM US VG VI ZA ZW
> FR => BE CA CD CH CI FR FX LU MC PF RE
> ...

Well, that's the basis-list I tried to create, though with insufficient
data (inevitably).  But I see no point in retaining multiple regions for
any languages which are essentially uniform across those regions.
There's been a claim that Hungarian has this property.
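
To make the pruning idea concrete, here is a small Python sketch, offered as one possible approach rather than the way the list is actually built. REGIONS, UNIFORM and plausible_tags are hypothetical names. The "en" and "fr" rows are subsets of the mappings quoted above; the "hu" row and its membership in UNIFORM are assumptions for illustration, based only on the claim about Hungarian.

```python
# Language -> regions mapping. "en" and "fr" are subsets of the rows
# quoted above; the "hu" row is an illustrative assumption.
REGIONS = {
    "en": ["AU", "CA", "GB", "US"],
    "fr": ["BE", "CA", "CH", "FR"],
    "hu": ["HU", "RO", "SK"],
}

# Languages assumed to be essentially uniform across their regions
# (an assumption, following the claim about Hungarian).
UNIFORM = {"hu"}


def plausible_tags(regions, uniform):
    """Expand the mapping into tags, collapsing uniform languages.

    A uniform language contributes only its bare language tag; every
    other language contributes one language-region tag per region.
    """
    tags = []
    for lang in sorted(regions):
        if lang in uniform:
            tags.append(lang)
        else:
            tags.extend(f"{lang}-{region}" for region in sorted(regions[lang]))
    return tags


for tag in plausible_tags(REGIONS, UNIFORM):
    print(tag)
```

Keeping the uniformity judgment in a separate set means it can be revised without touching the region data, which suits a non-normative list that is expected to change.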

> This list would be easier to derive, although the criteria for
> "significantly" would present its own challenges.

Since the list is not normative, it doesn't have to be perfect.

> By the way, it's missing (depending on how the criteria are applied):

Excellent.  My co-worker is probably sorry he undertook to maintain
this list.

> aa-DJ, aa-ER, aa-ET, af-ZA, am-ET, ar-IN, as-IN, az-AZ, be-BY, bg-BG,
> byn-ER, ca-ES, cs-CZ, dv-MV, dz-BT, en-BE, en-HK, en-IN, en-MH, en-UM,
> et-EE, eu-ES, fa-AF, fa-IR, fi-FI, fo-FO, gez-ER, gez-ET, gl-ES, gu-IN,
> haw-US, he-IL, hi-IN, hy-AM, id-ID, is-IS, ja-JP, ka-GE, kk-KZ, kl-GL,
> km-KH, kn-IN, kok-IN, ky-KG, lo-LA, lt-LT, lv-LV, mk-MK, ml-IN, mn-MN,
> mr-IN, mt-MT, nb-NO, nn-NO, om-ET, om-KE, or-IN, pa-IN, pl-PL, ps-AF, ro-RO,
> ru-RU, ru-UA, sa-IN, sh-YU, sid-ET, sk-SK, sl-SI, so-DJ, so-ET, so-KE,
> so-SO, sq-AL, sr-Cyrl, sr-Cyrl-YU, sr-Latn, sr-Latn-YU, syr-SY, te-IN,
> th-TH, ti-ER, ti-ET, tig-ER, tt-RU, uk-UA, uz-AF, uz-UZ, vi-VN, wal-ET,
> zh-HK, zh-Hans, zh-Hans-CN, zh-Hans-SG, zh-Hant, zh-Hant-HK, zh-Hant-MO,
> zh-Hant-TW, zh-MO

-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
[R]eversing the apostolic precept to be all things to all men, I usually [before
Darwin] defended the tenability of the received doctrines, when I had to do
with the [evolution]ists; and stood up for the possibility of [evolution] among
the orthodox -- thereby, no doubt, increasing an already current, but quite
undeserved, reputation for needless combativeness.  --T. H. Huxley
Received on Wednesday, 15 December 2004 05:00:03 GMT
