- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Tue, 14 Dec 2004 12:48:01 -0800
- To: "Tex Texin" <tex@xencraft.com>, "Richard Ishida" <ishida@w3.org>
- Cc: <www-international@w3.org>, <ietf-languages@alvestrand.no>
I think Richard was making a very different point. In particular, I share his concerns as given in [4]. I don't know what this list is intended for, nor how it would be used (or misused), nor precisely what it is supposed to measure, nor the criteria for being on or off the list. Do the authors think that someone is supposed to reject a language tag containing a region that is not on the list? Or that localizations be limited to the list? Or include all of the list?

Here's the way I see it. There are a couple of possible different lists that would be interesting information.

A. A set of language subtags for which there is no difference in written or spoken form based on region. This, however, would be rather difficult to determine. I suspect the only qualifying ones would be those that were essentially limited to a single region. Even for a language like Japanese you'd have to verify that the Japanese immigrant community in the US, Brazil, etc. spoke and wrote identically to their counterparts in Japan. (More useful would be the size of given sub-populations, either by heads or by various economic measures.)

B. A set of language->region mappings that include every region where there is a significant population base of native speakers or the language is an official language of that country. E.g. something like:

EN => AS AU BM BW BZ CA CM GB GH HK IE IN JM MT NG NZ PH PK PU RH SG TT UG UM US VG VI ZA ZW
FR => BE CA CD CH CI FR FX LU MC PF RE ...

This list would be easier to derive, although the criteria for "significant" would present their own challenges.

As near as I can gather, http://www.i18nguy.com/unicode/language-identifiers.html is somehow a product of A and B. It would be the set of language tags where there was a difference between region subtags (A).
It would either have to list all the language tags that could be composed from B, or if it didn't have all of the regions, it would have to indicate which of those regions were identified with the "regionless" language tag (and that decision itself is fraught with political issues).

By the way, it's missing (depending on how the criteria are applied):

aa-DJ, aa-ER, aa-ET, af-ZA, am-ET, ar-IN, as-IN, az-AZ, be-BY, bg-BG, byn-ER, ca-ES, cs-CZ, dv-MV, dz-BT, en-BE, en-HK, en-IN, en-MH, en-UM, et-EE, eu-ES, fa-AF, fa-IR, fi-FI, fo-FO, gez-ER, gez-ET, gl-ES, gu-IN, haw-US, he-IL, hi-IN, hy-AM, id-ID, is-IS, ja-JP, ka-GE, kk-KZ, kl-GL, km-KH, kn-IN, kok-IN, ky-KG, lo-LA, lt-LT, lv-LV, mk-MK, ml-IN, mn-MN, mr-IN, mt-MT, nb-NO, nn-NO, om-ET, om-KE, or-IN, pa-IN, pl-PL, ps-AF, ro-RO, ru-RU, ru-UA, sa-IN, sh-YU, sid-ET, sk-SK, sl-SI, so-DJ, so-ET, so-KE, so-SO, sq-AL, sr-Cyrl, sr-Cyrl-YU, sr-Latn, sr-Latn-YU, syr-SY, te-IN, th-TH, ti-ER, ti-ET, tig-ER, tt-RU, uk-UA, uz-AF, uz-UZ, vi-VN, wal-ET, zh-HK, zh-Hans, zh-Hans-CN, zh-Hans-SG, zh-Hant, zh-Hant-HK, zh-Hant-MO, zh-Hant-TW, zh-MO

Mark

----- Original Message -----
From: "Tex Texin" <tex@xencraft.com>
To: "Richard Ishida" <ishida@w3.org>
Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
Sent: Tuesday, December 14, 2004 11:02
Subject: Re: Language Identifier List up for comments

> I agree, and that's why we need to provide more guidance than we have done to
> date.
>
> Richard Ishida wrote:
> >
> > Comments:
> >
> > [1] For Chinese: What about zh-Hans and zh-Hant? What about the IANA stuff
> > like zh-hakka, etc.?
> >
> > [2] What if I just want to say "This is Turkish - but I don't know which
> > dialect"? The list makes it seem like I *need* to choose one of the country
> > variants.
> >
> > [3] Is there a big enough difference between en-GB and, say, en-FK that I
> > should need to distinguish between the two?
> >
> > [4] I'm not clear about the value of the list. A list like this suggests to
> > me that things can be looked up here without a great deal of thought. I'm
> > not convinced that that is true. And once one applies a little thought
> > about the most appropriate label to use, it is hardly difficult to come up
> > with the appropriate country code. Perhaps there would be a minimal value
> > in helping find some of the country codes you might need, but then I would
> > organise the information slightly differently.
> >
> > [5] I think the choice of language code also depends on the intended usage.
> > That is very hard to predict, of course. If one is simply applying a
> > different font to English text embedded in an Arabic document, then I think
> > labelling with subcodes is overkill. If labelling English text for use with
> > a spell checker, a distinction between en-US and en-GB is typically useful
> > because spell checkers for English tend to take that distinction into
> > account - whether that applies for all variants of other languages is not
> > clear to me. If dealing with a text to speech application that can
> > distinguish accents such as en-UK-scouse, then a higher level of detail is
> > needed than that given in the table. If dealing with Accept-Language
> > declarations, then you must declare both en and en-UK/en-US in a browser,
> > otherwise you won't always get the results you expected. I think the table
> > over-simplifies the question. I'll concede that the answer to the question
> > is very difficult to produce, but my concern is that the table seems to be
> > offering a solution, by fiat, that is not always correct, and doesn't say
> > that clearly enough.
> >
> > [6] typo: Lingala uses an upper case 'I'
> >
> > RI
> >
> > ============
> > Richard Ishida
> > W3C
> >
> > contact info:
> > http://www.w3.org/People/Ishida/
> >
> > W3C Internationalization:
> > http://www.w3.org/International/
> >
> > Publication blog:
> > http://people.w3.org/rishida/blog/
> >
> >
> > > -----Original Message-----
> > > From: www-international-request@w3.org
> > > [mailto:www-international-request@w3.org] On Behalf Of Tex Texin
> > > Sent: 14 December 2004 10:43
> > > To: www-international@w3.org
> > > Cc: www-international@w3.org; ietf-languages@alvestrand.no
> > > Subject: Language Identifier List up for comments
> > >
> > >
> > > http://www.i18nguy.com/unicode/language-identifiers.html
> > >
> > > I will add caveats and expand the list to be both one level
> > > and two level as we go along.
> > >
> > > I am in a busy patch, so comment now, but I won't make many
> > > updates until the weekend.
> > >
> > > tex
> > >
> >
> --
> -------------------------------------------------------------
> Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
> Xen Master http://www.i18nGuy.com
>
> XenCraft http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages@alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
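
To make the idea of composing language tags from a list-B style mapping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names LANGUAGE_REGIONS, compose_tags and missing_tags are hypothetical, the mapping is a truncated excerpt of the EN and FR rows given above, and the "published" list is made up for the example rather than taken from the page under discussion.

```python
# Hypothetical excerpt of a list-B style language->region mapping.
# Region codes are a truncated subset of the EN and FR rows in the message.
LANGUAGE_REGIONS = {
    "en": ["AU", "CA", "GB", "HK", "IN", "US"],
    "fr": ["BE", "CA", "CH", "FR", "LU"],
}


def compose_tags(mapping):
    """Return every language-region tag that can be composed from the mapping."""
    return sorted(
        f"{lang}-{region}"
        for lang, regions in mapping.items()
        for region in regions
    )


def missing_tags(mapping, published):
    """Return composed tags that are absent from a published list.

    The comparison ignores case, since language tags are case-insensitive.
    """
    seen = {tag.lower() for tag in published}
    return [tag for tag in compose_tags(mapping) if tag.lower() not in seen]


# Made-up published list, used only to exercise the functions.
published = ["en-US", "en-GB", "en-AU", "en-CA", "fr-FR", "fr-CA", "fr-BE", "fr-CH"]

print(missing_tags(LANGUAGE_REGIONS, published))
```

A check of this kind only reports which composable tags a list omits. It says nothing about whether a region belongs in the mapping in the first place, which is where the criteria for "significant" come in.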
Received on Tuesday, 14 December 2004 20:48:12 UTC