Re: Language Identifier List up for comments

From: Mark Davis <mark.davis@jtcsv.com> · Date: Wed, 15 Dec 2004 07:20:07 -0800

> if the experts don't agree on the codes to use,
> either because the codes are ambiguous or because the decision process is so
> complex

This is really not the point. What does one call a dog? I could refer to it
at many levels of detail. Animalia: Chordata: Mammalia: Carnivora: Canidae:
Canis: Familiaris: down to breeds: Working: Herding: Sheepdog: German
Shepherd... (And there is some disagreement on whether it should be Canis
familiaris or Canis lupus familiaris.) And I might want to distinguish a
gray German Shepherd dog from one with some tan, whereas you might not. Which
is chosen depends on whether I want to be more or less specific, and make
distinctions that you may not want to make.

John indicated that the purpose of the list is "plausible" language tags. If
that is the criterion, then without having to do extensive and fairly
difficult research, I'd say that it is each 639 code alone, then for each
tag add the combinations with the scripts that are used with it. Then for each
of those tags that have significant speaker populations in different regions,
add the combinations with those regions.

Rather than have an unholy long list, this would be far easier both to use
and to maintain if composed of two tables:
language subtag => scripts in use
language {-script} tag => regions where used
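
As a rough illustration of how the full list could be generated from those two
tables rather than maintained by hand, here is a minimal Python sketch. The
names SCRIPTS_IN_USE, REGIONS_WHERE_USED, and plausible_tags are hypothetical,
and the sample entries are placeholders chosen for illustration, not researched
data from the list under discussion.

```python
from typing import Dict, List

# Table 1: language subtag => scripts in use.
# Placeholder sample data (an assumption for this sketch, not researched).
SCRIPTS_IN_USE: Dict[str, List[str]] = {
    "en": [],
    "sr": ["Cyrl", "Latn"],
    "zh": ["Hans", "Hant"],
}

# Table 2: language{-script} tag => regions where used.
# Placeholder sample data (an assumption for this sketch, not researched).
REGIONS_WHERE_USED: Dict[str, List[str]] = {
    "en": ["GB", "US"],
    "zh-Hans": ["CN", "SG"],
    "zh-Hant": ["HK", "TW"],
}


def plausible_tags(
    scripts: Dict[str, List[str]], regions: Dict[str, List[str]]
) -> List[str]:
    """Expand the two tables into a flat list of candidate tags."""
    tags: List[str] = []
    for lang in sorted(scripts):
        # The bare language code first, then each language-script combination.
        bases = [lang] + [f"{lang}-{script}" for script in scripts[lang]]
        for base in bases:
            tags.append(base)
            # Add a region only where the second table lists one for this tag.
            tags.extend(f"{base}-{region}" for region in regions.get(base, []))
    return tags
```

Calling plausible_tags(SCRIPTS_IN_USE, REGIONS_WHERE_USED) yields the bare
codes, the language-script tags, and region variants only for the tags that
appear in the second table. Adding a script or region then means touching one
table entry instead of editing many rows of a long list.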

Mark

----- Original Message ----- 
From: "Tex Texin" <tex@xencraft.com>
Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
Sent: Wednesday, December 15, 2004 03:38
Subject: Re: Language Identifier List up for comments

> I have made some updates to the page.
> http://www.i18nguy.com/unicode/language-identifiers.html
>
> The mail volume and the fact that I get 3 copies of each, means it is going to
> take me some time to sort thru.
> (Feel free to take my name off the mail, I am subscribed to both lists where
> the thread appears.)
>
> My thanks to those of you that sent me private suggestions for language codes
> that I can research as to whether they are different in different countries,
> but I won't have time to do research. (Nor the appropriate skills.)
>
> I noted conflicting advice on cy and whether Patagonian is different from the
> version in the UK.
> For now both entries are in the table, and some of you can debate which is
> correct.
> For that matter it is not clear to me whether some of the en entries aren't
> close enough to be the same.
>
> I began the table of one-level entries.
> At some point, every 639 entry should be in one or the other table.
>
> I am glad to see all the caveats being pointed out in the thread about
> dependencies on usage, context, and how significant a language difference needs
> to be. To my feeble mind, if the experts don't agree on the codes to use,
> either because the codes are ambiguous or because the decision process is so
> complex, then surely there is no hope for the majority of the community that is
> responsible for choosing language tags. Which was my point.
>
> I still conclude that simple instructions that don't require decisions based on
> information that is not generally available are the more reliable model. It is
> better for users and better for application developers.
>
> For the applications that linguists use, where the distinctions are much more
> important, the current state of the art might be reasonable. (But I wouldn't
> bet on it.)
>
> Cheers,
>
> Tex