Re: Language Identifier List up for comments

From: Tex Texin <tex@xencraft.com> · Date: Wed, 15 Dec 2004 11:20:58 -0800

Mark,
Although I agree that the intended use can vary, and hence there can be
disagreement among the user community about how much specificity is needed, the
rules ought to be clear to the people that make the tags.

To my mind, if the detail is important to some, then tell the taggers to tag
the dogs with color distinctions.
If it is too difficult, then tell the taggers color is unimportant, and those
that need color distinction will have to get it another way. At least then the
rules of the road are clear.

However, I have asked the good folks at http://www.bowlingual-translator.com/
to tell me their policy and will report it back here.

Unfortunately, or fortunately, on the web I have no idea how a language tag
that I provide will be used. It may get served to a user with a voice reader,
or some very advanced rendering device where the nuance is valuable. Or it may
be important to a search engine or web service that is munging billions of
items and the detail makes a difference to the results.

Yes, the plausible list is a good starting point. Then we can see if that dog
hunts. ;-)

As for the format of the tables, yes, I have several ideas on how to improve
the layout for usability and to reduce the download time. I don't have the time
to reformat them this week.
tex

Mark Davis wrote:
> 
> > if the experts don't agree on the codes to use,
> > either because the codes are ambiguous or because the decision process is
> > so complex
> 
> This is really not the point. What does one call a dog? I could refer to it
> at many levels of detail. Animalia: Chordata: Mammalia: Carnivora: Canidae:
> Canis: familiaris: down to breeds: Working: Herding: Sheepdog: German
> Shepherd... (And there is some disagreement on whether it should be Canis
> familiaris or Canis lupus familiaris.) And I might want to distinguish a
> gray German Shepherd dog from one with some tan, whereas you might not. Which
> is chosen depends on whether I want to be more or less specific, and make
> distinctions that you may not want to make.
> 
> John indicated that the purpose of the list is "plausible" language tags. If
> that is the criterion, then without having to do extensive and fairly
> difficult research, I'd say that it is each 639 code alone, then for each
> tag add the combinations of scripts that are used with it. Then for each of
> those tags that have significant speaker populations in different regions,
> add the region combinations.
> 
> Rather than have an unholy long list, this would be far easier both to use
> and to maintain if composed of two tables:
> language subtag => scripts in use
> language {-script} tag => regions where used
> 
> Mark
> 
> ----- Original Message -----
> From: "Tex Texin" <tex@xencraft.com>
> Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
> Sent: Wednesday, December 15, 2004 03:38
> Subject: Re: Language Identifier List up for comments
> 
> > I have made some updates to the page.
> > http://www.i18nguy.com/unicode/language-identifiers.html
> >
> > The mail volume, and the fact that I get 3 copies of each message, mean
> > it is going to take me some time to sort thru.
> > (Feel free to take my name off the mail; I am subscribed to both lists
> > where the thread appears.)
> >
> > My thanks to those of you that sent me private suggestions for language
> > codes that I can research as to whether they are different in different
> > countries, but I won't have time to do research. (Nor the appropriate
> > skills.)
> >
> > I noted conflicting advice on cy and whether Patagonian Welsh is
> > different from the version in the UK.
> > For now both entries are in the table, and some of you can debate which
> > is correct.
> > For that matter, it is not clear to me whether some of the en entries
> > aren't close enough to be the same.
> >
> > I began the table of one-level entries.
> > At some point, every 639 entry should be in one or the other table.
> >
> > I am glad to see all the caveats being pointed out in the thread about
> > dependencies on usage, context, and how significant a language
> > difference needs to be. To my feeble mind, if the experts don't agree on
> > the codes to use, either because the codes are ambiguous or because the
> > decision process is so complex, then surely there is no hope for the
> > majority of the community that is responsible for choosing language
> > tags. Which was my point.
> >
> > I still conclude that simple instructions that don't require decisions
> > based on information that is not generally available are the more
> > reliable model. They are better for users and better for application
> > developers.
> >
> > For the applications that linguists use, where the distinctions are
> > much more important, the current state of the art might be reasonable.
> > (But I wouldn't bet on it.)
> >
> > Cheers,
> >
> > Tex
> >

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
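Mark's quoted proposal (one table mapping a language subtag to the scripts in use, a second mapping a language{-script} tag to the regions where it is used, and the "plausible" list generated as each 639 code, plus its script combinations, plus region combinations) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anyone's actual table: the sample entries are hypothetical placeholders rather than researched data, the function name `plausible_tags` is invented here, and a language missing from the script table is simply assumed to need no script subtag.

```python
from typing import Dict, List

# Hypothetical sample data for illustration only; not an authoritative registry.
# Table 1: language subtag => scripts in use (listed only where script matters).
SCRIPTS: Dict[str, List[str]] = {
    "sr": ["Cyrl", "Latn"],
    "zh": ["Hans", "Hant"],
}

# Table 2: language{-script} tag => regions where used.
REGIONS: Dict[str, List[str]] = {
    "en": ["US", "GB"],
    "zh-Hans": ["CN", "SG"],
    "zh-Hant": ["TW", "HK"],
}


def plausible_tags(languages: List[str]) -> List[str]:
    """Expand each language code into the 'plausible' tags derived from the two tables."""
    tags: List[str] = []
    for lang in languages:
        # Step 1: the language code alone.
        base = [lang]
        # Step 2: add language-script combinations from table 1
        # (a language absent from table 1 gets no script subtag; an assumption).
        base += [f"{lang}-{script}" for script in SCRIPTS.get(lang, [])]
        for tag in base:
            tags.append(tag)
            # Step 3: add region combinations from table 2.
            tags += [f"{tag}-{region}" for region in REGIONS.get(tag, [])]
    return tags


print(plausible_tags(["en", "sr", "zh", "cy"]))
```

With this sample data, "cy" yields only the bare code, because the Patagonian-versus-UK question raised in the thread is left unresolved. Keeping the two small tables, rather than the expanded list, as the maintained artifact is the point of the proposal: the long list is derived, not edited by hand.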