Re: Language Identifier List Criteria

From: Mark Davis <mark.davis@jtcsv.com>
Date: Mon, 20 Dec 2004 15:34:26 -0800
Message-ID: <058c01c4e6ec$71af4a00$6501a8c0@sanjose.ibm.com>
To: "Tex Texin" <tex@xencraft.com>, "Georg Schweizer" <gschweizer@gmx.at>
Cc: <www-international@w3.org>, <ietf-languages@alvestrand.no>

> However, RFC 3066's approach is generative. So de-AT is created by combining
> codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> In fact RFC 3066 only defines the production and not which of the produced
> values are meaningful or what they mean except in the most general terms.

You keep expressing this in a counterproductive way. The language tag
'de-AT' is reasonably defined: German as used in Austria.
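
To make the point concrete, here is a minimal sketch of how a generated tag gets its meaning compositionally. It assumes the RFC 3066 syntax (a primary subtag of 1 to 8 letters, then hyphen-separated subtags of 1 to 8 letters or digits) and the RFC 3066 rule that a two-letter second subtag is an ISO 3166 country code. The function name `describe` and the two tiny lookup tables are hypothetical stand-ins for the real ISO 639 and ISO 3166 lists.

```python
import re

# RFC 3066 syntax: primary subtag of 1-8 letters, then zero or more
# hyphen-separated subtags of 1-8 letters or digits.
_TAG_RE = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")

# Hypothetical, deliberately tiny tables for illustration only; real data
# would come from ISO 639 (languages) and ISO 3166 (countries).
LANGUAGES = {"de": "German", "da": "Danish", "en": "English"}
REGIONS = {"AT": "Austria", "DE": "Germany", "DK": "Denmark", "WS": "Samoa"}


def describe(tag):
    """Return a plain-English reading of a language or language-region tag."""
    if not _TAG_RE.match(tag):
        raise ValueError(f"not a well-formed RFC 3066 tag: {tag!r}")
    subtags = tag.split("-")
    primary = subtags[0].lower()
    language = LANGUAGES.get(primary, primary)
    # A two-letter second subtag is read as an ISO 3166 country code.
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        code = subtags[1].upper()
        return f"{language} as used in {REGIONS.get(code, code)}"
    return language


print(describe("de-AT"))
print(describe("da-WS"))
```

The second call is the "Samoan, blond" case: the reading of the tag is fixed by the readings of its parts, whether or not any content exists that would carry it.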

Suppose I have a protocol ID that distinguishes categories of people by
combining hair-color with nationality. Then "Samoan, blond" is perfectly
well defined. The fact that there are no existing examples does *NOT* mean
that it is "ambiguous", "not meaningful", or "not well-defined". And let's
suppose that all Danes were blond. Then "Dane, blond" would still be well
defined. The fact that it happens to have the same current denotation as
"Dane" does *NOT* mean that it is "ambiguous", "not meaningful", or "not
well-defined".

To paraphrase Inigo Montoya, "I do not think those words mean what you think
they mean!"

Now, there are in fact edge cases; when is someone dishwater blond vs pale
brunette; what do you do with dual citizenship, etc. But it doesn't mean
that the protocol is senseless. And once you have a criterion of usage, you
can establish when two IDs have the same denotation or not, by doing the
research to see whether there are in fact non-blond Danes.

What you are really looking for is which language tags have the same
denotation, under some criterion of usage. And that criterion might be "does
someone need to provide different localizations (for non-speech-enabled
applications)". That is *very* different from saying that the language tags
are "ambiguous", "not meaningful", or "not well-defined".
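
As a rough sketch of that idea, tags can be grouped by whatever a chosen criterion of usage returns for them; two tags have the same denotation under that criterion exactly when they land in the same group. Everything named here is an assumption for illustration: `group_by_denotation`, the `needs_localization` criterion, and above all the `LOCALIZATION_PACK` table, whose entries are invented placeholders and not findings about any of these languages.

```python
from collections import defaultdict


def group_by_denotation(tags, criterion):
    """Group tags that the given criterion of usage cannot tell apart."""
    groups = defaultdict(list)
    for tag in tags:
        groups[criterion(tag)].append(tag)
    # Dicts keep insertion order, so groups appear in first-seen order.
    return list(groups.values())


# Invented placeholder data: which localization a product would ship for
# each tag. Filling this in for real is the "research" step.
LOCALIZATION_PACK = {
    "da": "pack-1",
    "da-DK": "pack-1",
    "de-DE": "pack-2",
    "de-AT": "pack-3",
}


def needs_localization(tag):
    """Hypothetical criterion: tags are equivalent if they share a pack."""
    return LOCALIZATION_PACK[tag]


groups = group_by_denotation(["da", "da-DK", "de-DE", "de-AT"], needs_localization)
print(groups)
```

With a different criterion (say, one for speech-enabled applications) the same tags could fall into different groups, and none of the tags would have become any less well defined.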

Mark

----- Original Message ----- 
From: "Tex Texin" <tex@xencraft.com>
To: "Georg Schweizer" <gschweizer@gmx.at>
Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
Sent: Monday, December 20, 2004 13:26
Subject: Re: Language Identifier List Criteria


> Well, I will leave it to others to debate the characterization of the standards
> as political, if they choose to.
> However, RFC 3066's approach is generative. So de-AT is created by combining
> codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> In fact RFC 3066 only defines the production and not which of the produced
> values are meaningful or what they mean except in the most general terms.
> That's why we are having this discussion.
>
> Under RFC 3066 it is possible to create combinations of language and region
> that have no useful value.
> So this need not be determined politically.
>
> For myself, I am looking for guidance for software and web producers. Which
> labels to use when tagging content? When is there enough of a difference that
> it bears paying for a translation?
>
> It is not clear to me it should be purely linguistic however.
> Politics is perhaps one element of the criteria.
> tex
>
>
> Georg Schweizer wrote:
> >
> > > Some languages are spoken in many countries, and the language is not
> > > distinctive in each country. I have started to accept suggestions as
> > > to which language-region codes do not represent a distinct language
> > > variation, and therefore are not recommended as tags, without good
> > > reason.
> > http://www.i18nguy.com/unicode/language-identifiers.html
> >
> > The criteria should be political rather than linguistic ones, as both
> > the ISO 639 language tags and the ISO 3166 country codes are based on
> > political agreement. Therefore I would not speak of "distinct language
> > variations", but of distinct *official standards* (or at least distinct
> > conventions). Variations can be found everywhere (even within one
> > political region), whereas the same conventions can be followed by
> > several countries.
>
> -- 
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
>
> XenCraft             http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
Received on Monday, 20 December 2004 23:35:43 GMT