
Re: Language Identifier List Criteria

From: JFC (Jefsey) Morfin <jefsey@jefsey.com>
Date: Tue, 21 Dec 2004 22:20:38 +0900
Message-Id: <6.0.0.20.2.20041221222033.062e64e0@localhost>
To: www-international@w3.org




Gentlemen,
I am a small Registry Manager, and I am considering the different practical,
low-cost ways to implement a DNS ML.ML response to real users' demands and
needs within the limited solutions provided by the RFCs. I fully agree that my
need is not application oriented, only DNS and network oriented.

1. In many countries the law forbids discrimination based on racial, cultural
or similar criteria, and WTO rules call for commercial balance and
reciprocity. Together, these two principles mean that sooner or later civil
society groups will succeed in establishing, in some countries, that once a
ccTLD starts supporting one language, it has to support them all. A ccTLD
would then have to support lingual "local TLDs" in every language and
effectively turn into an alt-root, so we have to find a solution.

Let me take an example. The French corporation Renault can register
renault.cn in China. This gives its site a commercial advantage, and a Chinese
car maker could object if it cannot, in reciprocity, register
chinese-name.fr. Given the way I implemented our DNS management, I have no
problem with the several virtual zones this implies, nor with the lingual
registration site and support that have to be offered.

2. However, the real demand is not foreign.ascii but ML.ML: Chinese users
want to use an ML (multilingual) TLD, in Chinese. CNNIC and the Chinese
Government made this possible by using what I call a "ULD" (user level
domain). We tested the various possibilities on a DNS test bed we ran
(following ICANN's ICP-3 document). A ULD consists of using an SLD as a TLD
(with its community, its image, and the way it is entered). There are three
ways to support this target:

a) on a terminal set to the language's "locale", a plug-in corrects the
entered ML.ML into an acceptable ML.ML.TLD;
b) at cooperating ISPs or on the TLD servers, the correction is carried out by
a nameserver front-end (in the future, an OPES);
c) outside the language community and its keyboards, the full international
version xn-name.xn-lang./Country.ccTLD has to be used.
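
The three options above can be sketched in Python. This is a minimal
illustration under stated assumptions, not CNNIC's plug-in: the ULD table
(ULD_PARENTS, reusing the "france" -> "fr" example given later in this
message) and the function names are hypothetical, and option c) is reduced
to the standard IDNA ASCII form produced by Python's built-in "idna" codec
(IDNA 2003), rather than the exact xn-name.xn-lang./Country.ccTLD layout.

```python
# Hypothetical ULD table: a label users type as if it were a TLD, mapped to
# the real TLD under which that label is registered as an SLD.
ULD_PARENTS: dict[str, str] = {"france": "fr"}


def complete_uld(entered: str, uld_parents: dict[str, str] = ULD_PARENTS) -> str:
    """Options a) and b): rewrite an entered ML.ML into ML.ML.TLD.

    If the last label is a known ULD, its parent TLD is appended;
    any other name is returned unchanged.
    """
    labels = entered.rstrip(".").split(".")
    parent = uld_parents.get(labels[-1].lower())
    if parent is None:
        return entered
    return ".".join(labels + [parent])


def to_ace(name: str) -> str:
    """Option c): the ASCII-compatible ("xn--") form usable from any keyboard,
    computed with Python's built-in IDNA 2003 codec."""
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    typed = "bücher.france"          # what a user on a French locale might type
    full = complete_uld(typed)       # ULD completed with its parent TLD
    print(full, "->", to_ace(full))  # ASCII form for use outside the community
```

A plug-in (option a) would run something like complete_uld on the client,
while a nameserver front-end (option b) would apply the same mapping to
incoming queries; keeping the table as data lets both share it.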

The RFC 3066bis proposal offers the language/country ("langtry"?) codes.
China has distributed tens of millions of plug-ins, and all the major ISPs
cooperate: most probably every Chinese user will be using them by the end of
2005. The plug-in works with any other ULD, so we have to live with this.


This creates a major problem. ULDs are the only way to support the
language-country duality and the ML.ML demand while remaining IDNA
conformant, but in practice ULDs are SLDs. So nothing prevents anyone from
registering an SLD that is already used as a ULD in another TLD.

The proposition I have in mind is the following:

1. RFC 3066 uses ISO 639-2. This makes the notation very confusing within the
DNS "-0z" character set, because there is a significant overlap; there is none
if ISO 639-3 is used. So the language/country ULD identifier can be
ISO-639-3.ISO-3166-2, and it can follow the Chinese practice of using the
country's name in the language. This way, French in France would be
identified in a table as "fra.fr", possibly registered as "france.fr", and
used as ".france".

I note that this kind of ULD falls under the French Government's legal right
to decide who its Registry Manager should be, so the issue will immediately
become the responsibility of politicians. Therefore the IETF must either
disregard this practical way to support ML.ML or propose solutions now,
before politicians step in.

2. A table should be worked out, listing all the significant
ISO-639-3.ISO-3166-2 combinations (i.e. at most 1,500,000 entries) and their
significant, non-conflicting reserved SLDs. This list should be declared
reserved (as ccTLD.ccTLD was in the past).
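
Point 2, combined with the collision problem noted earlier (nothing prevents
a ULD label from being registered as an ordinary SLD elsewhere), amounts to a
lookup every registry would run before accepting a registration. The Python
sketch below assumes a hypothetical RESERVED_ULDS table mapping each reserved
label to the one TLD where it serves as a ULD; the "中国" -> "cn" entry is
illustrative only. Labels are compared in their IDNA ASCII form so that the
Unicode and "xn--" spellings of the same label cannot slip past each other.

```python
def ace_key(label: str) -> str:
    """Normalize a label to lowercase IDNA ASCII (Python's IDNA 2003 codec)."""
    return label.encode("idna").decode("ascii").lower()


# Hypothetical reserved list: ULD label -> the TLD in which it is the ULD.
# A full table could hold up to the 1,500,000 entries mentioned above;
# a dict keeps each check a constant-time lookup.
RESERVED_ULDS: dict[str, str] = {
    ace_key(label): tld for label, tld in {"france": "fr", "中国": "cn"}.items()
}


def registration_allowed(sld: str, tld: str,
                         reserved: dict[str, str] = RESERVED_ULDS) -> bool:
    """Unreserved labels are free in any TLD; a reserved label is accepted
    only in the TLD where it serves as a ULD."""
    owner = reserved.get(ace_key(sld))
    return owner is None or owner == tld.lower()


print(registration_allowed("renault", "cn"))     # ordinary label
print(registration_allowed("xn--fiqs8s", "fr"))  # ACE spelling of a reserved ULD
```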

3. An access engine should be supported that can receive the default zone
names (the "*" entry in the DNS), so that criteria other than language and
country can be accepted.
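
Point 3 can be read as a front-end that answers explicit records normally and
hands every other name under the zone, the names that would otherwise hit the
"*" entry, to a pluggable criteria function. The Python sketch below is a
simplified model under that reading: the WildcardFrontEnd class, the example
criterion (routing by whether the leftmost label is an "xn--" IDN label) and
the 192.0.2.x documentation addresses are all hypothetical, and real DNS
wildcard matching (RFC 1034, clarified by RFC 4592) has rules this sketch
ignores, such as empty non-terminals.

```python
from typing import Callable, Optional

Criteria = Callable[[str], Optional[str]]


class WildcardFrontEnd:
    """Hypothetical access engine for one zone."""

    def __init__(self, zone: str, records: dict[str, str], criteria: Criteria):
        self.zone = zone.lower().rstrip(".")
        self.records = {k.lower().rstrip("."): v for k, v in records.items()}
        self.criteria = criteria

    def lookup(self, name: str) -> Optional[str]:
        name = name.lower().rstrip(".")
        if name in self.records:
            return self.records[name]       # explicit entry wins
        if name.endswith("." + self.zone):
            return self.criteria(name)      # would match "*": apply criteria
        return None                         # outside this zone


def by_label_script(name: str) -> Optional[str]:
    # Example criterion other than language/country: IDN label vs plain ASCII.
    first = name.split(".")[0]
    return "192.0.2.10" if first.startswith("xn--") else "192.0.2.20"


engine = WildcardFrontEnd("france.fr", {"www.france.fr": "192.0.2.1"}, by_label_script)
print(engine.lookup("xn--bcher-kva.france.fr"))
```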

jfc morfin


At 00:34 21/12/2004, Mark Davis wrote:

> > However, RFC 3066's approach is generative. So de-AT is created by combining
> > codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> > In fact RFC 3066 only defines the production and not which of the produced
> > values are meaningful or what they mean except in the most general terms.
>
>You keep expressing this in a counterproductive way. The language tag
>'de-AT' is reasonably defined: German as used in Austria.
>
>Suppose I have a protocol ID that distinguishes categories of people by
>combining hair-color with nationality. Then "Samoan, blond" is perfectly
>well defined. The fact that there are no existing examples does *NOT* mean
>that it is "ambiguous", "not meaningful", or "not well-defined". And let's
>suppose that all Danes were blond. Then "Dane, blond" would still be well
>defined. The fact that it happens to have the same current denotation as
>"Dane" does *NOT* mean that it is "ambiguous",  "not meaningful", or "not
>well-defined".
>
>To paraphrase Inigo Montoya, "I do not think those words mean what you think
>they mean!"
>
>Now, there are in fact edge cases; when is someone dishwater blond vs pale
>brunette; what do you do with dual citizenship, etc. But it doesn't mean
>that the protocol is senseless. And once you have a criterion of usage, you
>can establish when two IDs have the same denotation or not, by doing the
>research to see whether there are in fact non-blond Danes.
>
>What you are really looking for is which language tags have the same
>denotation, under some criterion of usage. And that criterion might be "does
>someone need to provide different localizations (for non-speech enabled
>applications)". That is *very* different from saying that the language tags
>are "ambiguous",  "not meaningful", or "not well-defined".
>
>Mark
>
>----- Original Message -----
>From: "Tex Texin" <tex@xencraft.com>
>To: "Georg Schweizer" <gschweizer@gmx.at>
>Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
>Sent: Monday, December 20, 2004 13:26
>Subject: Re: Language Identifier List Criteria
>
>
> > Well, I will leave it to others to debate the characterization of the standards
> > as political, if they choose to.
> > However, RFC 3066's approach is generative. So de-AT is created by combining
> > codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> > In fact RFC 3066 only defines the production and not which of the produced
> > values are meaningful or what they mean except in the most general terms.
> > That's why we are having this discussion.
> >
> > Under RFC 3066 it is possible to create combinations of language and region
> > that have no useful value.
> > So this need not be determined politically.
> >
> > For myself, I am looking for guidance for software and web producers. Which
> > labels should be used when tagging content? When is there enough of a
> > difference that it is worth paying for a translation?
> >
> > It is not clear to me that it should be purely linguistic, however.
> > Politics is perhaps one element of the criteria.
> > tex
> >
> >
> > Georg Schweizer wrote:
> > >
> > > > Some languages are spoken in many countries, and the language is not
> > > > distinctive in each country. I have started to accept suggestions as
> > > > to which language-region codes do not represent a distinct language
> > > > variation, and therefore are not recommended as tags, without good
> > > > reason.
> > > http://www.i18nguy.com/unicode/language-identifiers.html
> > >
> > > The criteria should be political rather than linguistic ones, as both
> > > the ISO 639 language tags and the ISO 3166 country codes are based on
> > > political agreement. Therefore I would not speak of "distinct language
> > > variations", but of distinct *official standards* (or at least distinct
> > > conventions). Variations can be found everywhere (even within one
> > > political region), whereas the same conventions can be followed by
> > > several countries.
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> > Xen Master                          http://www.i18nGuy.com
> >
> > XenCraft             http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
> >
> > _______________________________________________
> > Ietf-languages mailing list
> > Ietf-languages@alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages
> >
>
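
Mark's distinction above, between a tag being well defined and two tags
having the same denotation under some criterion of usage, can be made
concrete. The regular expression below follows the RFC 3066 ABNF
(Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)), so it accepts any
generatively produced tag, useful or not. The localization table passed to
same_denotation, and the fallback to the primary subtag for unlisted tags,
are hypothetical stand-ins for Tex's "is a translation worth paying for?"
criterion, not recommendations about any real language variety.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Purely syntactic check; it says nothing about usefulness."""
    return RFC3066_TAG.fullmatch(tag) is not None


def localization_for(tag: str, table: dict[str, str]) -> str:
    """Map a tag to the localization it would receive under a given criterion.

    Tags are compared case-insensitively, as RFC 3066 requires; unlisted
    tags fall back to their primary subtag (an assumption of this sketch).
    """
    if not is_well_formed(tag):
        raise ValueError(f"not an RFC 3066 language tag: {tag!r}")
    tag = tag.lower()
    return table.get(tag, tag.split("-")[0])


def same_denotation(a: str, b: str, table: dict[str, str]) -> bool:
    """True if both tags receive the same localization under `table`."""
    return localization_for(a, table) == localization_for(b, table)


# Hypothetical project decision: ship a separate Austrian German localization.
SEPARATE = {"de-at": "de-at"}
print(same_denotation("de-AT", "de", SEPARATE), same_denotation("de-LU", "de", SEPARATE))
```

With this table "de-AT" stays distinct from "de" while "de-LU" does not;
changing the table changes the answer without making either tag ambiguous,
which is the point Mark makes about criteria of usage.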
Received on Tuesday, 21 December 2004 13:32:27 GMT
