Re: Language Identifier List Comments, updated

From: JFC (Jefsey) Morfin <jefsey@jefsey.com>
Date: Fri, 31 Dec 2004 15:21:19 +0900
To: www-international@w3.org

Dear Martin,
I suppose we are in full agreement. I copied www-international@w3.org only 
because it was copied (not in TO:). This has shown that W3C and Internet 
structural concerns are not the same, and that trying to address both within 
the same W3C-oriented tagging cannot work. I documented what I need at a 
structural level (IDN, IANA, OPES, ccTLD, etc.) and quoted needs documented by 
ETSI. I could add demands by MINC. I have not yet discussed it with other 
comparable intergovernance entities.

I think it shows that the tagging under discussion is only partly a subset of 
the desirable multilingual Internet language tagging, and that some elements 
do not overlap completely. For example, the Internet does not go by the 
ISO 3166 list itself but by IANA's use of it. Whether you like it or not, 
IANA is now "an ICANN function", so it must be possible to support ICANN's 
vision of Internet governance.

At 08:22 29/12/2004, Martin Duerst wrote:
>At 15:18 04/12/27, JFC (Jefsey) Morfin wrote:
>This discussion is about language identifiers for content. And on this 
>list (www-international@w3.org) in particular, about language identifiers 
>for Web content.

Right. And this is in the W3C scope. But here we are discussing an IESG 
final-stage review within the Internet standards process. I have no problem 
with the draft being an informational RFC from the W3C (plus private 
comments). I even made clear that I supported its quick approval as such. But 
this text does not fit the Internet architecture principles and needs.

>Language issues for content and language issues for domain name 
>registrations are quite different.

Absolutely. And there are many other language and scripting related 
issues. This is why one issue cannot be the basis for documenting a standard 
imposed on the others.

> >This is only an IETF document,
>The document that Tex put up is not an IETF document, just
>a Web page put up in the hope to help people making a good
>selection for tagging their Web content quickly
>(in my opinion, that Web page still has some way to go
>to reach that goal, but that's a separate issue).

I understand Tex's document as an example of the Phillips-08 draft. If you 
understand it and discuss it differently, I apologize. Tex's page 
contributes to Internet intergovernance by helping others. This kind of work 
can be helped a lot by the local governance trustees, the ccTLDs, simply 
because they are locally competent and interested entities.

>RFC 3066 and RFC 3066bis don't define language authority. They just 
>define ways to generate or register tags for existing languages.

This is unfortunately what you do not understand. They actually do, and 
they create an authority conflict, because they support only one definition 
and do not permit documenting who decided on the information documented for a 
tag, and for what purpose. So the authority is on a first-use basis. This is 
OK for the W3C, as in most cases web pages are the first to use the tag.

>And I am not aware of an RFC (as opposed to ICANN document) that defines 
>language authority. (I may have missed one.)

IDN RFCs do. They define who is to register a Table for each language and 
scripting (you cannot really discriminate between language and scripting, 
because tables are voted on by local users and technicians and are rather 
meant to be the ccTLD vernacular for a language). The dispute over LHS showed 
that the scripting could be different, but the general opposition to the 
support of upper case shows that the scripting is to be special (no upper 
case, as per the RFC 1958 principle).

> >As for naming, languages are chosen and documented by the local 
> internet communities, represented by their Trustees, the ccTLD Managers 
> (the SLD Manager for privately defined tags).
>No, what some ccTLDs are doing is just to document the set of characters
>that they accept for a given language. Some ccTLDs (such as .de and .ch)
>have carefully avoided doing even that; the set of characters they
>accept for IDNs is mostly based on system considerations. (The reason
>they have done that may also to some extent be because they don't
>think that language is or should be a major determinant for domain
>name registry operation; I would agree that script is much more important).

This is an incidental comment which can help thinking but is no basis for 
standardization. I am a ccTLD Registry Manager and I am developing a 
generic NIC system. I have been involved in international network 
intergovernance for 27 years now. I cannot tell you everything that is 
needed, but I can tell you that no restriction of possible requirements can 
seriously be considered: technically, because you do not know what innovation 
may call for, and practically, because we are facing the sovereignty of users 
(represented by their Govs).

> >The same as IANA is not in the business of defining countries (RFC 
> 1591), IANA is not in the business of defining the languages of the countries.
>Neither are ccTLDs. In many countries, they would get into problems if 
>they tried to do that.

See remark above.

>  Language is much more than just a set of characters.

Fully true. And this is why it is dangerous to limit them to just a 
bundle of tag elements. ISO standards keep language, scripting, country, 
geography, functions, types of authority, etc. separated and respect 
people's individuality. I just want, in a document to be endorsed by the 
IESG, the items necessary to the Internet structure to be present, and not 
only those necessary to an application.

> >All that an _RFC_ can say is that language tags identify the IDNA Tables 
> published by the ccTLD Manager, as the Trustee of his local internet 
> community (we are talking of the language used for network/protocol related 
> issues), or by the SLD Managers for their domain. I certainly favor having 
> Unicode, locales, contexts, etc. converge, but that first raises many, many 
> more multilingual Internet related issues that RFC 3066bis does not want 
> to discuss.

>RFC 3066 and 3066bis codes may be used for labeling sets of characters
>used in the domain name system. But compared with their use for labeling
>content, and for requesting content,..., such a use is extremely marginal.
>(there are currently maybe a few dozens of such tables, but there are
>millions and millions of Web pages, for example).

This consideration obviously has no place in a standardization document, 
all the more so as the multilingual web will more and more need IDNs to be of 
any use.

> >I fully understand that most of the ccTLD Managers have not published 
> language tables and that applications other than DNS call for 
> immediate support, also that SLD Managers may need off-the-shelf 
> tables. However, this support by non-ccTLD Managers can only be temporary 
> and MUST eventually be consistent with the ccTLD Manager tables such an 
> RFC should call for. Otherwise we have a real layer and authority 
> violation, all the more so as this is called for not only by RFC 1591 and 
> ICANN ICP-1 but also by the WSIS 2003 Resolutions underlining the 
> sovereignty of Govs over ccTLDs. There is no problem in documenting the 
> duties of a ccTLD Manager in this area and in discussing it with ccTLD 
> Managers, as an addition to the ccTLD Manager BPs.
>Again, this is not about 'language tables' for IDN.

Again, an IESG document is concerned with IDN language tables and IANA management.

> >I would therefore review the ABNF in four areas:
> >- favoring the three letter codes for the language to make this entry 
> time independent and consistent (this does not change anything in the 
> current applications)
>No, this would change a lot, because most Web content out there 
>currently uses two-letter codes. Also, RFC 3066, for good reasons, prefers 
>two-letter codes where available.

I suspect that the "good reasons" (plural) amount only to its having 
followed an obscure (in terms of the Internet standards process) procedure. 
Otherwise you would document them. I documented how this practice is contrary 
to Internet architectural principles.

Now, I certainly take your odd usage into account (as far as I understand, a 
dated usage), and in saying "favor" I have no problem with it. Favoring does 
not mean that your usage is unsupported: it remains supported until 
deprecated. You only have to understand that, whether you like it or not, 
Internet architecture cannot support the fr-FR notation. So it will 
necessarily be fra-fr. The only question is whether you can accommodate it in 
your standard with a "fr-fr" transition where the first item is a two letter 
language element.
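
To illustrate the transition described above, here is a minimal Python sketch. It is an illustration under stated assumptions, not something taken from RFC 3066 or the Phillips draft: the function name `to_three_letter_tag` is hypothetical, and the mapping table is a tiny hand-picked subset of ISO 639-1 to ISO 639-2/T codes. It simply lowercases every subtag and replaces a known two letter language element by its three letter equivalent.

```python
# Illustrative subset only (assumption): ISO 639-1 -> ISO 639-2/T codes.
ISO639_1_TO_3 = {"fr": "fra", "en": "eng", "de": "deu", "ja": "jpn"}


def to_three_letter_tag(tag: str) -> str:
    """Rewrite an 'fr-FR' or 'fr-fr' style tag as 'fra-fr'.

    Hypothetical helper: all subtags are lowercased, and a two letter
    language element is replaced when it appears in the table above.
    Unknown language elements are kept as they are (lowercased).
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    rest = [subtag.lower() for subtag in subtags[1:]]
    if len(language) == 2:
        language = ISO639_1_TO_3.get(language, language)
    return "-".join([language] + rest)


if __name__ == "__main__":
    for example in ("fr-FR", "fr-fr", "fra-fr"):
        print(example, "->", to_three_letter_tag(example))
```

In this sketch all three inputs end up as fra-fr, which is the sense in which a "fr-fr" form could serve as a transition.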

I note that the CRC (common reference center) work will lead to a numeric 
description to permit the support of archiving addressing, and that IDN 
will call for the tags themselves to be supported in every language.

Received on Monday, 3 January 2005 11:14:20 UTC