Re: Language Identifier List Comments, updated

At 15:18 04/12/27, JFC (Jefsey) Morfin wrote:

 >I gave some thinking to all this and reviewed the documents that W3C also 
prepare. I am afraid we want to put too many unrelated things into the same 
debate, due to a confusion between the three internationalization, 
multilingualization and vernacularization layers which are not identified and 
documented yet, while some attempt to discuss what belongs to lingual 
authoritative sources.

This discussion is about language identifiers for content. And on this
list (www-international@w3.org) in particular, about language identifiers
for Web content.

Language issues for content and language issues for domain name
registrations are quite different.

 >This is only an IETF document,

The document that Tex put up is not an IETF document, just
a Web page put up in the hope of helping people quickly make
a good selection when tagging their Web content
(in my opinion, that Web page still has some way to go
to reach that goal, but that's a separate issue).

 >talking only about network interoperability. It must be consistent with 
other RFCs. Other RFCs have defined the Internet language/country 
authorities: RFC 3066bis cannot say otherwise.

RFC 3066 and RFC 3066bis don't define language authority. They just define
ways to generate or register tags for existing languages.

And I am not aware of an RFC (as opposed to an ICANN document) that defines
language authority. (I may have missed one.)

 >As for naming, languages are chosen and documented by the local internet 
communities, represented by their Trustees, the ccTLD Managers (the SLD 
Manager for privately defined tags).

No, all that some ccTLDs are doing is documenting the set of characters
that they accept for a given language. Some ccTLDs (such as .de and .ch)
have carefully avoided doing even that; the set of characters they
accept for IDNs is mostly based on system considerations. (They may
have done so partly because they don't think that language is or
should be a major determinant of domain name registry operation;
I would agree that script is much more important.)

 >The same as IANA is not in the business of defining countries (RFC 1591), 
IANA is not in the business of defining the languages of the countries.

Neither are ccTLDs. In many countries, they would run into
problems if they tried to do that. Language is much more
than just a set of characters.


 >All what an _RFC_ can say is that language tags identify the IDNA Tables 
published by the ccTLD Manager, as the Trustee of his local internet 
community (we talk of the language used by network/protocol related 
issues). Or by the SLD Managers for their domain. I certainly favor 
Unicode, locales, contexts, etc. converge, but that rises first many many 
more multilingual Internet related issues, the RFC 3066bis does not want to 
discuss.

RFC 3066 and 3066bis codes may be used for labeling sets of characters
used in the domain name system. But compared with their use for labeling
content, requesting content, and so on, such a use is extremely marginal
(there are currently maybe a few dozen such tables, but there are
millions and millions of Web pages, for example).

 >I fully understand that most of the ccTLD Managers have not published 
language tables and that other applications than DNS call for an immediate 
support, also that SLD Manager may need off-the-shelves tables. However 
this support by non-ccTLD Managers can only be temporary and MUST be 
eventually consistent with the ccTLD Manager tables such an RFC should call 
for. Otherwise we have a real layer and authority violation, all the more 
than this is not only by RFC 1591, ICANN ICP-1 but also by the WSIS 2003 
Resolutions underlining the sovereignty of Govs over ccTLDs. There is no 
problem in documenting the duties of a ccTLD Manager in this area and in 
discussing it with ccTLDs Managers, as an addition to the ccTLD Manager BPs.

Again, this is not about 'language tables' for IDN.

 >I would therefore review the ABNF in four areas:
 >- favoring the three letter codes for the language to make this entry 
time independent and consistent (this does not change anything in the 
current applications)

No, this would change a lot, because most Web content out there currently
uses two-letter codes. Also, RFC 3066, for good reasons, prefers two-letter
codes where available.
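
To illustrate the "two-letter where available" preference, here is a
minimal Python sketch. The function name and the small lookup table are
hypothetical and cover only a handful of ISO 639 codes chosen for
illustration; a real tool would need the complete ISO 639-1/639-2 lists.

```python
# Illustrative subset only: ISO 639-2 (terminology) three-letter codes
# that also have an ISO 639-1 two-letter code. Not a complete table.
ISO_639_2_TO_1 = {
    "eng": "en",  # English
    "fra": "fr",  # French
    "deu": "de",  # German
    "jpn": "ja",  # Japanese
    "zho": "zh",  # Chinese
}


def preferred_language_code(code: str) -> str:
    """Return the two-letter code if this table has one, else the input.

    Hypothetical helper: it only normalizes case and consults the small
    table above; it does not validate that the code is registered.
    """
    normalized = code.strip().lower()
    return ISO_639_2_TO_1.get(normalized, normalized)


if __name__ == "__main__":
    # "haw" (Hawaiian) has no two-letter code, so it is kept as-is.
    for sample in ("eng", "fra", "haw"):
        print(sample, "->", preferred_language_code(sample))
```

Switching existing content from "en" to "eng" would therefore not be a
no-op: software matching on the two-letter form would stop recognizing it.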


Regards,    Martin. 

Received on Thursday, 30 December 2004 07:16:36 UTC