- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 29 Dec 2004 16:22:37 +0900
- To: "JFC (Jefsey) Morfin" <jefsey@jefsey.com> (by way of Martin Duerst <duerst@w3.org>), www-international@w3.org
At 15:18 04/12/27, JFC (Jefsey) Morfin wrote:

>I gave some thinking to all this and reviewed the documents that W3C also prepare. I am afraid we want to put too many unrelated things into the same debate, due to a confusion between the three internationalization, multilingualization and vernacularization layers which are not identified and documented yet, while some attempt to discuss what belongs to lingual authoritative sources.

This discussion is about language identifiers for content. And on this list (www-international@w3.org) in particular, about language identifiers for Web content. Language issues for content and language issues for domain name registrations are quite different.

>This is only an IETF document,

The document that Tex put up is not an IETF document, just a Web page put up in the hope of helping people quickly make a good selection for tagging their Web content (in my opinion, that Web page still has some way to go to reach that goal, but that's a separate issue).

>talking only about network interoperability. It must be consistent with other RFCs. Other RFCs have defined the Internet language/country authorities: RFC 3066bis cannot say otherwise.

RFC 3066 and RFC 3066bis don't define language authority. They just define ways to generate or register tags for existing languages. And I am not aware of an RFC (as opposed to an ICANN document) that defines language authority. (I may have missed one.)

>As for naming, languages are chosen and documented by the local internet communities, represented by their Trustees, the ccTLD Managers (the SLD Manager for privately defined tags).

No, what some ccTLDs are doing is just to document the set of characters that they accept for a given language. Some ccTLDs (such as .de and .ch) have carefully avoided doing even that; the set of characters they accept for IDNs is mostly based on system considerations.
(The reason they have done that may also, to some extent, be that they don't think that language is or should be a major determinant for domain name registry operation; I would agree that script is much more important.)

>The same as IANA is not in the business of defining countries (RFC 1591), IANA is not in the business of defining the languages of the countries.

Neither are ccTLDs. In many countries, they would get into problems if they tried to do that. Language is much more than just a set of characters.

>All what an _RFC_ can say is that language tags identify the IDNA Tables published by the ccTLD Manager, as the Trustee of his local internet community (we talk of the language used by network/protocol related issues). Or by the SLD Managers for their domain. I certainly favor Unicode, locales, contexts, etc. converge, but that raises first many many more multilingual Internet related issues, the RFC 3066bis does not want to discuss.

RFC 3066 and 3066bis codes may be used for labeling sets of characters used in the domain name system. But compared with their use for labeling content, for requesting content, and so on, such a use is extremely marginal. (There are currently maybe a few dozen such tables, but there are millions and millions of Web pages, for example.)

>I fully understand that most of the ccTLD Managers have not published language tables and that other applications than DNS call for an immediate support, also that SLD Manager may need off-the-shelf tables. However this support by non-ccTLD Managers can only be temporary and MUST be eventually consistent with the ccTLD Manager tables such an RFC should call for. Otherwise we have a real layer and authority violation, all the more than this is not only by RFC 1591, ICANN ICP-1 but also by the WSIS 2003 Resolutions underlining the sovereignty of Govs over ccTLDs.
>There is no problem in documenting the duties of a ccTLD Manager in this area and in discussing it with ccTLDs Managers, as an addition to the ccTLD Manager BPs.

Again, this is not about 'language tables' for IDN.

>I would therefore review the ABNF in four areas:
>- favoring the three letter codes for the language to make this entry time independent and consistent (this does not change anything in the current applications)

No, this would change a lot, because most Web content out there currently uses two-letter codes. Also, RFC 3066, for good reasons, prefers two-letter codes where available.

Regards, Martin.
Received on Thursday, 30 December 2004 07:16:36 UTC