- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Mon, 16 Feb 2004 09:14:45 -0500
- To: uri@w3.org
Roy T. Fielding <fielding@gbiv.com> wrote:

> Applications that use DNS for the sake of host name resolution must
> obey those restrictions -- I have tried to clarify that in the
> specification.

Thanks, I generally like the clarification.

> A host identified by a registered name is a string of characters
> that is intended for lookup within a locally-defined host or service
> name registry. The most common of such registry mechanisms is the
> Domain Name System (DNS), as defined by Section 3 of [RFC1034] and
> Section 2.1 of [RFC1123]. A DNS name consists of a sequence of domain
> labels...

This sounds like section 3 of [RFC1034] and section 2.1 of [RFC1123]
define the DNS registry mechanism, but they merely define the name
syntax. I think the intention was:

    The most common of such registry mechanisms is the Domain Name
    System (DNS). A host name intended for lookup in the DNS uses the
    syntax defined in section 3.5 of [RFC1034] and section 2.1 of
    [RFC1123]. Such a name consists of a sequence of domain labels...

> When a non-ASCII host name represents an internationalized domain
> name intended for resolution via DNS, the name must be transformed
> to the IDNA encoding [RFC3490] prior to name lookup. URI producers
> should provide such host names in the IDNA encoding, rather than a
> percent-encoding, if they wish to maximize interoperability with
> legacy URI resolvers.

I think that understates the situation. What do you think of this:

    When a non-ASCII reg-name represents an internationalized domain
    name (IDN), the rules of IDNA apply [RFC3490]. IDNA requires that
    the name be transformed to its ASCII-compatible encoding (ACE)
    sometime prior to being looked up in the DNS. Furthermore, IDNA
    requires that any producer of an IDN as a protocol element use the
    ACE form unless it knows that the consumer understands IDNA.
    Therefore, in the absence of such knowledge, any URI producer that
    wishes to use a non-ASCII domain name in the host component of a
    URI is required by IDNA to use the ACE form, not a percent-encoded
    UTF-8 form. This requirement is needed in order to interoperate
    with legacy URI resolvers, which do not know how to convert to the
    ACE form prior to DNS lookup.

I know that creates a challenge for the IRI spec, but that is really
what IDNA implies.

The percent-encoding issue gives rise to compatibility considerations
analogous to the ones faced by IDNA, which could be addressed in an
analogous way:

    A previous version of the URI specification [RFC2396] did not
    permit percent-encoding within domain names in the host component,
    and there exist legacy URI resolvers that do not perform
    percent-decoding on domain names in the host component. Therefore,
    a URI producer MUST NOT produce a URI with a host component
    containing a percent-encoded domain name (not even a
    percent-encoded ASCII domain name) unless the URI is being put
    into a context that explicitly invites such a URI. This
    restriction applies only to domain names, not general reg-names.

A simpler approach that would subsume both paragraphs would be to just
continue the RFC-2396 prohibition of percent-escapes within domain
names in the host component:

    Although percent-encoding is generally allowed in a reg-name, it
    is not allowed in a reg-name that is a domain name, for
    compatibility with the previous version of the URI specification
    [RFC2396]. Internationalized domain names can be supported using
    IDNA [RFC3490].

I don't think this stricter formulation is any more challenging for the
IRI spec, and it sure is a lot simpler than those two long paragraphs.

With either the long formulation or short strict formulation, the IRI
spec would face the same challenge, and I see that it's a formidable
one, but I'm not yet ready to give up hope of overcoming it.

AMC
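
To make the two host forms discussed above concrete, here is a small
sketch in Python. It is only an illustration, not part of either
specification: the function names (to_ace, to_percent_encoded,
is_strict_domain_host) and the sample name "bücher.example" are
made up for this example, and the ACE conversion relies on Python's
built-in "idna" codec, which implements RFC 3490.

```python
from urllib.parse import quote


def to_ace(host: str) -> str:
    """Return the ASCII-compatible encoding (ACE) of a host name.

    Uses Python's built-in "idna" codec (RFC 3490). ASCII-only names
    pass through unchanged.
    """
    return host.encode("idna").decode("ascii")


def to_percent_encoded(host: str) -> str:
    """Return the percent-encoded UTF-8 form of a host name.

    This is the form that a legacy resolver, which does not
    percent-decode the host, would pass to the DNS verbatim.
    """
    return quote(host)


def is_strict_domain_host(host: str) -> bool:
    """Hypothetical check for the short, strict formulation:
    a domain name in the host component is ASCII-only and carries
    no percent-escapes.
    """
    return host.isascii() and "%" not in host


name = "bücher.example"                 # hypothetical sample IDN
print(to_ace(name))                     # xn--bcher-kva.example
print(to_percent_encoded(name))         # b%C3%BCcher.example
print(is_strict_domain_host(to_ace(name)))              # True
print(is_strict_domain_host(to_percent_encoded(name)))  # False
```

Only the ACE form is a syntactically ordinary host name that a
resolver unaware of both IDNA and percent-decoding can still look up
unmodified, which is the interoperability point made above.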
Received on Monday, 16 February 2004 09:20:33 UTC