- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Mon, 16 Feb 2004 09:14:45 -0500
- To: uri@w3.org
Roy T. Fielding <fielding@gbiv.com> wrote:
> Applications that use DNS for the sake of host name resolution must
> obey those restrictions -- I have tried to clarify that in the
> specification.
Thanks, I generally like the clarification.
> A host identified by a registered name is a string of characters
> that is intended for lookup within a locally-defined host or service
> name registry. The most common of such registry mechanisms is the
> Domain Name System (DNS), as defined by Section 3 of [RFC1034] and
> Section 2.1 of [RFC1123]. A DNS name consists of a sequence of domain
> labels...
This reads as though section 3 of [RFC1034] and section 2.1 of [RFC1123]
define the DNS registry mechanism, but they merely define the name
syntax. I think the intention was:
    The most common of such registry mechanisms is the Domain Name
    System (DNS). A host name intended for lookup in the DNS uses
    the syntax defined in section 3.5 of [RFC1034] and section 2.1 of
    [RFC1123]. Such a name consists of a sequence of domain labels...
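For illustration only, the label rule those sections describe can be sketched in Python. This is an assumption-laden sketch and does not come from either RFC. `is_dns_host_name` is a hypothetical name. It checks only the letter-digit-hyphen label syntax (with the RFC 1123 allowance for a leading digit) and the 63-character label limit. It ignores total name length and a trailing root dot.

```python
import re

# Letter-digit-hyphen label per RFC 1034 section 3.5, relaxed as in
# RFC 1123 section 2.1 so that a label may also begin with a digit.
# A label is 1-63 ASCII characters, begins and ends with a letter or
# digit, and may contain hyphens in between.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_dns_host_name(name: str) -> bool:
    """Hypothetical helper: True if every dot-separated label of `name`
    matches the preferred host-name label syntax.

    Overall length limits and a trailing root dot are not handled.
    """
    return all(_LABEL.fullmatch(label) for label in name.split("."))


print(is_dns_host_name("www.example.com"))  # True
print(is_dns_host_name("bücher.example"))   # False: non-ASCII label
```

Because the character classes list ASCII ranges explicitly, a non-ASCII or percent-encoded label fails the check. That fits the point that such names need some transformation before DNS lookup.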
> When a non-ASCII host name represents an internationalized domain
> name intended for resolution via DNS, the name must be transformed
> to the IDNA encoding [RFC3490] prior to name lookup. URI producers
> should provide such host names in the IDNA encoding, rather than a
> percent-encoding, if they wish to maximize interoperability with
> legacy URI resolvers.
I think that understates the situation. What do you think of this:
    When a non-ASCII reg-name represents an internationalized domain
    name (IDN), the rules of IDNA apply [RFC3490]. IDNA requires that
    the name be transformed to its ASCII-compatible encoding (ACE)
    sometime prior to being looked up in the DNS. Furthermore, IDNA
    requires that any producer of an IDN as a protocol element use
    the ACE form unless it knows that the consumer understands IDNA.
    Therefore, in the absence of such knowledge, any URI producer that
    wishes to use a non-ASCII domain name in the host component of a URI
    is required by IDNA to use the ACE form, not a percent-encoded UTF-8
    form. This requirement is needed in order to interoperate with
    legacy URI resolvers, which do not know how to convert to the ACE
    form prior to DNS lookup.
I know that creates a challenge for the IRI spec, but that is really
what IDNA implies.
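A minimal Python sketch of the difference between the two host forms. It assumes Python's built-in `"idna"` codec, which implements the IDNA 2003 ToASCII operation from RFC 3490, and it uses `urllib.parse.quote` to produce the percent-encoded UTF-8 form for contrast. The helper name `host_for_uri` and the sample name `bücher.example` are hypothetical.

```python
from urllib.parse import quote


def host_for_uri(domain: str) -> str:
    """Hypothetical helper: the form of `domain` to put in a URI host.

    ASCII names are returned unchanged. Non-ASCII names are converted to
    the IDNA ASCII-compatible encoding (ACE) with Python's built-in
    "idna" codec (IDNA 2003, RFC 3490) and are never percent-encoded.
    """
    if domain.isascii():
        return domain
    # The codec applies ToASCII label by label, adding the "xn--"
    # prefix to each label that needs it.
    return domain.encode("idna").decode("ascii")


name = "bücher.example"
print(host_for_uri(name))  # xn--bcher-kva.example  (ACE form)
print(quote(name))         # b%C3%BCcher.example    (percent-encoded UTF-8)
```

A legacy resolver given the first form can hand it straight to DNS. Given the second form, the same resolver has no way to arrive at the ACE form.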
The percent-encoding issue gives rise to compatibility considerations
analogous to the ones faced by IDNA, which could be addressed in a
similar way:
    A previous version of the URI specification [RFC2396] did not
    permit percent-encoding within domain names in the host component,
    and there exist legacy URI resolvers that do not perform
    percent-decoding on domain names in the host component. Therefore,
    a URI producer MUST NOT produce a URI with a host component
    containing a percent-encoded domain name (not even a percent-encoded
    ASCII domain name) unless the URI is being put into a context that
    explicitly invites such a URI. This restriction applies only to
    domain names, not general reg-names.
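A producer-side check for the rule proposed above might look like the following Python sketch. `host_is_acceptable` and its `context_invites_pct` flag are hypothetical. The caller is assumed to know already that the host is a domain name and not some other reg-name. The caller is also assumed to know whether the target context explicitly invites percent-encoded hosts, because neither fact can be read off the URI string.

```python
from urllib.parse import urlsplit


def host_is_acceptable(uri: str, context_invites_pct: bool = False) -> bool:
    """Hypothetical helper: False if the URI's host (assumed to be a
    domain name) contains percent-encoding and the target context has
    not explicitly invited such URIs; True otherwise.
    """
    # .hostname drops any userinfo and port and lowercases the host.
    host = urlsplit(uri).hostname or ""
    if "%" not in host:
        return True
    # A legacy resolver would pass the "%" sequences to DNS literally,
    # so only emit such a host when the context asks for it.
    return context_invites_pct


print(host_is_acceptable("http://xn--bcher-kva.example/"))  # True
print(host_is_acceptable("http://b%C3%BCcher.example/"))    # False
```

The check also rejects a percent-encoded ASCII name such as `http://%77ww.example.com/`, matching the "not even a percent-encoded ASCII domain name" clause.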
A simpler approach that would subsume both paragraphs would be to just
continue the RFC-2396 prohibition of percent-escapes within domain names
in the host component:
    Although percent-encoding is generally allowed in a reg-name, it is
    not allowed in a reg-name that is a domain name, for compatibility
    with the previous version of the URI specification [RFC2396].
    Internationalized domain names can be supported using IDNA
    [RFC3490].
I don't think this stricter formulation is any more challenging for the
IRI spec, and it sure is a lot simpler than those two long paragraphs.
With either the long formulation or the short strict one, the IRI spec
would face the same challenge. I see that it's a formidable one, but I'm
not yet ready to give up hope of overcoming it.
AMC
Received on Monday, 16 February 2004 09:20:33 UTC