Re: uri handling of hosts is too restrictive

Roy T. Fielding <fielding@gbiv.com> wrote:

 > I no longer believe the claim that the authority portion of a URI is
 > restricted to the DNS hostname syntax.

I retract that claim.  I missed the reg_name token in RFC-2396.  I see
now that RFC-2396 allows the authority to be either a server or a
reg_name.  The host part of a server must be either a host name or an
address literal (in which percent-escapes are neither allowed nor
needed), whereas a reg_name is much more permissive.
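For illustration, here is a rough Python sketch of that distinction,
using regular expressions transcribed from the RFC-2396 collected BNF.
It is only an approximation: it looks at a bare host-like string (no
userinfo or port), tries the server form first, and falls back to
reg_name.  The function name classify() is made up for this example.

```python
import re

# Productions transcribed from the RFC-2396 collected BNF (Appendix A).
# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")
# IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPV4ADDRESS = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")
# reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
# where unreserved = alphanum | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
# and escaped = "%" hex hex
REG_NAME = re.compile(r"(?:[A-Za-z0-9\-_.!~*'()$,;:@&=+]|%[0-9A-Fa-f]{2})+")


def classify(host: str) -> str:
    """Rough classification of a bare host-like string (no userinfo or port):
    try the server form (hostname or IPv4address) first, then reg_name."""
    if IPV4ADDRESS.fullmatch(host) or HOSTNAME.fullmatch(host):
        return "server host"
    if REG_NAME.fullmatch(host):
        return "reg_name"
    return "invalid"


for sample in ["www.example.com", "192.0.2.1", "my_host.example", "jos%C3%A9.net"]:
    print(f"{sample:16} -> {classify(sample)}")
```

Note that jos%C3%A9.net is only acceptable as a reg_name, never as a
server host, because escapes are not part of the hostname production.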

 > the syntax standard should reflect how it is actually implemented
 > interoperably in practice: namely, it delegates the issue of hostname
 > syntax conformance to the operating system, and that operating system
 > decides what it will allow for the purposes of host resolution.

That might work for questions about which ASCII characters are allowed
in host names.  If the new URI syntax wants to allow underscores and
plus-signs and such in host names, it may indeed be true that every
implementation out there will accept them and just pass them through to
the operating system's resolver, which is okay.

But for the question of whether percent-escapes are allowed in host
names, I don't think that argument holds.  Implementations are not
interoperable for that case.  Faced with http://jos%C3%A9.net/, some
applications percent-decode the name before calling the resolver, and
some pass jos%C3%A9.net literally to the resolver.  Applications in the
latter group have no codepath for percent-decoding host names, because
RFC-2616 and RFC-2396 promise that it's not necessary
(RFC-2616 promises that the name is a host, not a reg_name, and RFC-2396
promises no percent-escapes in host names).
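The divergence is easy to reproduce.  The following Python sketch uses
the standard urllib.parse module; the two function names are
hypothetical stand-ins for the two groups of applications and are not
taken from any real client.  It reads the authority via netloc, which
equals the host here only because the example URL has no userinfo or
port.

```python
from urllib.parse import unquote, urlsplit


def host_passed_through(url: str) -> str:
    """Hypothetical client that trusts RFC-2396: no escapes are expected in
    host names, so the host string goes to the resolver untouched."""
    return urlsplit(url).netloc  # no userinfo or port in this example


def host_percent_decoded(url: str) -> str:
    """Hypothetical client that percent-decodes (as UTF-8) before resolving."""
    return unquote(urlsplit(url).netloc)


url = "http://jos%C3%A9.net/"
print(host_passed_through(url))   # jos%C3%A9.net
print(host_percent_decoded(url))  # josé.net
```

Same URL, two different names handed to the resolver.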

(I have observed that Firefox and w3m on Linux pass the percent-signs
straight through to the resolver, which puts them on the wire
in DNS queries.  I presume that some applications will perform
percent-decoding, but I haven't observed it myself.)

For applications that perform percent-decoding before calling the
resolver, will they then perform UTF-8-to-local-charset transcoding
before calling the resolver?  Maybe some will and some won't.  I would
be skeptical of any claims that all will or all won't.
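A sketch of why this matters: the same host component can reach the
resolver as at least three different byte strings, depending on which
choices an application makes.  The strategy names and the choice of
ISO-8859-1 as the "local charset" are assumptions for illustration only.

```python
from typing import Dict
from urllib.parse import unquote_to_bytes


def candidate_resolver_inputs(host: str) -> Dict[str, bytes]:
    """Bytes that different (hypothetical) application strategies would
    hand to the resolver for the same percent-escaped host component."""
    decoded = unquote_to_bytes(host)  # percent-decode to raw octets
    return {
        # No decoding at all: the escapes themselves go to the resolver.
        "pass-through": host.encode("ascii"),
        # Percent-decode only: the UTF-8 octets go to the resolver.
        "decode-only": decoded,
        # Percent-decode, then transcode UTF-8 to an assumed local charset.
        "decode-and-transcode": decoded.decode("utf-8").encode("iso-8859-1"),
    }


for strategy, name in candidate_resolver_inputs("jos%C3%A9.net").items():
    print(f"{strategy:22} {name!r}")
```

Run on jos%C3%A9.net, the three strategies yield b'jos%C3%A9.net',
b'jos\xc3\xa9.net', and b'jos\xe9.net' respectively, so no two of them
would look up the same DNS name.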

 > I am tired of the arguments used to justify the DNS hostname syntax.
 > DNS itself is not restricted to that syntax,

DNS is willing and able to store arbitrary names but says that names
also need to follow the rules of whatever they're naming; DNS does not
loosen the rules of particular kinds of names just because they happen
to be stored in the DNS.  Host names stored in DNS are still subject to
the rules of host names (RFC-1123).

The new URI spec could take a similar position and say that reg_names
can be pretty much anything, but when schemes use reg_names to hold
other kinds of names, the rules for those kinds of names still apply.
In particular, schemes that specify that the reg_name field is a host
name inherit the host name rules from RFC-1123, and also inherit the
no-percent-escapes rule from RFC-2396 (for backward compatibility with
implementations that have no codepath for percent-decoding host names).
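A minimal sketch of that rule, assuming a hypothetical table of schemes
whose specs say the reg_name holds a host name.  The label check follows
RFC-1123 (letters, digits, and hyphens, with a leading digit allowed and
no leading or trailing hyphen) plus the 63-character DNS label limit; it
does not check total name length, and the scheme list and the function
name authority_ok() are made up for this example.

```python
import re

# Hypothetical set of schemes whose specs say the reg_name holds a host name.
HOST_NAME_SCHEMES = {"http", "https", "ftp"}

# One RFC-1123 label: letters, digits, hyphens; may start with a digit;
# must not start or end with a hyphen; at most 63 characters (DNS limit).
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def authority_ok(scheme: str, reg_name: str) -> bool:
    """Apply the proposed rule: reg_names are generic unless the scheme says
    they hold host names, in which case host name rules apply."""
    if scheme.lower() not in HOST_NAME_SCHEMES:
        # Generic reg_name: scheme-specific rules, if any, are out of scope here.
        return True
    if "%" in reg_name:
        # Inherited from RFC-2396: no percent-escapes in host names.
        return False
    name = reg_name[:-1] if reg_name.endswith(".") else reg_name
    return bool(name) and all(LABEL.fullmatch(label) for label in name.split("."))


print(authority_ok("http", "3com.com"))            # True
print(authority_ok("http", "jos%C3%A9.net"))       # False
print(authority_ok("x-example", "jos%C3%A9.net"))  # True
```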

AMC
http://www.nicemice.net/amc/

Received on Sunday, 15 February 2004 22:48:57 UTC