Re: uri handling of hosts is too restrictive from Roy T. Fielding on 2004-02-16 (uri@w3.org from February 2004)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Sun, 15 Feb 2004 20:56:33 -0800
To: uri@w3.org
Message-Id: <7E22A94D-603C-11D8-8468-000393753936@gbiv.com>
> > the syntax standard should reflect how it is actually implemented
> > interoperably in practice: namely, it delegates the issue of hostname
> > syntax conformance to the operating system, and that operating system
> > decides what it will allow for the purposes of host resolution.
>
> That might work for questions about which ASCII characters are allowed
> in host names.  If the new URI syntax wants to allow underscores and
> plus-signs and such in host names, it may indeed be true that every
> implementation out there will accept them and just pass them through to
> the operating system's resolver, which is okay.
>
> But for the question of whether percent-escapes are allowed in host
> names, I don't think that argument holds.  Implementations are not
> interoperable for that case.  Faced with http://jos%C3%A9.net/, some
> applications percent-decode the name before calling the resolver, and
> some pass jos%C3%A9.net literally to the resolver.  The latter group of
> applications don't include a codepath for performing percent-decoding on
> host names because RFC-2616 and RFC-2396 promise that it's not necessary
> (RFC-2616 promises that the name is a host, not a reg_name, and RFC-2396
> promises no percent-escapes in host names).

They don't need such a codepath.  They will fail as "not found", which
is all they need to do to retain interoperability during name resolution.
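
A minimal sketch of that behaviour, assuming a hypothetical in-memory table (KNOWN_HOSTS) standing in for the operating system's resolver. The function name and the table are illustrative only. The host is handed over exactly as it was parsed, with no percent-decoding, and an unknown name comes back as "not found":

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the operating system's resolver (assumption).
KNOWN_HOSTS = {"example.net": "192.0.2.1"}


def resolve_literal(uri):
    """Look up the host exactly as parsed, with no percent-decoding step."""
    host = urlsplit(uri).hostname  # percent-escapes are left in place
    # None plays the role of the resolver's "not found" answer.
    return KNOWN_HOSTS.get(host)
```

Under these assumptions, resolve_literal("http://jos%C3%A9.net/") returns None rather than raising, which is the "not found" outcome described above.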

> (I have observed that Firefox and w3m on Linux pass the percent-signs
> straight through to the resolver, which passes them into the wire
> in DNS queries.  I presume that some applications will perform
> percent-decoding, but I haven't observed it myself.)
>
> For applications that perform percent-decoding before calling the
> resolver, will they then perform UTF-8-to-local-charset transcoding
> before calling the resolver?  Maybe some will and some won't.  I would
> be skeptical of any claims that all will or all won't.

All of the ones that I am aware of will do so because they do the
percent decoding to octets, not ASCII.
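
A rough sketch of that sequence, assuming the decoded octets are UTF-8 and that the local charset is Latin-1. Both are assumptions chosen for illustration, and decode_host is a hypothetical helper, not taken from any particular application:

```python
from urllib.parse import unquote_to_bytes


def decode_host(raw_host, local_charset="latin-1"):
    """Percent-decode to octets, then transcode UTF-8 to the local charset."""
    octets = unquote_to_bytes(raw_host)  # raw octets, not ASCII text
    text = octets.decode("utf-8")        # assumption: the octets are UTF-8
    return text.encode(local_charset)    # bytes handed to the local resolver
```

With these assumptions, decode_host("jos%C3%A9.net") yields the Latin-1 bytes b"jos\xe9.net". A name with no mapping in the local charset would raise UnicodeEncodeError at the last step.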

> > I am tired of the arguments used to justify the DNS hostname syntax.
> > DNS itself is not restricted to that syntax,
>
> DNS is willing and able to store arbitrary names but says that names
> also need to follow the rules of whatever they're naming; DNS does not
> loosen the rules of particular kinds of names just because they happen
> to be stored in the DNS.  Host names stored in DNS are still subject to
> the rules of host names (RFC-1123).
>
> The new URI spec could take a similar position and say that reg_names
> can be pretty much anything, but when schemes use reg_names to hold
> other kinds of names, the rules for those kinds of names still apply.
> In particular, schemes that specify that the reg_name field is a host
> name inherit the host name rules from RFC-1123, and also inherit the
> no-percent-escapes rule from RFC-2396 (for backward compatibility with
> implementations that have no codepath for percent-decoding host names).

Applications that use DNS for the sake of host name resolution
must obey those restrictions -- I have tried to clarify that in the
specification.
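
As an illustration of the restrictions in question, here is a hedged sketch of a host name check along the lines of RFC-1123: letters, digits and hyphens only, no label starting or ending with a hyphen. The 63-octet label limit and the 255-character overall limit are assumptions of this sketch, and is_rfc1123_host is a hypothetical name:

```python
import re

# One label: alphanumeric at both ends, hyphens allowed inside, 1 to 63 chars.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1123_host(name):
    """Return True if name looks like a host name under the rules above."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 255:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

Under this check, "example.net" and "3com.com" pass, while "my_host.example" and "jos%C3%A9.net" are rejected because of the underscore and the percent sign.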

....Roy
Received on Sunday, 15 February 2004 23:57:26 UTC