- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Sun, 15 Feb 2004 22:48:36 -0500
- To: uri@w3.org
Roy T. Fielding <fielding@gbiv.com> wrote:

> I no longer believe the claim that the authority portion of a URI is
> restricted to the DNS hostname syntax.

I retract that claim. I missed the reg_name token in RFC-2396. I see
now that RFC-2396 allows the authority to contain either a server or a
reg_name. A server holds a host name or an address literal (in which
percent-escapes are neither allowed nor needed), whereas a reg_name is
much more permissive.

> the syntax standard should reflect how it is actually implemented
> interoperably in practice: namely, it delegates the issue of hostname
> syntax conformance to the operating system, and that operating system
> decides what it will allow for the purposes of host resolution.

That might work for questions about which ASCII characters are allowed
in host names. If the new URI syntax wants to allow underscores and
plus-signs and such in host names, it may indeed be true that every
implementation out there will accept them and just pass them through to
the operating system's resolver, which is okay.

But for the question of whether percent-escapes are allowed in host
names, I don't think that argument holds. Implementations are not
interoperable for that case. Faced with http://jos%C3%A9.net/, some
applications percent-decode the name before calling the resolver, and
some pass jos%C3%A9.net literally to the resolver. The latter group of
applications don't include a codepath for performing percent-decoding
on host names because RFC-2616 and RFC-2396 promise that it's not
necessary (RFC-2616 promises that the name is a host, not a reg_name,
and RFC-2396 promises no percent-escapes in host names).

(I have observed that Firefox and w3m on Linux pass the percent-signs
straight through to the resolver, which puts them onto the wire in DNS
queries. I presume that some applications will perform
percent-decoding, but I haven't observed it myself.)

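To make the two behaviours concrete, here is a minimal Python sketch.
It is not taken from any browser's code; the function names are
hypothetical, and it only computes the string each kind of application
would hand to the resolver, without performing a lookup. It assumes the
escapes encode UTF-8.

```python
from urllib.parse import urlsplit, unquote


def host_field(uri: str) -> str:
    """Return the raw host field of the authority (no userinfo, no port).

    Simplified for illustration: bracketed IPv6 literals are not handled.
    """
    netloc = urlsplit(uri).netloc
    hostport = netloc.rpartition("@")[2]  # drop any userinfo
    return hostport.partition(":")[0]     # drop any port


def name_passed_literally(uri: str) -> str:
    """Behaviour A: hand the host field to the resolver untouched."""
    return host_field(uri)


def name_percent_decoded(uri: str) -> str:
    """Behaviour B: percent-decode (as UTF-8) before resolving."""
    return unquote(host_field(uri))


uri = "http://jos%C3%A9.net/"
print(name_passed_literally(uri))  # jos%C3%A9.net
print(name_percent_decoded(uri))   # josé.net
```

The same URI yields two different names, so the two kinds of
application would issue different queries.
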
For applications that perform percent-decoding before calling the
resolver, will they then perform UTF-8-to-local-charset transcoding
before calling the resolver? Maybe some will and some won't. I would
be skeptical of any claims that all will or all won't.

> I am tired of the arguments used to justify the DNS hostname syntax.
> DNS itself is not restricted to that syntax,

DNS is willing and able to store arbitrary names but says that names
also need to follow the rules of whatever they're naming; DNS does not
loosen the rules of particular kinds of names just because they happen
to be stored in the DNS. Host names stored in DNS are still subject to
the rules of host names (RFC-1123).

The new URI spec could take a similar position and say that reg_names
can be pretty much anything, but when schemes use reg_names to hold
other kinds of names, the rules for those kinds of names still apply.
In particular, schemes that specify that the reg_name field is a host
name inherit the host name rules from RFC-1123, and also inherit the
no-percent-escapes rule from RFC-2396 (for backward compatibility with
implementations that have no codepath for percent-decoding host names).

AMC
http://www.nicemice.net/amc/
Received on Sunday, 15 February 2004 22:48:57 UTC