- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Sun, 15 Feb 2004 20:56:33 -0800
- To: uri@w3.org
> > the syntax standard should reflect how it is actually implemented
> > interoperably in practice: namely, it delegates the issue of hostname
> > syntax conformance to the operating system, and that operating system
> > decides what it will allow for the purposes of host resolution.
>
> That might work for questions about which ASCII characters are allowed
> in host names. If the new URI syntax wants to allow underscores and
> plus-signs and such in host names, it may indeed be true that every
> implementation out there will accept them and just pass them through to
> the operating system's resolver, which is okay.
>
> But for the question of whether percent-escapes are allowed in host
> names, I don't think that argument holds. Implementations are not
> interoperable for that case. Faced with http://jos%C3%A9.net/, some
> applications percent-decode the name before calling the resolver, and
> some pass jos%C3%A9.net literally to the resolver. The latter group of
> applications don't include a codepath for performing percent-decoding
> on host names because RFC-2616 and RFC-2396 promise that it's not
> necessary (RFC-2616 promises that the name is a host, not a reg_name,
> and RFC-2396 promises no percent-escapes in host names).

They don't need such a codepath. They will fail as "not found", which
is all they need to do to retain interoperability during name resolution.

> (I have observed that Firefox and w3m on Linux pass the percent-signs
> straight through to the resolver, which passes them into the wire
> in DNS queries. I presume that some applications will perform
> percent-decoding, but I haven't observed it myself.)
>
> For applications that perform percent-decoding before calling the
> resolver, will they then perform UTF-8-to-local-charset transcoding
> before calling the resolver? Maybe some will and some won't. I would
> be skeptical of any claims that all will or all won't.
All of the ones that I am aware of will do so because they do the
percent decoding to octets, not ASCII.

> > I am tired of the arguments used to justify the DNS hostname syntax.
> > DNS itself is not restricted to that syntax,
>
> DNS is willing and able to store arbitrary names but says that names
> also need to follow the rules of whatever they're naming; DNS does not
> loosen the rules of particular kinds of names just because they happen
> to be stored in the DNS. Host names stored in DNS are still subject to
> the rules of host names (RFC-1123).
>
> The new URI spec could take a similar position and say that reg_names
> can be pretty much anything, but when schemes use reg_names to hold
> other kinds of names, the rules for those kinds of names still apply.
> In particular, schemes that specify that the reg_name field is a host
> name inherit the host name rules from RFC-1123, and also inherit the
> no-percent-escapes rule from RFC-2396 (for backward compatibility with
> implementations that have no codepath for percent-decoding host names).

Applications that use DNS for the sake of host name resolution must
obey those restrictions -- I have tried to clarify that in the
specification.

....Roy
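The two points in this exchange, that percent-decoding yields octets rather than ASCII and that a name used for DNS host resolution must still satisfy the RFC 1123 host name rules, can be illustrated with a minimal Python sketch. It relies on the standard library's `urllib.parse.unquote_to_bytes`; the helper names `decode_reg_name` and `is_rfc1123_host` are hypothetical, and the host name check is a simplified letter-digit-hyphen test (labels of 1 to 63 characters, no leading or trailing hyphen, at most 255 characters overall) that ignores trailing dots and internationalized names. It is an illustration under those assumptions, not how any particular browser or resolver behaves.

```python
import re
from urllib.parse import unquote_to_bytes

# Simplified RFC 1123 label (an assumption for this sketch): letters, digits
# and hyphens, 1 to 63 characters, not starting or ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def decode_reg_name(reg_name: str) -> bytes:
    """Hypothetical helper: percent-decode a reg_name to raw octets."""
    # %C3%A9 becomes the two octets 0xC3 0xA9, not an ASCII string.
    return unquote_to_bytes(reg_name)


def is_rfc1123_host(octets: bytes) -> bool:
    """Hypothetical helper: simplified letter-digit-hyphen host name test."""
    try:
        name = octets.decode("ascii")
    except UnicodeDecodeError:
        # Octets outside ASCII can never form an RFC 1123 host name.
        return False
    if not name or len(name) > 255:
        return False
    # An empty label (e.g. "a..b") fails because the pattern needs 1+ chars.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


octets = decode_reg_name("jos%C3%A9.net")
print(octets)                   # the percent-decoded octets
print(octets.decode("utf-8"))   # what a UTF-8 interpretation would yield
print(is_rfc1123_host(octets))  # the decoded name is not a valid host name
```

Decoding to `bytes` leaves the choice of character encoding (UTF-8, or transcoding to a local charset) as a separate, explicit step, which is the distinction drawn above; the host name check then shows why such a name would simply fail resolution as "not found".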
Received on Sunday, 15 February 2004 23:57:26 UTC