- From: Adam M. Costello BOGUS address, see signature <BOGUS@BOGUS.nicemice.net>
- Date: Sun, 28 Mar 2004 22:42:30 +0000
- To: uri@w3.org
"Roy T. Fielding" <fielding@gbiv.com> wrote: > When 2396 is updated, all protocols that depend on 2396 (including > HTTP) are automatically revised as a result -- that is the nature of a > normative reference Is that principle documented somewhere? > As such, the http URI will be defined in terms of the new URI RFC as > soon as that RFC is published, unless (or until) a revised 2616 is > published that says differently. Even if we accept that, and if the current draft of rfc2396bis is published as an RFC, it still looks like percent-encoded non-ASCII hostnames are not allowed in http: URIs. The only thing rfc2396bis says about IDNs is this: When a non-ASCII host name represents an internationalized domain name intended for resolution via DNS, the name must be transformed to the IDNA encoding [RFC3490] prior to name lookup. It merely restates a fact that is already stated in IDNA, but does not tell us when a reg-name represents an IDN (as opposed to a non-domain-name); therefore it does not explicitly designate a protocol element for carrying an IDN, which is a prerequisite for using a non-ASCII domain name in a protocol element (according to IDNA). Presumably an individual scheme spec could say that non-ASCII reg-names in its host component do in fact represent IDNs, but of course the HTTP spec does not say this for the http: scheme (because it predates IDNA). Perhaps it was your intention for rfc2396bis to make a stronger statement, something like: For any scheme that uses the reg-name component to hold domain names, percent-encoded non-ASCII names represent internationalized domain names, and therefore they must be transformed to ASCII prior to lookup in DNS, as specified in IDNA [RFC3490]. That would suffice if we knew that names in the host component of http: URIs were domain names, but after the publication of rfc2396bis we won't know that anymore. 

Under RFC-2396, we knew that the foo in http://foo/ was a host name
(which is a kind of domain name) because hostname was the only kind of
name in the grammar for the host component.  But if the citations in the
HTTP spec to RFC-2396 are implicitly redirected to RFC-2396bis, then foo
is now a reg-name, which is not necessarily a domain name, and therefore
the stronger statement above doesn't apply.  The HTTP spec never
bothered to say that its names were domain names, because the citation
to RFC-2396 implied it.

In order to get non-ASCII domain names into http: URIs without reissuing
the HTTP spec, I think rfc2396bis would not only have to use the
stronger statement above, but also distinguish between a host and a
reg-name, for example:

    authority = [ userinfo "@" ] coordinator [ ":" port ]

    A scheme can use either of two kinds of coordinators.  For schemes
    that use hosts identified by standard internet identifiers (IP
    addresses and domain names),

        coordinator = host
        host        = IP-literal / IPv4address / hostname

    For schemes that use hosts identified by other means, or non-hosts
    (like abstract namespace registries),

        coordinator = reg-name

    Generic URI parsers that don't know which kind of scheme they're
    dealing with can use

        coordinator = *( unreserved / pct-encoded / sub-delims )
                    / "[" *( unreserved / sub-delims / ":" ) "]"

    In any URI for which either of the more specific coordinator rules
    matches, the less specific rule will also match the same substring.

This way, the HTTP spec's reference to the "host" token of RFC-2396,
which gets redirected to RFC-2396bis, would still imply that names are
domain names, and therefore the stronger statement about IDNs, if it
were included in RFC-2396bis, would apply to http: URIs, and non-ASCII
IDNs would be allowed (percent-encoded) in http: URIs.

Allowing non-ASCII host names in http: URIs would invite
interoperability problems with legacy browsers, but if you want it
anyway, here's a way to get it.
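
The claim that the generic coordinator rule matches whatever the two
specific rules match can be spot-checked with a rough Python sketch.
Several things here are assumptions, not part of the proposal: the
unreserved and sub-delims character sets are taken from RFC 3986 as
eventually published (the draft may differ), and the hostname,
IPv4address, and IP-literal patterns are deliberately simplified
stand-ins for the real productions.  The function name classify is
hypothetical.

```python
import re

# Character sets as in RFC 3986 (assumed; the rfc2396bis draft may differ).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"

# Simplified stand-ins for the real host productions (assumption).
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = rf"(?:{LABEL}\.)*{LABEL}\.?"
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
IP_LITERAL = r"\[[0-9A-Fa-f:.]+\]"

# coordinator = host
HOST = re.compile(rf"{IP_LITERAL}|{IPV4}|{HOSTNAME}")
# coordinator = reg-name
REG_NAME = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT})*")
# coordinator as seen by a generic parser
GENERIC = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT})*"
    rf"|\[[{UNRESERVED}{SUB_DELIMS}:]*\]"
)


def classify(coordinator: str) -> dict:
    """Report which of the three coordinator rules match the whole string."""
    return {
        "host": HOST.fullmatch(coordinator) is not None,
        "reg_name": REG_NAME.fullmatch(coordinator) is not None,
        "generic": GENERIC.fullmatch(coordinator) is not None,
    }


if __name__ == "__main__":
    samples = ["www.example.com", "192.0.2.1", "[2001:db8::1]",
               "b%C3%BCcher.example", "a/b"]
    for s in samples:
        result = classify(s)
        # Whenever a specific rule matches, the generic rule must match too.
        assert result["generic"] or not (result["host"] or result["reg_name"])
        print(s, result)
```

Note that the percent-encoded name matches reg-name but not the
simplified host rule, which is exactly the gap that the stronger IDN
statement would have to close for schemes that use host.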

An alternative approach is to not try to get non-ASCII host names into
existing schemes.  New schemes could use non-ASCII host names (if their
specs say so), but existing schemes could not use them until their
individual scheme specs are revised, and each scheme could decide
whether it wanted to do that and incur the interoperability penalty.  In
the meantime, the IRI spec would have to face the issue of URI schemes
in which non-ASCII host names are permitted by the generic URI spec but
not by the scheme spec.

> If all of the HTTP implementations send the host subcomponent verbatim
> within the Host header field, then that is how the revision to 2616
> will be defined as well.

And until RFC-2616 is revised, if RFC-2396bis automatically updates the
http: URI syntax to allow percent-encoded non-ASCII host names, then it
also automatically updates the Host: field the same way, because
RFC-2616 uses the same token (host) in both places; therefore sending
the host subcomponent verbatim would be correct behavior.

AMC
Received on Sunday, 28 March 2004 17:42:35 UTC