- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 15 Feb 2004 11:18:56 -0500
- To: "Adam M. Costello BOGUS address, see signature" <BOGUS@BOGUS.nicemice.net>(by way of Martin Duerst <duerst@w3.org>), uri@w3.org
- Cc: public-iri@w3.org
Hello Adam,

Many thanks for your comments.

At 09:51 04/02/15 -0500, Adam M. Costello BOGUS address, see signature wrote:
>"Roy T. Fielding" <fielding@gbiv.com> wrote:
>
> > This was implemented as part of removing hostname productions in favor
> > of general registered names.
>
>Martin Duerst <duerst@w3.org> replied:
>
> > The restriction of hostnames to DNS was discussed and agreed on at the
> > San Francisco IETF based on interactions with IRIs.
> >
> > The argument was that conversion from IRIs to URIs (defined in the
> > IRI spec) should take care of conversion from non-ASCII characters to
> > punycode in the DNS part.
>
>I was very happy to see the IRI draft take that approach. The issue is
>explained very well in the issues list (040-reg-name):
>
>    report: Martin Duerst, 20 Mar 2003, URI BOF:
>
>    In order for internationalized characters in the authority
>    component to be handled directly by an IRI processor, it must
>    either
>
>    a) be able to encode the authority characters as %hh and rely on
>       gethostbyname to do the conversion, or
>
>    b) know that the scheme uses hostport and not registry-based names
>       and thus be able to convert the hostname to IDNA form.
>
>    action: Roy T. Fielding, 20 Mar 2003, URI BOF:
>
>    Note that IDNA was created specifically to avoid (a), so that
>    doesn't seem to be a viable alternative for the IETF.
>
>Exactly. Why go to the trouble of defining a backward-compatible
>encoding (ACE) and then make it impossible to use?

I don't think the current RFC2396bis draft says that you can't
use ACE. If you use ACE, it will just work.

And I think a) is a bit too short, it should read

    a) be able to encode the authority characters as %hh and rely on
       gethostbyname or a layer (just) above it to do the conversion, or

>What's the point of
>downgrading an IRI to a URI if the URI still fails on legacy software?

In practice, things are a little bit more complicated, but that
actually makes this choice a little bit easier.
When implementing IRIs on something like a browser, what I have seen
(or done myself) so far is that it is much easier to implement the
UTF-8 and %-escape steps in one place, and the IDN -> punycode step
much lower in the stack.

The IRI draft (if and when I get around to doing the edits this
afternoon) will change to convert everything to %-escapes, but it will
contain a note that points out that for backwards compatibility, in
particular for proxy and similar scenarios where IRI -> URI mapping and
DNS resolution are strictly separated (and under the condition that
the scheme is known to be DNS-based), implementations MAY convert
directly to punycode.

So in theory, this is a black-and-white distinction, but in practice,
it's not.

>RFC-2396 defined the host field as a host name or IPv4 address; there
>was no mention of registered names.

Sorry, wrong. From http://www.ietf.org/rfc/rfc2396.txt:

>>>>
3.2. Authority Component

   Many URI schemes include a top hierarchical element for a naming
   authority, such that the namespace defined by the remainder of the
   URI is governed by that authority. This authority component is
   typically defined by an Internet-based server or a scheme-specific
   registry of naming authorities.

      authority     = server | reg_name
>>>>

And while in San Francisco the general understanding was that
registry-based naming authorities that use DNS hostnames were the
only such URIs in deployment, this understanding has crumbled in the
meantime. In addition, it was considered highly inadvisable to bet
the future of URIs and IRIs on the DNS.

>Currently, a URI like http://www.w%33.org/ will fail on many browsers,
>which is no problem because the URI is invalid according to RFC-2396.

It works on IE, Opera, and Amaya. And it's not really an issue,
because nobody would actually use that except for testing.
For %-escapes derived from IDNs, it's very easy to make IRIs,
IDNs, and this %-escaping all work without problems.
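To make the two conversion paths discussed above concrete, here is a minimal sketch in Python. It is only an illustration, not what any draft or browser prescribes: the function names and the sample host `bücher.example` are made up for this example, and Python's built-in `idna` codec (which follows the original IDNA specification) stands in for whatever IDN library a real stack would use.

```python
from urllib.parse import quote, unquote


def host_to_percent_escapes(host: str) -> str:
    """Upper layer: encode the host as UTF-8 and %-escape the octets."""
    # quote() leaves ASCII letters, digits, '.', '-', '_' and '~' untouched.
    return quote(host, safe="")


def host_to_ace(host: str) -> str:
    """Direct path: convert the host straight to its ACE (punycode) form."""
    # Assumption: the scheme is known to be DNS-based, so IDNA applies.
    return host.encode("idna").decode("ascii")


def resolve_layer_to_ace(escaped_host: str) -> str:
    """Lower layer (just above the resolver): undo %-escapes, then apply IDNA."""
    return host_to_ace(unquote(escaped_host))


if __name__ == "__main__":
    host = "b\u00fccher.example"  # "bücher.example", a hypothetical IDN
    escaped = host_to_percent_escapes(host)
    print(escaped)                                # b%C3%BCcher.example
    print(host_to_ace(host))                      # xn--bcher-kva.example
    print(resolve_layer_to_ace(escaped))          # xn--bcher-kva.example
    print(resolve_layer_to_ace("www.w%33.org"))   # www.w3.org
```

Both paths end at the same ACE label; the difference is only in which layer needs to know that the authority is a DNS name. The last call shows that the same lower layer also handles a plain ASCII escape such as `www.w%33.org`.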
Please remember: a browser that doesn't support IDNs just doesn't.

>By the way, the draft contains a factual error:
>
> > The reg-name syntax allows for percent-encoded octets, which is
> > necessary to enable internationalized domain names to be provided in
> > URIs;
>
>Every IDN has an ACE form; therefore percent-escapes are not necessary
>for using IDNs in URIs. Percent-escapes would be necessary for
>using internationalized reg-names (because reg-names are not domain
>names and IDNA does not apply to them), but not necessary for using
>internationalized domain names.

I suggest changing this to:

    The reg-name syntax allows for percent-encoded octets, in order to
    enable internationalized domain names to be provided in URIs in a
    uniform way;

>Stephen Pollei <stephen_pollei@comcast.net> wrote:
>
> > So it's my understanding that lots of names are legal, just not
> > recommended.

>RFC-952 gave the syntax:

>So there is no doubt that host names can contain only ASCII letters,
>digits, hyphens, and dots. It's an open-and-shut case.

So Stephen's host, with an underscore, just doesn't exist, or what?
Even if every browser actually gets there? Is the tail wagging the
dog here, or what do you think is going on?

Regards,    Martin.
Received on Sunday, 15 February 2004 11:23:30 UTC