- From: Larry Masinter <masinter@adobe.com>
- Date: Wed, 2 Sep 2009 16:28:15 -0700
- To: "Roy T. Fielding" <fielding@gbiv.com>
- CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
>> I think we should specify that pct-encoding is always decoded before
>> use of a component in resolution,
>
> Well, the concern was that if you mapped IRI -> URI by pct-encoding
> the entire URI, you would then wind up sending around URIs with
> pct-encoded domain names, into previously compliant URI processors
> that would send the pct-encoded domain name to DNS.

> Why do we care? Yes, it is possible that such a thing would happen,
> but the result is "not found" (a safe answer).

Some processors get "not found" and others get a correct result.
This isn't "uniform" behavior. If http://<nonascii>/path MAY be
translated into pct-encoded form and MAY also be translated into
punycode form, then the end-processors will work differently.
It's a lack of interoperability.

> The same processors
> will need to be updated anyway to check for pct-encoded domains that
> were entered by hand or by reference, or generated by processors
> that do not know about IDNA but do pct-encode anything that is not
> a valid URI character.

Processors that are doing IRI -> URI mapping SHOULD *also* undo
pct-encoded domains; that's fine, we're asking them to change anyway.

> In other words, the situation exists regardless of how complex we
> make IRI parsing,

Hmmm, I'm trying to simplify IRI parsing by offering one algorithm,
not two.

> so the best solution is to fix the processor to
> handle both Unicode and pct-encoded octets gracefully rather than
> make IRI syntax scheme-dependent.

I think there are different things going around as "processor" in
this discussion, so I'm not sure which one you mean.

(a) IRI consumers should handle Unicode and pct-encoded octets, (check)
(b) these are handled gracefully (well, I dunno, it's all pretty
    clunky, seems like it's less clunky than before)
(c) IRI syntax scheme-dependent (NO! The IRI *syntax* is uniform.
    The IRI processing rules are about the same. The only thing
    more complicated is IRI -> URI translation, which right now
    has two options and I think there should be one.)

> This is no different than the
> introduction of Host in HTTP causing all preexisting clients to
> become gradually obsolete because they could not access the
> increasing number of name-based virtual hosts.

Not sure I understand the analogy.

> Don't you think we can update the IRI document (Proposed Standard) to
> not allow (MUST NOT) or at least not encourage (SHOULD NOT) any
> conversion of IRI -> URI that results in pct-encoded domain names,
> at least more readily than we can update the URI spec and also expect
> updates to http:, ftp:, telnet:, etc. etc. URI scheme implementations
> to mandate pct-decode+punycode-encode transformations
> before DNS resolution?

> No. I consider that to be an impossible requirement without
> hardcoding the syntax of every scheme into the processor, which
> would be far worse than the disease you are trying to cure.

I'm not sure "disease" and "cure" are the right analogy. I think
there's one bit really: is authority a domain name or not?
Otherwise, there's no 'hard coding' really, just an option.

I suppose this affects generic IRI -> URI translators, but there
aren't that many of them, and as systems get upgraded to handle
IRIs directly, there will be fewer, not more. So it seems like
a win to me.

Larry
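
For illustration only, here is a rough Python sketch of the two host translations discussed above, plus the "undo pct-encoding, then punycode" step. The function names are hypothetical. It assumes the host is already split out of the IRI, and it relies on Python's built-in "idna" codec (IDNA 2003), which is an assumption about the IDNA flavor, not something the thread specifies.

```python
from urllib.parse import quote, unquote

def host_to_pct(host: str) -> str:
    """Option 1 (hypothetical name): pct-encode the UTF-8 octets of the host."""
    return quote(host, safe=".")

def host_to_punycode(host: str) -> str:
    """Option 2 (hypothetical name): ToASCII via Python's IDNA 2003 codec."""
    return host.encode("idna").decode("ascii")

def normalize_host(host: str) -> str:
    """Undo any pct-encoding first, then apply the punycode form."""
    return host_to_punycode(unquote(host))

host = "b\u00fccher.example"  # "bücher.example", an example non-ASCII host

print(host_to_pct(host))                  # b%C3%BCcher.example
print(host_to_punycode(host))             # xn--bcher-kva.example
print(normalize_host(host_to_pct(host)))  # xn--bcher-kva.example
```

A processor that hands the first form straight to DNS gets "not found", while one that uses the second form (or normalizes first) resolves the name, which is the non-uniform behavior at issue.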
Received on Wednesday, 2 September 2009 23:29:01 UTC