Re: query on iregname conversion

On Sep 2, 2009, at 3:20 PM, Larry Masinter wrote:

>> I think we should specify that pct-encoding is always decoded before
>> use of a component in resolution,
>
> Well, the concern was that if you mapped IRI -> URI by pct-encoding
> the entire URI, you would then wind up sending around URIs with
> pct-encoded domain names, into previously compliant URI processors
> that would send the pct-encoded domain name to DNS.

Why do we care?  Yes, it is possible that such a thing would happen,
but the result is "not found" (a safe answer).  The same processors
will need to be updated anyway to check for pct-encoded domains that
were entered by hand or by reference, or generated by processors
that do not know about IDNA but do pct-encode anything that is not
a valid URI character.
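
As an illustration of that last case, here is a minimal Python sketch of
a processor that knows nothing about IDNA and simply pct-encodes every
character that is not legal in a URI. The function name and the
bücher.example host are hypothetical, chosen only for the example.

```python
from urllib.parse import quote

# Characters left untouched besides letters, digits and "-._~":
# the URI reserved set, plus "%" so existing pct-encodings pass through.
URI_SAFE = ":/?#[]@!$&'()*+,;=%"

def naive_pct_encode(reference: str) -> str:
    """Pct-encode (as UTF-8) anything that is not a valid URI character.

    Hypothetical helper: it is deliberately unaware of IDNA and of
    which component of the reference is the host.
    """
    return quote(reference, safe=URI_SAFE)

print(naive_pct_encode("http://bücher.example/"))
```

The host in the output is pct-encoded, so such names reach other
processors whether or not the IRI spec forbids generating them.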

In other words, the situation exists regardless of how complex we
make IRI parsing, so the best solution is to fix the processor to
handle both Unicode and pct-encoded octets gracefully rather than
make IRI syntax scheme-dependent.  This is no different from the
introduction of the Host header field in HTTP, which caused all
preexisting clients to become gradually obsolete because they could
not access the increasing number of name-based virtual hosts.
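
A rough sketch of such a fix, in Python, might look like the following.
It assumes the pct-encoded octets are UTF-8 and relies on Python's
built-in "idna" codec (which implements IDNA 2003); the function name
host_for_dns and the example host are hypothetical.

```python
from urllib.parse import unquote

def host_for_dns(host: str) -> str:
    """Return the ASCII form of a host before handing it to DNS.

    Accepts the host as raw Unicode, as pct-encoded octets, or as
    plain ASCII, without needing to know the URI scheme.
    """
    # Undo any pct-encoding first; the octets are assumed to be UTF-8.
    decoded = unquote(host, encoding="utf-8", errors="strict")
    # Convert each label to its ASCII-compatible (punycode) form.
    return decoded.encode("idna").decode("ascii")

# All three spellings resolve to the same name.
for spelling in ("bücher.example", "b%C3%BCcher.example", "xn--bcher-kva.example"):
    print(host_for_dns(spelling))
```

The conversion happens at the point of resolution and is the same for
every scheme, so nothing about http:, ftp:, etc. is hardcoded.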

> Don't you think we can update the IRI document (Proposed Standard) to
> not allow (MUST NOT) or at least not encourage (SHOULD NOT) any
> conversion of IRI -> URI that results in pct-encoded domain names,
> at least more readily than we can update the URI spec and also expect
> updates to http:, ftp:, telnet:, etc. etc. URI scheme implementations
> to mandate pct-decode+punycode-encode transformations
> before DNS resolution?

No.  I consider that to be an impossible requirement without
hardcoding the syntax of every scheme into the processor, which
would be far worse than the disease you are trying to cure.

....Roy

Received on Wednesday, 2 September 2009 23:22:49 UTC