Re: query on iregname conversion from Roy T. Fielding on 2009-09-02 (public-iri@w3.org from September 2009)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 2 Sep 2009 14:51:30 -0700
To: Larry Masinter <masinter@adobe.com>
Cc: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-Id: <C7F02A2D-1D05-46D1-A6DE-C872C6639FC7@gbiv.com>

On Sep 2, 2009, at 10:11 AM, Larry Masinter wrote:

>    Systems accepting IRIs MAY convert the ireg-name component of an
>    IRI as follows (before step 2 above) for schemes known to use
>    domain names in ireg-name, if the scheme definition does not allow
>    percent-encoding for ireg-name

I don't think that is relevant now.  Schemes do not have the ability
to prevent a user from using pct-encoded triplets -- either they
don't occur in that part of the reference (and the requirement does
not apply) or they do occur in the reference and the application
has to find some reasonable thing to do in that situation.  Near as
I can tell, the only reasonable thing to do is to treat the triplet
as a pct-encoded octet even if the scheme does not allow it, since
almost all schemes were defined before IDNA existed.  Authors
started typing/pasting non-ASCII hostnames after that, regardless
of the scheme specs.
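The decoding step argued for above can be sketched in Python. This is an illustration, not text from any spec. It assumes the decoded octets are UTF-8, which is how RFC 3986 (section 3.2.2) says non-ASCII reg-name characters are to be percent-encoded. It also deliberately ignores whatever the scheme's own definition says about pct-encoding in the host. `decode_reg_name` is a hypothetical helper name.

```python
from urllib.parse import unquote


def decode_reg_name(reg_name: str) -> str:
    """Decode pct-encoded triplets in a reg-name as UTF-8 octets.

    Applied regardless of whether the scheme definition permits
    pct-encoding there. Octet sequences that are not valid UTF-8 raise
    UnicodeDecodeError instead of being silently replaced with U+FFFD.
    """
    return unquote(reg_name, encoding="utf-8", errors="strict")


print(decode_reg_name("b%C3%BCcher.example"))  # bücher.example
print(decode_reg_name("www.example.com"))      # www.example.com
```

Using `errors="strict"` is a design choice. The stdlib default (`"replace"`) would turn a malformed triplet into a replacement character and hand a bogus name to the resolver.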

I think we should specify that pct-encoding is always decoded before
use of a component in resolution, and further that registered names
might be Unicode and that the processor is responsible for conversion
to IDNA punycode, if necessary, for the first lookup, and then fall back
to sending the raw Unicode string (if its name resolver API supports
that) in a second lookup if the first one fails.  This allows
IDNA to have precedence (to avoid some localized masking of domains)
while still working for non-Internet hostname lookup.
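A minimal sketch of the proposed lookup order in Python, under stated assumptions. The resolver is passed in as a plain function, a hypothetical interface rather than any real resolver API, so the example runs without network access. A real program might wrap `socket.getaddrinfo`. The stdlib `"idna"` codec implements IDNA2003 (RFC 3490), not IDNA2008, so a production processor might use a different converter.

```python
from typing import Callable, Optional
from urllib.parse import unquote

# Hypothetical resolver interface: takes a host name and returns an
# address string, or None if the lookup failed.
Resolver = Callable[[str], Optional[str]]


def resolve_reg_name(reg_name: str, lookup: Resolver) -> Optional[str]:
    # 1. Always decode pct-encoded triplets (as UTF-8) before use.
    name = unquote(reg_name, encoding="utf-8", errors="strict")

    # 2. First lookup uses the IDNA (punycode) form, so IDNA has precedence.
    try:
        ascii_name: Optional[str] = name.encode("idna").decode("ascii")
    except UnicodeError:
        ascii_name = None  # not expressible in IDNA; go straight to step 3
    if ascii_name is not None:
        result = lookup(ascii_name)
        if result is not None:
            return result

    # 3. Fall back to the raw Unicode string for non-Internet name
    #    services. Skip the retry if it would just repeat step 2.
    if ascii_name != name:
        return lookup(name)
    return None


# Usage with a toy, dictionary-backed resolver (illustrative data only).
table = {"xn--bcher-kva.example": "192.0.2.1", "intranet-höst": "10.0.0.5"}
print(resolve_reg_name("b%C3%BCcher.example", table.get))  # 192.0.2.1
print(resolve_reg_name("intranet-h%C3%B6st", table.get))   # 10.0.0.5
```

The second call shows the fallback. The punycode form of `intranet-höst` is not in the toy table, so the raw Unicode name is tried next and succeeds.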

....Roy

Received on Wednesday, 2 September 2009 21:51:48 UTC