Re: IDN handling, please help

On 2009/08/31 1:58, Wil Tan wrote:
> On Sun, Aug 30, 2009 at 11:08 AM, Larry Masinter<masinter@adobe.com>  wrote:
>
>>   I’m reading this text over and over again, and I really don’t get it. Can
>> someone explain what the distinction is between “scheme definition does not
>> allow percent-encoding for ireg-name, and scheme definition DOES allow
>> percent-encoding for ireg-name”?  What schemes allow percent-encoding for
>> ireg-name, for example?
>>
>
> RFC3986 allows percent encodings in the "reg-name" subcomponent, but some
> URI schemes do not allow it. Given that most URI schemes are defined in
> terms of URI rather than IRI, the "ireg-name" probably should be "reg-name"
> here.

Good point.

>> Not sure what problem this is solving, or why the two algorithms are
>> different, or whether one is just a shortcut in a special case.
>>
>
> Which two algorithms? Do you mean "percent encoding" and IDNA ToASCII? I
> suppose the former is a generic way of encoding non-ASCII characters in the
> reg-name field of a URI, and the latter is used when one knows for sure that
> the reg-name uses DNS.

Yes indeed.
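
To make the difference concrete, here is a minimal Python sketch of the two conversions applied to the host from the example quoted below. The function names are made up for illustration. It also assumes that Python's built-in "idna" codec (which implements RFC 3490 ToASCII) is close enough for this purpose: the codec does not expose the UseSTD3ASCIIRules and AllowUnassigned flags, so it only approximates what the draft text prescribes.

```python
from urllib.parse import quote


def host_percent_encoded(host: str) -> str:
    """Generic conversion: UTF-8 encode, then percent-encode non-ASCII octets."""
    # Letters, digits and "_.-~" are never quoted, so the dots survive.
    return quote(host, safe="")


def host_to_ascii(host: str) -> str:
    """DNS-specific conversion: IDNA ToASCII applied to each dot-separated label."""
    # The "idna" codec splits on dots and converts label by label.
    # Assumption: it does not let us set UseSTD3ASCIIRules or AllowUnassigned.
    return host.encode("idna").decode("ascii")


host = "r\u00e9sum\u00e9.example.org"
print("http://" + host_percent_encoded(host))
print("http://" + host_to_ascii(host))
```

The first form works for any reg-name; the second only makes sense when the scheme is known to put a DNS name there.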

Regards,    Martin.


> =wil
>
>
>> =================================================
>>
>>
>>
>> Systems accepting IRIs MAY convert the ireg-name component of an IRI as
>> follows (before step 2 above) for schemes known to use domain names in
>> ireg-name, if the scheme definition does not allow percent-encoding for
>> ireg-name: Replace the ireg-name part of the IRI by the part converted using
>> the ToASCII operation specified in Section 4.1 of [RFC3490] (Faltstrom,
>> P., Hoffman, P., and A. Costello, “Internationalizing Domain Names in
>> Applications (IDNA),” March 2003) on each dot-separated label, and by
>> using U+002E (FULL STOP) as a label
>> separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the flag
>> AllowUnassigned set to FALSE for creating IRIs and set to TRUE otherwise.
>> The ToASCII operation may fail, but this would mean that the IRI cannot be
>> resolved. This conversion SHOULD be used when the goal is to maximize
>> interoperability with legacy URI resolvers. For example, the IRI
>> "http://r&#xE9;sum&#xE9;.example.org"
>> may be converted to
>> "http://xn--rsum-bpad.example.org"
>> instead of
>> "http://r%C3%A9sum%C3%A9.example.org".
>>
>> An IRI with a scheme that is known to use domain names in ireg-name, but
>> where the scheme definition does not allow percent-encoding for ireg-name,
>> meets scheme-specific restrictions if either the straightforward conversion
>> or the conversion using the ToASCII operation on ireg-name results in a URI
>> that meets the scheme-specific restrictions.
>>
>>
>>
>>
>>
>> --
>>
>> http://larry.masinter.net
>>
>>
>>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Monday, 31 August 2009 10:21:24 UTC