
Re: IDN handling, please help

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 31 Aug 2009 19:20:12 +0900
Message-ID: <4A9BA3DC.1040200@it.aoyama.ac.jp>
To: Wil Tan <wil@dready.org>
CC: Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>

On 2009/08/31 1:58, Wil Tan wrote:
> On Sun, Aug 30, 2009 at 11:08 AM, Larry Masinter <masinter@adobe.com> wrote:
>>   I’m reading this text over and over again, and I really don’t get it. Can
>> someone explain what the distinction is between “scheme definition does not
>> allow percent-encoding for ireg-name, and scheme definition DOES allow
>> percent-encoding for ireg-name”?  What schemes allow percent-encoding for
>> ireg-name, for example?
> RFC 3986 allows percent-encoding in the "reg-name" subcomponent, but some
> URI schemes do not allow it. Given that most URI schemes are defined in
> terms of URIs rather than IRIs, "ireg-name" should probably be "reg-name"
> here.

Good point.

>> Not sure what problem this is solving, or why the two algorithms are
>> different, or whether one is just a shortcut in a special case.
> Which two algorithms? Do you mean "percent-encoding" and IDNA ToASCII? I
> suppose the former is a generic way of encoding non-ASCII characters in the
> reg-name field of a URI, and the latter is used when one knows for sure that
> the reg-name uses DNS.

Yes indeed.
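For illustration, here is a minimal Python sketch of the two conversions, applied to the résumé host from the draft text quoted below. It assumes only the standard library: `urllib.parse.quote` for generic percent-encoding and `encodings.idna.ToASCII`, an RFC 3490 (IDNA2003) implementation, for the DNS-specific conversion. The helper names are hypothetical, and, as far as I know, the stdlib codec skips the UseSTD3ASCIIRules checks, so this is not a complete implementation of the draft text.

```python
from encodings import idna          # stdlib: RFC 3490 (IDNA2003) ToASCII
from urllib.parse import quote      # stdlib: RFC 3986-style percent-encoding

# reg-name = *( unreserved / pct-encoded / sub-delims ). quote() never escapes
# unreserved characters, so marking the sub-delims as "safe" leaves unescaped
# exactly the characters a reg-name may contain as-is.
SUB_DELIMS = "!$&'()*+,;="

def percent_encode_host(host: str) -> str:
    """Generic conversion: UTF-8 encode, then percent-encode the other octets."""
    return quote(host, safe=SUB_DELIMS, encoding="utf-8")

def idna_host(host: str) -> str:
    """DNS-specific conversion: ToASCII on each label, rejoined with U+002E."""
    return ".".join(idna.ToASCII(label).decode("ascii")
                    for label in host.split("."))

host = "r\u00E9sum\u00E9.example.org"   # résumé.example.org
print(percent_encode_host(host))        # r%C3%A9sum%C3%A9.example.org
print(idna_host(host))                  # xn--rsum-bpad.example.org
```

The percent-encoded form works for any reg-name, whether or not it is resolved through DNS. The ToASCII form is only meaningful when the host is known to be a DNS name, but legacy resolvers can pass it straight to DNS.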

Regards, Martin.

> =wil
>> =================================================
>> Systems accepting IRIs MAY convert the ireg-name component of an IRI as
>> follows (before step 2 above) for schemes known to use domain names in
>> ireg-name, if the scheme definition does not allow percent-encoding for
>> ireg-name: Replace the ireg-name part of the IRI by the part converted using
>> the ToASCII operation specified in Section 4.1 of [RFC3490] (Faltstrom,
>> P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in
>> Applications (IDNA)", March 2003) on each dot-separated label, and by
>> using U+002E (FULL STOP) as a label separator, with the flag
>> UseSTD3ASCIIRules set to TRUE, and with the flag AllowUnassigned set to
>> FALSE for creating IRIs and set to TRUE otherwise.
>> The ToASCII operation may fail, but this would mean that the IRI cannot be
>> resolved. This conversion SHOULD be used when the goal is to maximize
>> interoperability with legacy URI resolvers. For example, the IRI
>> "http://r&#xE9;sum&#xE9;.example.org"
>> may be converted to
>> "http://xn--rsum-bpad.example.org"
>> instead of
>> "http://r%C3%A9sum%C3%A9.example.org".
>> An IRI with a scheme that is known to use domain names in ireg-name, but
>> where the scheme definition does not allow percent-encoding for ireg-name,
>> meets scheme-specific restrictions if either the straightforward conversion
>> or the conversion using the ToASCII operation on ireg-name results in a URI
>> that meets the scheme-specific restrictions.
>> --
>> http://larry.masinter.net
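A rough Python sketch of the conversion described in the quoted draft text follows. It assumes the standard library's `encodings.idna` module (RFC 3490 nameprep and ToASCII). The UseSTD3ASCIIRules check (no non-LDH ASCII code points, no leading or trailing hyphen, tested after nameprep) is written by hand. `is_ldh_host` is a hypothetical stand-in for a scheme's host-name restrictions, and the AllowUnassigned flag is not modelled.

```python
import re
from encodings import idna          # stdlib: RFC 3490 (IDNA2003) nameprep/ToASCII
from urllib.parse import quote      # stdlib: percent-encoding

SUB_DELIMS = "!$&'()*+,;="
LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def std3_ok(label: str) -> bool:
    """UseSTD3ASCIIRules=TRUE check (RFC 3490 ToASCII step 3), after nameprep."""
    prepped = label if label.isascii() else idna.nameprep(label)
    if prepped.startswith("-") or prepped.endswith("-"):
        return False
    # Only non-LDH *ASCII* code points are forbidden; non-ASCII ones pass here.
    return all(not c.isascii() or c.isalnum() or c == "-" for c in prepped)

def to_ascii_host(host: str) -> str:
    """ToASCII on each dot-separated label, with U+002E as the separator.

    Raises UnicodeError if any label fails, i.e. the IRI cannot be resolved.
    """
    labels = []
    for label in host.split("."):
        if not std3_ok(label):
            raise UnicodeError(f"label {label!r} violates STD3 rules")
        labels.append(idna.ToASCII(label).decode("ascii"))
    return ".".join(labels)

def is_ldh_host(host: str) -> bool:
    """Hypothetical scheme-specific restriction: an LDH-only DNS host name."""
    return all(LDH.fullmatch(label) for label in host.split("."))

def meets_restrictions(host: str, restriction=is_ldh_host) -> bool:
    """True if either the straightforward or the ToASCII conversion passes."""
    if restriction(quote(host, safe=SUB_DELIMS)):
        return True
    try:
        return restriction(to_ascii_host(host))
    except UnicodeError:
        return False

print(to_ascii_host("r\u00E9sum\u00E9.example.org"))        # xn--rsum-bpad.example.org
print(meets_restrictions("r\u00E9sum\u00E9.example.org"))   # True
print(meets_restrictions("-r\u00E9sum\u00E9.example.org"))  # False
```

For résumé.example.org, the percent-encoded form fails the LDH check, but the ToASCII form passes. Under the last rule in the draft text, the IRI therefore meets the (hypothetical) restrictions. A label with a leading hyphen fails both conversions.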

#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Monday, 31 August 2009 10:21:24 UTC
