
Re: IDN handling, please help

From: Wil Tan <wil@dready.org>
Date: Mon, 31 Aug 2009 02:58:07 +1000
Message-ID: <b789c2f00908300958x646d64bs2be7d27d900d9a52@mail.gmail.com>
To: Larry Masinter <masinter@adobe.com>
On Sun, Aug 30, 2009 at 11:08 AM, Larry Masinter <masinter@adobe.com> wrote:

>  I’m reading this text over and over again, and I really don’t get it. Can
> someone explain what the distinction is between “scheme definition does not
> allow percent-encoding for ireg-name, and scheme definition DOES allow
> percent-encoding for ireg-name”?  What schemes allow percent-encoding for
> ireg-name, for example?

RFC 3986 allows percent-encoding in the "reg-name" subcomponent, but some
URI schemes do not allow it. Given that most URI schemes are defined in
terms of URI rather than IRI, the "ireg-name" probably should be "reg-name".
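
To make the first point concrete, here is a rough Python sketch of the generic
RFC 3986 rule, reg-name = *( unreserved / pct-encoded / sub-delims ). The names
REG_NAME and is_reg_name are made up for illustration, and the check covers only
the generic syntax, not any scheme-specific restriction.

```python
import re

# Generic RFC 3986 grammar: reg-name = *( unreserved / pct-encoded / sub-delims )
# REG_NAME and is_reg_name are hypothetical names used only in this sketch.
REG_NAME = re.compile(
    r"(?:[A-Za-z0-9\-._~]"      # unreserved
    r"|%[0-9A-Fa-f]{2}"         # pct-encoded
    r"|[!$&'()*+,;=])*"         # sub-delims
)

def is_reg_name(text: str) -> bool:
    """Return True if text matches the generic reg-name syntax."""
    return REG_NAME.fullmatch(text) is not None
```

Under this rule both "r%C3%A9sum%C3%A9.example.org" and
"xn--rsum-bpad.example.org" are syntactically valid; a scheme that disallows
percent-encoding would have to reject the first form on its own terms.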

> Not sure what problem this is solving, or why the two algorithms are
> different, or whether one is just a shortcut in a special case.

Which two algorithms? Do you mean percent-encoding and IDNA ToASCII? I
suppose the former is a generic way of encoding non-ASCII characters in the
reg-name field of a URI, and the latter is used when one knows for sure that
the reg-name uses DNS.
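
As a rough illustration of the difference, here is a Python sketch of both
conversions applied to a host name. The function names are made up for this
example. It relies on Python's built-in "idna" codec, which implements the
RFC 3490 ToASCII operation label by label but does not expose the
UseSTD3ASCIIRules or AllowUnassigned flags, so it only approximates the
procedure in the quoted draft text.

```python
from urllib.parse import quote

def percent_encode_host(host: str) -> str:
    """Generic conversion: UTF-8 encode, then percent-encode the bytes.

    sub-delims are left alone because RFC 3986 permits them in reg-name.
    """
    return quote(host, safe="!$&'()*+,;=", encoding="utf-8")

def to_ascii_host(host: str) -> str:
    """DNS-specific conversion: IDNA ToASCII on each dot-separated label.

    Raises UnicodeError if ToASCII fails for a label.
    """
    return host.encode("idna").decode("ascii")

host = "r\u00e9sum\u00e9.example.org"
print(percent_encode_host(host))  # r%C3%A9sum%C3%A9.example.org
print(to_ascii_host(host))        # xn--rsum-bpad.example.org
```

The two outputs correspond to the two URIs given as examples in the draft text
quoted below.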


> =================================================
> Systems accepting IRIs MAY convert the ireg-name component of an IRI as
> follows (before step 2 above) for schemes known to use domain names in
> ireg-name, if the scheme definition does not allow percent-encoding for
> ireg-name: Replace the ireg-name part of the IRI by the part converted using
> the ToASCII operation specified in Section 4.1 of [RFC3490] (Faltstrom,
> P., Hoffman, P., and A. Costello, “Internationalizing Domain Names in
> Applications (IDNA),” March 2003) on each dot-separated label, and by
> using U+002E (FULL STOP) as a label
> separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the flag
> AllowUnassigned set to FALSE for creating IRIs and set to TRUE otherwise.
> The ToASCII operation may fail, but this would mean that the IRI cannot be
> resolved. This conversion SHOULD be used when the goal is to maximize
> interoperability with legacy URI resolvers. For example, the IRI
> "http://r&#xE9;sum&#xE9;.example.org"
> may be converted to
> "http://xn--rsum-bpad.example.org"
> instead of
> "http://r%C3%A9sum%C3%A9.example.org".
> An IRI with a scheme that is known to use domain names in ireg-name, but
> where the scheme definition does not allow percent-encoding for ireg-name,
> meets scheme-specific restrictions if either the straightforward conversion
> or the conversion using the ToASCII operation on ireg-name results in a URI
> that meets the scheme-specific restrictions.
> --
> http://larry.masinter.net
Received on Sunday, 30 August 2009 16:58:43 UTC
