Re: §2.1.3 IRI/URI Canonicalization does not address IRIs with IDNs from Felix Sasaki on 2008-02-12 (public-i18n-core@w3.org from January to March 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 12 Feb 2008 09:47:18 +0900
To: Eric Prud'hommeaux <eric@w3.org>
CC: public-powderwg@w3.org, public-i18n-core@w3.org
Message-ID: <47B0EC96.40208@w3.org>

Hi Eric (putting i18n core into the loop),

Eric Prud'hommeaux wrote:
> http://www.w3.org/2007/powder/Group/powder-grouping/20080128.html#canon
> does not include IDN example or rules.
>   

There is no need for an IDN example or rule. IRI-to-URI conversion 
(percent-escaping) is a step that is independent of the preprocessing 
needed for domain name resolution. See also the processing described 
at

http://www.w3.org/International/articles/idn-and-iri/#idn
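
To make the separation concrete, here is a minimal sketch in Python, 
not a definitive implementation. The helper names percent_escape_path 
and host_to_ascii are invented for illustration, and the sketch 
assumes Python's built-in "idna" codec, which implements IDNA 2003. 
Percent-escaping works on the UTF-8 bytes of the IRI; the host 
conversion is a separate operation used for name resolution.

```python
from urllib.parse import quote


def percent_escape_path(path: str) -> str:
    """IRI > URI step: percent-escape the UTF-8 bytes of non-ASCII characters."""
    return quote(path, safe="/")


def host_to_ascii(host: str) -> str:
    """Separate step for name resolution: IDNA 2003 ToASCII on each label."""
    return host.encode("idna").decode("ascii")


# The two steps are independent of each other.
print(percent_escape_path("/bravå/"))
print(host_to_ascii("www.bravå.nu"))
```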

> An example (working) IDN IRI:
>   http://www.bravå.nu/
> The IDN is punycoded when the IRI is expressed as a URI:
>   http://www.xn--brav-toa.nu/
>
> == homonyms ==
> å can be written either U+00E5 or 'a' + U+030A (COMBINING RING ABOVE).
> This results in a different punycoded IDN. 

The Punycode form is only "seen" by the domain name server, which uses 
it for domain name resolution. There is no need to use it for *IRI/URI* 
canonicalization.
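
As a rough illustration of the two spellings, the following Python 
sketch compares them at the Unicode, percent-escaping and Punycode 
levels. The variable and function names are assumptions made for this 
example. It relies on Python's "punycode" codec (raw encoding, no 
preprocessing) and "idna" codec (IDNA 2003, which runs Nameprep 
including NFKC before encoding).

```python
import unicodedata
from urllib.parse import quote

precomposed = "brav\u00e5"   # å as a single code point, U+00E5
decomposed = "brava\u030a"   # 'a' followed by U+030A COMBINING RING ABOVE


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Different code point sequences, hence different percent-escaped forms.
print(quote(precomposed), quote(decomposed))

# Raw Punycode differs between the two spellings ...
raw_differs = precomposed.encode("punycode") != decomposed.encode("punycode")

# ... but the IDNA 2003 codec normalizes the label first, so both agree.
idna_same = precomposed.encode("idna") == decomposed.encode("idna")

print(same_after_nfc(precomposed, decomposed), raw_differs, idna_same)
```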

> Unicode gives *some*
> c14n (or folding) rules, but not all, and they are not cheap to
> implement.
>
> == fixing ==
> This should probably be addressed in an update of mnot's URISpace Note
>   http://www.w3.org/TR/urispace
>
> I recommend inserting in 2.1.3.3 Punycode (or maybe IDN) Conversion:
>
>   • Internationalized Domain Names (IDNs) are converted from their
>     punycode form to Unicode code points.
>   

Where does this happen? Note that in IDNA version 2003, round-tripping 
Unicode > Punycode > Unicode is not possible, since during the 
Unicode > Punycode step non-reversible mappings (e.g. Eszett > ss) are 
made. But as said above, I think this is out of scope for IRI/URI 
canonicalization.
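
The lossy mapping can be seen in a small Python sketch. It assumes 
Python's built-in "idna" codec, which follows IDNA 2003; the function 
name idna2003_roundtrip and the host name "straße.example" are made up 
for this example.

```python
def idna2003_roundtrip(host: str) -> str:
    """Convert a host name to its ASCII form and back using IDNA 2003."""
    ascii_form = host.encode("idna")   # Nameprep maps Eszett to "ss" here
    return ascii_form.decode("idna")   # nothing left to map back from


original = "straße.example"
restored = idna2003_roundtrip(original)

# The Eszett does not survive the round trip.
print(original, restored, original == restored)
```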

Felix

Received on Tuesday, 12 February 2008 00:48:44 UTC