
§2.1.3 IRI/URI Canonicalization does not address IRIs with IDNs

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 11 Feb 2008 11:43:58 -0500
To: public-powderwg@w3.org
Cc: Felix Sasaki <fsasaki@w3.org>
Message-ID: <20080211164358.GD16996@w3.org>

http://www.w3.org/2007/powder/Group/powder-grouping/20080128.html#canon
does not include an IDN example or any IDN canonicalization rules.

An example of a working IDN IRI:
  http://www.bravå.nu/
The IDN is punycoded when the IRI is expressed as a URI:
  http://www.xn--brav-toa.nu/
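A minimal sketch of that IRI-to-URI host conversion, assuming Python 3 and its built-in `idna` codec (which implements IDNA2003, RFC 3490, on top of Punycode). `iri_host_to_uri` is a hypothetical helper name. The sketch only rewrites the host and does not handle userinfo.

```python
from urllib.parse import urlsplit, urlunsplit


def iri_host_to_uri(iri: str) -> str:
    """Hypothetical helper: return the IRI with its host in ACE ("xn--") form.

    Only the host is touched; userinfo is not handled, and path, query and
    fragment are passed through unchanged.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""  # urlsplit lowercases the host
    # The stdlib "idna" codec applies ToASCII to each dot-separated label,
    # e.g. "bravå" -> "xn--brav-toa"; pure ASCII labels pass through as-is.
    ace_host = host.encode("idna").decode("ascii")
    netloc = ace_host if parts.port is None else f"{ace_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(iri_host_to_uri("http://www.bravå.nu/"))  # http://www.xn--brav-toa.nu/
```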

== homonyms ==
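å can be written either as U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE)
or as 'a' followed by U+030A (COMBINING RING ABOVE). Punycoding the two
forms without normalization yields different ACE labels. Unicode gives
*some* canonicalization (c14n) or folding rules, but not all, and they
are not cheap to implement.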
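The two spellings of å can be compared with a short Python sketch. This assumes the built-in `punycode` codec, which is the bare RFC 3492 algorithm with no case mapping and no normalization, and the standard `unicodedata` module.

```python
import unicodedata

precomposed = "brav\u00e5"   # å as one code point, U+00E5
decomposed = "brava\u030a"   # 'a' followed by U+030A COMBINING RING ABOVE

# Rendered identically, but they are different code point sequences.
print(precomposed == decomposed)  # False

# Raw Punycode (no normalization) therefore produces different labels.
print(precomposed.encode("punycode"))  # b'brav-toa'
print(decomposed.encode("punycode"))   # b'brava-ugd'

# Canonical composition (NFC) maps the decomposed form onto U+00E5.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```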

== fixing ==
This should probably be addressed in an update of mnot's URISpace Note
  http://www.w3.org/TR/urispace

I recommend inserting the following in 2.1.3.3, Punycode (or maybe IDN) Conversion:

  • Internationalized Domain Names (IDNs) are converted from their
    punycode form to Unicode code points.

  Note: None of the normalization procedures described in the Unicode
  specification are performed during POWDER IRI canonicalization.
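A sketch of the proposed rule, assuming Python 3's built-in `idna` codec, whose decoder runs ToUnicode on each label. `punycode_host_to_unicode` is a hypothetical helper. Labels with the ACE prefix are decoded to Unicode code points, and everything else is left byte-for-byte as it was. In keeping with the note, no normalization is applied.

```python
from urllib.parse import urlsplit, urlunsplit


def _label_to_unicode(label: str) -> str:
    # Only ACE ("xn--") labels are decoded; Unicode or plain ASCII labels
    # are returned untouched (deliberately NOT normalized).
    if label.startswith("xn--"):
        return label.encode("ascii").decode("idna")
    return label


def punycode_host_to_unicode(uri: str) -> str:
    """Hypothetical helper: convert punycoded host labels to Unicode code points.

    Only the host is rewritten; userinfo is not handled.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""  # lowercased by urlsplit, so "XN--" is caught too
    unicode_host = ".".join(_label_to_unicode(label) for label in host.split("."))
    netloc = unicode_host if parts.port is None else f"{unicode_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(punycode_host_to_unicode("http://www.xn--brav-toa.nu/"))  # http://www.bravå.nu/
```

Under this rule, an IRI typed with the decomposed 'a' + U+030A keeps that form and so stays distinct from the precomposed one.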

== dissenting opinion ==
http://unicode.org/faq/normalization.html#2 suggests NFKC normalization,
so perhaps you want POWDER apps to do that. It would help folks who
hand-enter a URL and happen to type it in a different form. I've never
implemented Unicode normalization, but I expect it's not trivial.
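For comparison, a sketch of the NFKC alternative. It assumes Python's standard `unicodedata` module, which provides the Unicode normalization forms. `nfkc_host` is a hypothetical name for the extra canonicalization step POWDER apps would perform.

```python
import unicodedata


def nfkc_host(host: str) -> str:
    """Hypothetical extra step: NFKC-normalize a Unicode host before comparing."""
    return unicodedata.normalize("NFKC", host)


typed_one_way = "www.brav\u00e5.nu"      # precomposed å, U+00E5
typed_other_way = "www.brava\u030a.nu"   # 'a' + U+030A COMBINING RING ABOVE

print(typed_one_way == typed_other_way)                        # False
print(nfkc_host(typed_one_way) == nfkc_host(typed_other_way))  # True

# NFKC also folds compatibility characters, e.g. fullwidth U+FF57 to 'w'.
print(nfkc_host("\uff57ww.brav\u00e5.nu") == typed_one_way)    # True
```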

Happy trade-offs
-- 
-eric

office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 11 February 2008 16:44:16 GMT