Re: §2.1.3 IRI/URI Canonicalization does not address IRIs with IDNs

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 10 Apr 2008 15:00:58 +0900
Message-ID: <47FDAD1A.1090601@w3.org>
To: Phil Archer <parcher@icra.org>
CC: Eric Prud'hommeaux <eric@w3.org>, public-powderwg@w3.org, public-i18n-core@w3.org

Hi Phil,

I was looking into this section in your attachment:

[
2.1.3.4 Internationalized Domain Names
    * Internationalized Domain Names (IDNs) should be converted from 
Punycode [RFC3492] into their UTF-8 string representations, so that, for 
example:
      http://www.xn--exmple-jua.org/
      becomes
      http://www.exåmple.org/.
]
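As a rough illustration of the conversion the draft describes, Python's built-in `idna` codec (which implements IDNA2003, RFC 3490) decodes each `xn--` label from Punycode. This is only a sketch assuming Python 3; the helper name `host_to_unicode` is hypothetical and not part of POWDER.

```python
def host_to_unicode(ace_host: str) -> str:
    # IDNA2003 ToUnicode per dot-separated label: "xn--" labels are
    # Punycode-decoded, plain ASCII labels such as "www" pass through.
    return ace_host.encode("ascii").decode("idna")


print(host_to_unicode("www.xn--exmple-jua.org"))
```

For the draft's example this returns `www.exåmple.org`.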

If you have
http://www.xn--exmpless-b0a.org/
it is not possible to decide whether it should become
http://www.exåmpless.org/
or
http://www.exåmpleß.org/
since "ss" in the Punycode string could originally have been "ss" or "ß"
(IDNA's Nameprep step maps "ß" to "ss" before Punycode encoding).
So I think this canonicalization step is not feasible. I'm also not sure 
it is necessary: if you get http://www.xn--exmpless-b0a.org/, you 
could process it in POWDER just "as is", without trying to convert it to the 
representation with non-ASCII characters. The same applies to 
http://www.exåmpless.org/ . But maybe I am missing something?
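The ambiguity can be reproduced with Python's built-in `idna` codec, again as a sketch assuming Python 3 and its IDNA2003 implementation; the helper `host_to_ace` is hypothetical. Encoding both spellings yields a single ASCII form, `www.xn--exmpless-b0a.org`, and decoding that form returns only the "ss" spelling. Raw Punycode without Nameprep keeps the two labels apart, which locates the information loss in Nameprep rather than in Punycode itself.

```python
def host_to_ace(unicode_host: str) -> str:
    # IDNA2003 ToASCII per label: Nameprep (which folds "ß" to "ss"),
    # then Punycode with the "xn--" prefix.
    return unicode_host.encode("idna").decode("ascii")


with_ss = host_to_ace("www.exåmpless.org")
with_eszett = host_to_ace("www.exåmpleß.org")
print(with_ss, with_eszett, with_ss == with_eszett)

# Decoding the shared ASCII form can only yield one spelling: the "ss" one.
print(with_ss.encode("ascii").decode("idna"))

# Raw Punycode (RFC 3492) has no Nameprep step, so the labels stay distinct.
print("exåmpless".encode("punycode") != "exåmpleß".encode("punycode"))
```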

Just let me know what you think. Note that the one-way mapping from "ß" 
to "ss" is a known problem of IDNs, which a proposed IETF Working Group 
is expected to address soon; see 
http://www.alvestrand.no/pipermail/idna-update/2008-March/001343.html

Felix
Received on Thursday, 10 April 2008 06:02:13 GMT