RE: IRI/URI Canonicalization does not address IRIs with IDNs

Hi all,

As per Jeremy's suggestion, I believe that the POWDER WG should follow
I18N guidelines on referencing IRIs within a POWDER document. The goal
is that a description can be assigned to a URI/IRI scope, without
risking ambiguity. It appears that homographs in IRIs[1] may introduce
such ambiguity. 

I suggest we specify that the POWDER author uses the UTF-8 encoded IRI,
and that it is the responsibility of the POWDER processor to normalise
the resource IRI so that it can be determined whether it falls within
the scope of the resource set.
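
To make the processor-side step concrete, here is a minimal sketch in
Python, not a proposal for the spec text. The function name
normalise_iri is hypothetical, and the choice of steps (NFC, lower-case
scheme and host, IDNA ASCII form of the host) is my assumption about
what "normalise" would need to cover. Python's built-in "idna" codec
implements IDNA 2003, and the sketch drops any userinfo part.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit


def normalise_iri(iri: str) -> str:
    """Hypothetical helper: bring an IRI into a comparable form."""
    # Canonical composition, so composed and decomposed spellings agree.
    iri = unicodedata.normalize("NFC", iri)
    parts = urlsplit(iri)
    # .hostname is already lower-cased by urlsplit.
    host = parts.hostname or ""
    # Convert each IDN label to its ASCII (xn--) form.
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    netloc = ascii_host
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(normalise_iri("HTTP://B\u00fccher.example/path"))
```

Two IRIs that differ only in case or in composed versus decomposed
characters in the host then compare equal as strings. This does not
resolve homographs drawn from different scripts.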

Cheers,
Kevin

[1] http://en.wikipedia.org/wiki/IDN_homograph_attack#Homographs_in_internationalized_domain_names



-----Original Message-----
From: public-powderwg-request@w3.org
[mailto:public-powderwg-request@w3.org] On Behalf Of Jeremy Carroll
Sent: 14 February 2008 17:25
To: public-powderwg@w3.org
Cc: public-i18n-core@w3.org
Subject: Re:IRI/URI Canonicalization does not address IRIs with IDNs



Eric said:
[[
  I've never
implemented Unicode normalization, but I expect it's not trivial.
]]

You can use a third-party library, e.g. IBM's ICU library.

It is then quite straightforward.
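
As a rough illustration of how little code is involved once a library
does the work, here is a sketch using Python's standard unicodedata
module as a stand-in for ICU (an assumption for the sake of a short
example; it is not the icu4j API). The helper name nfc is hypothetical.

```python
import unicodedata


def nfc(text: str) -> str:
    """Hypothetical helper: return the NFC (composed) form of text."""
    return unicodedata.normalize("NFC", text)


# The same word spelled with a precomposed character and with a
# base letter followed by a combining accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

if __name__ == "__main__":
    print(composed == decomposed)
    print(nfc(composed) == nfc(decomposed))
```

The two spellings differ as raw strings but agree after normalization,
which is the property a processor needs before comparing IRIs.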

The icu4j library is fairly large, and I believe larger than needed for
this problem (since it solves most other I18N problems too). But, I
don't think this is really a problem in practice.

I would urge the group to specify the right thing, whatever that is, and
not be too concerned about the detail here.

I believe that fairly soon most Web addresses will have IDNs.

Jeremy

Received on Friday, 15 February 2008 13:46:38 UTC