Re: IRI and IDN in RDF

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 11 Jan 2005 16:20:10 +0000
Message-ID: <41E3FCBA.5010508@hplb.hpl.hp.com>
To: Martin Duerst <duerst@w3.org>
CC: "Krall, Gary" <gkrall@verisign.com>, "'Chris Lilley'" <chris@w3.org>, Reto Bachmann-Gmuer <reto@gmuer.ch>, www-international@w3.org

I suspect that formally correct treatment and good practice diverge on this.

The text of RDF Concepts and Abstract Syntax which expresses RDF's idea 
of an IRI is this:
A URI reference within an RDF graph (an RDF URI reference) is a Unicode 
string [UNICODE] that:

     * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
     * and would produce a valid URI character sequence (per RFC2396 
[URI], sections 2.1) representing an absolute URI with optional fragment 
identifier when subjected to the encoding described below.

The encoding consists of:

    1. encoding the Unicode string as UTF-8 [RFC-2279], giving a 
sequence of octet values.
    2. %-escaping octets that do not correspond to permitted US-ASCII
characters.

The disallowed octets that must be %-escaped include all those that do 
not correspond to US-ASCII characters, and the excluded characters 
listed in Section 2.4 of [URI], except for the number sign (#), percent 
sign (%), and the square bracket characters re-allowed in [RFC-2732].

Disallowed octets must be escaped with the URI escaping mechanism (that 
is, converted to %HH, where HH is the 2-digit hexadecimal numeral 
corresponding to the octet value).
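
The quoted encoding can be sketched in Python roughly as follows. This is a minimal illustration, not the RDF specification's own code. It assumes that the US-ASCII characters still needing escapes are RFC 2396's excluded set (section 2.4.3) minus "#", "%", "[" and "]". The function names are hypothetical, and the sketch does not go on to check that the result is a valid absolute URI, which the RDF definition also requires.

```python
# Minimal sketch of the RDF URI reference -> URI encoding quoted above.
# Assumption: the US-ASCII characters still to be escaped are RFC 2396's
# excluded set (section 2.4.3) minus "#", "%", "[" and "]"; control
# characters never reach this step because they are rejected first.
ESCAPED_ASCII = frozenset(b' <>"{}|\\^`')


def has_control_chars(ref: str) -> bool:
    """True if ref contains any character in #x00-#x1F or #x7F-#x9F."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in ref)


def rdf_uri_ref_to_uri(ref: str) -> str:
    """Hypothetical helper: UTF-8 encode, then %-escape disallowed octets."""
    if has_control_chars(ref):
        raise ValueError("control characters are not allowed")
    out = []
    for octet in ref.encode("utf-8"):  # step 1: Unicode string -> octets
        if octet >= 0x80 or octet in ESCAPED_ASCII:  # step 2: %HH escape
            out.append(f"%{octet:02X}")
        else:
            out.append(chr(octet))
    return "".join(out)


print(rdf_uri_ref_to_uri("http://r\u00e9sum\u00e9.example.org"))
# http://r%C3%A9sum%C3%A9.example.org
```

With this rule alone, the host of "http://r&#xE9;sum&#xE9;.example.org" is %-escaped octet by octet; nothing in the quoted steps produces a punycode form.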

My understanding is that IDNs are not covered as such by the conversion
stated, and any additional conversion required for them would be an extra
(non-standard) feature.

I guess that, on the W3C side, we should be revving a lot of text in
light of IRI and IDN as they come to fruition.

For example, the conversion in
    http://r&#xE9;sum&#xE9;.example.org may be converted to
    http://xn--rsum-bpad.example.org instead of
and in fact, the first string is not an 'RDF URI reference' or an XLink 
href attribute value, or in the lexical space of xsd:anyURI, because
is not a legal URI, and the provisions of those specs only allow for 
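
For comparison, the two candidate conversions of the example host can be reproduced with Python's standard library. This is a sketch only. It relies on Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490 ToASCII), and it assumes the authority is a bare host with no userinfo or port. The helper names are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def host_to_punycode(iri: str) -> str:
    """Hypothetical helper: IDNA ToASCII on the host via Python's "idna" codec."""
    parts = urlsplit(iri)
    ascii_host = parts.netloc.encode("idna").decode("ascii")  # assumes a bare host
    return urlunsplit(parts._replace(netloc=ascii_host))


def host_to_percent(iri: str) -> str:
    """Hypothetical helper: %-escape the host's UTF-8 octets instead."""
    parts = urlsplit(iri)
    return urlunsplit(parts._replace(netloc=quote(parts.netloc, safe=".")))


iri = "http://r\u00e9sum\u00e9.example.org"
print(host_to_punycode(iri))  # http://xn--rsum-bpad.example.org
print(host_to_percent(iri))   # http://r%C3%A9sum%C3%A9.example.org
```

The second output is what the RDF encoding quoted earlier yields; the first is the punycode form from the IRI example.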

So, I think an RSS or RDF tool would need either to:
- not check that the URIs were legal, or
- have an extended check that knew something about IDNs

and neither is particularly conformant. Personally I would prefer the 
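
The second option might look something like the following sketch. It is hypothetical throughout. The strict check only roughly approximates RFC 2396: an absolute scheme, allowed characters and %HH escapes, and letter/digit/hyphen host labels as in section 3.2.2. The IDN-aware variant converts a non-ASCII host with Python's built-in "idna" codec before running the strict check, and it assumes the authority has no userinfo or port.

```python
import re
from urllib.parse import urlsplit

# Rough approximation of RFC 2396: unreserved and reserved characters, "#",
# square brackets (RFC 2732) and %HH escapes; nothing else.
URI_CHARS = re.compile(r"(?:[A-Za-z0-9\-_.!~*'();/?:@&=+$,#\[\]]|%[0-9A-Fa-f]{2})*")
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*:")
HOST_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_legal_uri(s: str) -> bool:
    """Hypothetical strict check: absolute URI, legal characters, LDH host labels."""
    if not SCHEME.match(s) or not URI_CHARS.fullmatch(s):
        return False
    try:
        host = urlsplit(s).hostname
    except ValueError:  # e.g. an unbalanced "[" in the authority
        return False
    # IPv6 literals are rejected by this simplified host check.
    return host is None or all(HOST_LABEL.fullmatch(label) for label in host.split("."))


def is_legal_uri_idn_aware(s: str) -> bool:
    """Hypothetical extended check: punycode a non-ASCII host, then check strictly."""
    try:
        netloc = urlsplit(s).netloc  # assumed to be a bare host
    except ValueError:
        return False
    if netloc and not netloc.isascii():
        try:
            ascii_host = netloc.encode("idna").decode("ascii")
        except UnicodeError:
            return False
        s = s.replace(netloc, ascii_host, 1)
    return is_legal_uri(s)
```

Under this sketch, "http://r&#xE9;sum&#xE9;.example.org" fails the strict check but passes the IDN-aware one, which is exactly the non-standard extension discussed above.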


Martin Duerst wrote:
> At 03:45 05/01/08, Krall, Gary wrote:
>  >
>  >Chris:
>  >
>  >Just for clarification does your answer imply that an RSS reader would need
>  >to support IDNA to make this work?
> If it does resolve IRIs, which I guess most RSS readers do, then yes.
> Otherwise no. RDF as such does not require resolution, and therefore
> does not require IDNA support.
> IDNA support is available in libraries (e.g. libidn or idnkit)
> that can easily be integrated into other software (but are a bit
> bulky because of the tables needed).
> Regards,    Martin.
>  >Thanks,
>  >
>  >Gary.
>  >
>  >-----Original Message-----
>  >From: www-international-request@w3.org
>  >[mailto:www-international-request@w3.org]On Behalf Of Chris Lilley
>  >Sent: Friday, January 07, 2005 10:36 AM
>  >To: Reto Bachmann-Gmuer
>  >Cc: www-international@w3.org
>  >Subject: Re: IRI and IDN in RDF
>  >
>  >
>  >
>  >On Friday, January 7, 2005, 7:23:22 PM, Reto wrote:
>  >
>  >
>  >RBG> Hello
>  >
>  >RBG> I'm wondering how URLs based on IDN should be represented in 
>  >RBG> - no particular encoding (= default of xml document)
>  >RBG> - %... encoding
>  >RBG> - punycode
>  >
>  >Since it is XML, the IRI can be expressed in regular characters (the
>  >encoding of the document) and conversion to punycode, hex escaping etc
>  >can be left to the URI resolver.
>  >
>  >
>  >--
>  > Chris Lilley                    mailto:chris@w3.org
>  > Chair, W3C SVG Working Group
>  > Member, W3C Technical Architecture Group

Received on Tuesday, 11 January 2005 16:20:38 UTC