Re: IRI and IDN in RDF from Jeremy Carroll on 2005-01-11 (www-international@w3.org from January to March 2005)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 11 Jan 2005 16:20:10 +0000
To: Martin Duerst <duerst@w3.org>
CC: "Krall, Gary" <gkrall@verisign.com>, "'Chris Lilley'" <chris@w3.org>, Reto Bachmann-Gmuer <reto@gmuer.ch>, www-international@w3.org
Message-ID: <41E3FCBA.5010508@hplb.hpl.hp.com>

I suspect that formally correct treatment and good practice diverge on this.

The text of RDF Concepts and Abstract Syntax which expresses RDF's idea 
of an IRI is this:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
[[
A URI reference within an RDF graph (an RDF URI reference) is a Unicode 
string [UNICODE] that:

     * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
     * and would produce a valid URI character sequence (per RFC2396 
[URI], sections 2.1) representing an absolute URI with optional fragment 
identifier when subjected to the encoding described below.

The encoding consists of:

    1. encoding the Unicode string as UTF-8 [RFC-2279], giving a 
sequence of octet values.
    2. %-escaping octets that do not correspond to permitted US-ASCII 
characters.

The disallowed octets that must be %-escaped include all those that do 
not correspond to US-ASCII characters, and the excluded characters 
listed in Section 2.4 of [URI], except for the number sign (#), percent 
sign (%), and the square bracket characters re-allowed in [RFC-2732].

Disallowed octets must be escaped with the URI escaping mechanism (that 
is, converted to %HH, where HH is the 2-digit hexadecimal numeral 
corresponding to the octet value).
]]
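For illustration only, here is a rough Python sketch of the two-step
encoding quoted above: UTF-8, then %HH for octets that are non-ASCII or in
the RFC 2396 section 2.4 excluded set, with '#', '%', '[' and ']' left
alone. The function names are hypothetical, and the sketch only does the
escaping; it does not check that the result is a valid absolute URI.

```python
# Octets excluded by RFC 2396 section 2.4 (space, delims, unwise), minus the
# characters RDF Concepts re-allows: '#', '%', '[' and ']'.
# Control characters are rejected before encoding, so they are not listed.
_EXCLUDED = set(b' <>"{}|\\^`')


def has_control_chars(s: str) -> bool:
    """True if s contains a character in #x00-#x1F or #x7F-#x9F."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in s)


def rdf_uri_encode(s: str) -> str:
    """Encode as UTF-8, then %-escape disallowed octets as %HH."""
    if has_control_chars(s):
        raise ValueError("control characters are not allowed")
    out = []
    for octet in s.encode("utf-8"):
        if octet >= 0x80 or octet in _EXCLUDED:
            out.append("%%%02X" % octet)  # two uppercase hex digits
        else:
            out.append(chr(octet))
    return "".join(out)
```

Applied to the example host discussed below, rdf_uri_encode gives
http://r%C3%A9sum%C3%A9.example.org, since U+00E9 is C3 A9 in UTF-8.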

My understanding is that IDNs are not covered as such by the
conversion stated, and any additional conversion required for them
would be an extra (non-standard) feature.

I guess on the W3C side we should be revising a lot of text in light of
IRI and IDN as they come to fruition.

For example, consider the conversion in
http://www.w3.org/International/iri-edit/draft-duerst-iri-10.txt
[[
    http://r&#xE9;sum&#xE9;.example.org may be converted to
    http://xn--rsum-bpad.example.org instead of
    http://r%C3%A9sum%C3%A9.example.org.
]]
In fact, the first string is not an 'RDF URI reference', nor an XLink
href attribute value, nor in the lexical space of xsd:anyURI, because
    http://r%C3%A9sum%C3%A9.example.org
is not a legal URI (the RFC 2396 hostname grammar does not allow
%-escapes), and the provisions of those specs only allow for %-encoding.

So, I think an RSS or RDF tool would need either:
- not to check that the URIs are legal,
or
- to have an extended check that knows something about IDNs,

and neither is particularly conformant. Personally, I would prefer the
latter.
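A rough Python sketch of the second option, limited to the host part (the
rest of the URI syntax is not checked). The function names are
hypothetical; the regular expression transcribes the RFC 2396 hostname
production, and the IDN step assumes Python's built-in "idna" codec
(IDNA 2003, RFC 3490).

```python
import re
from urllib.parse import urlsplit

# RFC 2396 section 3.2.2 host name grammar:
#   hostname    = *( domainlabel "." ) toplabel [ "." ]
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
_DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_HOSTNAME = re.compile(rf"(?:{_DOMAINLABEL}\.)*{_TOPLABEL}\.?")


def is_rfc2396_hostname(host: str) -> bool:
    """True if host matches the RFC 2396 hostname production."""
    return _HOSTNAME.fullmatch(host) is not None


def host_is_acceptable(iri: str, idn_aware: bool) -> bool:
    """Check only the host of an IRI.

    Strict mode: the host must already match the RFC 2396 grammar.
    IDN-aware mode: a non-ASCII host is first converted with IDNA
    ToASCII and the converted form is checked instead.
    """
    host = urlsplit(iri).hostname or ""
    if is_rfc2396_hostname(host):
        return True
    if idn_aware and not host.isascii():
        try:
            ascii_host = host.encode("idna").decode("ascii")
        except UnicodeError:
            return False  # a label could not be converted by ToASCII
        return is_rfc2396_hostname(ascii_host)
    return False
```

With idn_aware=False the résumé host is rejected; with idn_aware=True it is
accepted because its ToASCII form, xn--rsum-bpad.example.org, matches the
grammar. The %-escaped form is rejected in both modes.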


Jeremy

Martin Duerst wrote:
> 
> At 03:45 05/01/08, Krall, Gary wrote:
>  >
>  >Chris:
>  >
>  >Just for clarification, does your answer imply that an RSS reader would
>  >need to support IDNA to make this work?
> 
> If it does resolve IRIs, which I guess most RSS readers do, then yes.
> Otherwise no. RDF as such does not require resolution, and therefore
> does not require IDNA support.
> 
> IDNA support is available in libraries (e.g. libidn or idnkit)
> that can easily be integrated into other software (but are a bit
> bulky because of the tables needed).
> 
> Regards,    Martin.
> 
>  >Thanks,
>  >
>  >Gary.
>  >
>  >-----Original Message-----
>  >From: www-international-request@w3.org
>  >[mailto:www-international-request@w3.org]On Behalf Of Chris Lilley
>  >Sent: Friday, January 07, 2005 10:36 AM
>  >To: Reto Bachmann-Gmuer
>  >Cc: www-international@w3.org
>  >Subject: Re: IRI and IDN in RDF
>  >
>  >
>  >
>  >On Friday, January 7, 2005, 7:23:22 PM, Reto wrote:
>  >
>  >
>  >RBG> Hello
>  >
>  >RBG> I'm wondering how URLs based on IDN should be represented in
>  >RBG> RDF/XML:
>  >RBG> - no particular encoding (= default of xml document)
>  >RBG> - %... encoding
>  >RBG> - punycode
>  >
>  >Since it is XML, the IRI can be expressed in regular characters (the
>  >encoding of the document), and conversion to punycode, hex escaping,
>  >etc. can be left to the URI resolver.
>  >
>  >
>  >--
>  > Chris Lilley                    mailto:chris@w3.org
>  > Chair, W3C SVG Working Group
>  > Member, W3C Technical Architecture Group
>

Received on Tuesday, 11 January 2005 16:20:38 UTC