Re: IRI and IDN in RDF from Jeremy Carroll on 2005-01-13 (www-international@w3.org from January to March 2005)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 13 Jan 2005 11:55:31 +0000
To: "Krall, Gary" <gkrall@verisign.com>
CC: Martin Duerst <duerst@w3.org>, "'Chris Lilley'" <chris@w3.org>, Reto Bachmann-Gmuer <reto@gmuer.ch>, www-international@w3.org
Message-ID: <41E661B3.5050407@hplb.hpl.hp.com>
Krall, Gary wrote:
> Jeremy:
> 
> In your latter suggestion this would imply that IDN encoding/decoding is
> occurring within the client application correct?
> 
> Gary.

My thoughts as to who does this checking are heavily influenced by my 
work on the Jena Semantic Web Framework.

http://jena.sourceforge.net/

We have found that an overly tolerant approach to bad URIs causes 
hard-to-support problems on output, and that it is best to check all 
URIs for well-formedness on input. Jena is used in both server and 
client side SW applications.

However, Jena, in keeping with the RDF Concepts wording, performs only a 
hypothetical check that the IRIs can be converted to ASCII URIs; we 
don't actually perform the conversion (hmmmm... when processing 
owl:imports the URIs have to be used as URLs. I bet we don't get that 
right; I'll add it to our to-do list).
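
To make the idea of a "hypothetical check" concrete, here is a minimal 
sketch in Python. It is not Jena's code. It assumes the encoding from 
RDF Concepts (UTF-8, then %-escaping) and uses only a rough test for 
"absolute URI" (a scheme followed by something), not the full RFC 2396 
grammar. The names to_ascii_uri and is_rdf_uri_reference are made up 
for illustration.

```python
import re
from urllib.parse import quote, urlsplit

# Control characters excluded by RDF Concepts: #x00-#x1F and #x7F-#x9F.
_CONTROL = re.compile(r"[\x00-\x1f\x7f-\x9f]")

# Characters left unescaped: RFC 2396 reserved and mark characters, plus
# '#', '%' and the square brackets, which RDF Concepts re-allows.
# quote() never escapes ASCII letters, digits, '_', '.', '-' and '~'.
_SAFE = ";/?:@&=+$,!*'()#%[]"


def to_ascii_uri(uriref: str) -> str:
    """Encode as UTF-8, then %-escape every octet that is not permitted."""
    return quote(uriref, safe=_SAFE, encoding="utf-8")


def is_rdf_uri_reference(uriref: str) -> bool:
    """Check that the string could be converted; the result is discarded."""
    if _CONTROL.search(uriref):
        return False
    encoded = to_ascii_uri(uriref)
    try:
        parts = urlsplit(encoded)
    except ValueError:
        # urlsplit rejects some malformed bracketed hosts.
        return False
    # Rough absoluteness test: a scheme, a colon, and something after it.
    return bool(parts.scheme) and len(encoded) > len(parts.scheme) + 1


print(to_ascii_uri("http://example.org/r\u00e9sum\u00e9#frag"))
print(is_rdf_uri_reference("http://example.org/a b"))
print(is_rdf_uri_reference("no-scheme/path"))
```

Note that the check never looks inside the host name, so it says nothing 
about whether an IDN would resolve.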

Anyone know of an RDF file on a server with an IDN?



Jeremy


> 
> -----Original Message-----
> From: Jeremy Carroll [mailto:jjc@hplb.hpl.hp.com]
> Sent: Tuesday, January 11, 2005 8:20 AM
> To: Martin Duerst
> Cc: Krall, Gary; 'Chris Lilley'; Reto Bachmann-Gmuer;
> www-international@w3.org
> Subject: Re: IRI and IDN in RDF
> 
> 
> 
> 
> I suspect that formally correct treatment and good practice diverge on this.
> 
> The text of RDF Concepts and Abstract Syntax which expresses RDF's idea 
> of an IRI is this:
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
> [[
> A URI reference within an RDF graph (an RDF URI reference) is a Unicode 
> string [UNICODE] that:
> 
>      * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
>      * and would produce a valid URI character sequence (per RFC2396 
> [URI], sections 2.1) representing an absolute URI with optional fragment 
> identifier when subjected to the encoding described below.
> 
> The encoding consists of:
> 
>     1. encoding the Unicode string as UTF-8 [RFC-2279], giving a 
> sequence of octet values.
>     2. %-escaping octets that do not correspond to permitted US-ASCII 
> characters.
> 
> The disallowed octets that must be %-escaped include all those that do 
> not correspond to US-ASCII characters, and the excluded characters 
> listed in Section 2.4 of [URI], except for the number sign (#), percent 
> sign (%), and the square bracket characters re-allowed in [RFC-2732].
> 
> Disallowed octets must be escaped with the URI escaping mechanism (that 
> is, converted to %HH, where HH is the 2-digit hexadecimal numeral 
> corresponding to the octet value).
> ]]
> 
> My understanding is that IDNs are not covered as such by the 
> conversion stated, and any additional conversion required for them 
> would be an extra (non-standard) feature.
> 
> I guess from a W3C side we should be revving a lot of text in light of 
> IRI and IDN as they come to fruition.
> 
> e.g. the conversion in
> http://www.w3.org/International/iri-edit/draft-duerst-iri-10.txt
> [[
>     http://r&#xE9;sum&#xE9;.example.org may be converted to
>     http://xn--rsum-bpad.example.org instead of
>     http://r%C3%A9sum%C3%A9.example.org.
> ]]
> and in fact, the first string is not an 'RDF URI reference' or an XLink 
> href attribute value, or in the lexical space of xsd:anyURI, because
>     http://r%C3%A9sum%C3%A9.example.org
> is not a legal URI, and the provisions of those specs only allow for 
> %-encoding.
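
The two conversions of the host name can be compared with a short 
Python sketch. It assumes Python's built-in "idna" codec, which 
implements the IDNA ToASCII operation from RFC 3490; the function names 
are hypothetical and only the host part is handled.

```python
from urllib.parse import quote


def host_percent_encoded(host: str) -> str:
    """%-escape the UTF-8 octets of the host, keeping the label dots."""
    return quote(host, safe=".", encoding="utf-8")


def host_idna_encoded(host: str) -> str:
    """Convert each label with IDNA ToASCII (Punycode with an xn-- prefix)."""
    return host.encode("idna").decode("ascii")


host = "r\u00e9sum\u00e9.example.org"
print("http://" + host_percent_encoded(host))
print("http://" + host_idna_encoded(host))
```

Only the second form is a host name that a resolver limited to ASCII 
can look up.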
> 
> So, I think an RSS or RDF tool would need either to:
> - not check that the URIs were legal
> or
> - to have an extended check that knew something about IDNs
> 
> and neither is particularly conformant. Personally I would prefer the 
> latter.
> 
> 
> Jeremy
> 
> Martin Duerst wrote:
> 
>>At 03:45 05/01/08, Krall, Gary wrote:
>> >
>> >Chris:
>> >
>> >Just for clarification does your answer imply that an RSS reader would need
>> >to support IDNA to make this work?
>>
>>If it does resolve IRIs, which I guess most RSS readers do, then yes.
>>Otherwise no. RDF as such does not require resolution, and therefore
>>does not require IDNA support.
>>
>>IDNA support is available in libraries (e.g. libidn or idnkit)
>>that can easily be integrated into other software (but are a bit
>>bulky because of the tables needed).
>>
>>Regards,    Martin.
>>
>> >Thanks,
>> >
>> >Gary.
>> >
>> >-----Original Message-----
>> >From: www-international-request@w3.org
>> >[mailto:www-international-request@w3.org]On Behalf Of Chris Lilley
>> >Sent: Friday, January 07, 2005 10:36 AM
>> >To: Reto Bachmann-Gmuer
>> >Cc: www-international@w3.org
>> >Subject: Re: IRI and IDN in RDF
>> >
>> >
>> >
>> >On Friday, January 7, 2005, 7:23:22 PM, Reto wrote:
>> >
>> >
>> >RBG> Hello
>> >
>> >RBG> I'm wondering how URLs based on IDN should be represented in RDF/XML:
>> >RBG> - no particular encoding (= default of xml document)
>> >RBG> - %... encoding
>> >RBG> - punycode
>> >
>> >Since it is XML, the IRI can be expressed in regular characters (the
>> >encoding of the document) and conversion to punycode, hex escaping etc
>> >can be left to the URI resolver.
>> >
>> >
>> >--
>> > Chris Lilley                    mailto:chris@w3.org
>> > Chair, W3C SVG Working Group
>> > Member, W3C Technical Architecture Group
>>
> 
>
Received on Thursday, 13 January 2005 11:56:00 UTC