- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 11 Jan 2005 16:20:10 +0000
- To: Martin Duerst <duerst@w3.org>
- CC: "Krall, Gary" <gkrall@verisign.com>, "'Chris Lilley'" <chris@w3.org>, Reto Bachmann-Gmuer <reto@gmuer.ch>, www-international@w3.org
I suspect that formally correct treatment and good practice diverge on this.
The text of RDF Concepts and Abstract Syntax which expresses RDF's idea
of an IRI is this:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
[[
A URI reference within an RDF graph (an RDF URI reference) is a Unicode
string [UNICODE] that:
* does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
* and would produce a valid URI character sequence (per RFC2396
[URI], sections 2.1) representing an absolute URI with optional fragment
identifier when subjected to the encoding described below.
The encoding consists of:
1. encoding the Unicode string as UTF-8 [RFC-2279], giving a
sequence of octet values.
2. %-escaping octets that do not correspond to permitted US-ASCII
characters.
The disallowed octets that must be %-escaped include all those that do
not correspond to US-ASCII characters, and the excluded characters
listed in Section 2.4 of [URI], except for the number sign (#), percent
sign (%), and the square bracket characters re-allowed in [RFC-2732].
Disallowed octets must be escaped with the URI escaping mechanism (that
is, converted to %HH, where HH is the 2-digit hexadecimal numeral
corresponding to the octet value).
]]
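
The two encoding steps quoted above can be sketched in Python roughly as follows. This is only an illustration of my reading of the quoted text, not a reference implementation: the function name and the ALLOWED set are my own, and the set is assembled by hand from the reserved and unreserved characters of RFC 2396 plus the four characters the quote re-allows.

```python
import string

# Characters left unescaped: RFC 2396 unreserved + reserved, plus the
# '#', '%', '[' and ']' that the quoted text explicitly re-allows.
# (Hand-assembled set; an assumption of this sketch.)
ALLOWED = set(
    string.ascii_letters + string.digits
    + "-_.!~*'()"      # RFC 2396 "mark" characters
    + ";/?:@&=+$,"     # RFC 2396 reserved characters
    + "#%[]"           # re-allowed by RDF Concepts
)


def has_control_chars(s):
    """True if s contains #x00-#x1F or #x7F-#x9F."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in s)


def encode_rdf_uriref(s):
    """Apply step 1 (UTF-8) and step 2 (%-escaping) to a Unicode string."""
    if has_control_chars(s):
        raise ValueError("control characters are not allowed")
    out = []
    for octet in s.encode("utf-8"):          # step 1: UTF-8 octets
        ch = chr(octet)
        if octet < 0x80 and ch in ALLOWED:
            out.append(ch)                   # permitted US-ASCII character
        else:
            out.append("%%%02X" % octet)     # step 2: %HH escape
    return "".join(out)


print(encode_rdf_uriref("http://r\u00e9sum\u00e9.example.org"))
```

Note that the sketch only performs the encoding; it does not check that the result is an absolute URI, which the quoted definition also requires.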
My understanding is that IDNs are not covered as such by the
conversion stated, and any additional conversion required for them
would be an extra (non-standard) feature.
I guess from a W3C side we should be revving a lot of text in light of
IRI and IDN as they come to fruition.
E.g. the conversion in
http://www.w3.org/International/iri-edit/draft-duerst-iri-10.txt
[[
http://résumé.example.org may be converted to
http://xn--rsum-bpad.example.org instead of
http://r%C3%A9sum%C3%A9.example.org.
]]
and in fact, the first string is not an 'RDF URI reference', an XLink
href attribute value, or in the lexical space of xsd:anyURI, because
http://r%C3%A9sum%C3%A9.example.org
is not a legal URI, and the provisions of those specs only allow for
%-encoding.
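
To make the difference between the two conversions concrete, here is a small Python sketch using only the standard library. It assumes the authority is meant as a server hostname (rather than a registry-based name), and HOSTNAME is my own simplified approximation of the RFC 2396 hostname production, not the full grammar. The built-in "idna" codec implements the IDNA ToASCII operation.

```python
import re
from urllib.parse import quote

# Simplified approximation of an RFC 2396 hostname: dot-separated labels
# made of letters, digits and hyphens (an assumption of this sketch).
HOSTNAME = re.compile(
    r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?"
    r"(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)*\.?"
)

host = "r\u00e9sum\u00e9.example.org"

# Conversion by UTF-8 plus %-escaping only, as the older specs describe.
percent_form = quote(host)

# Conversion by IDNA (ToASCII), as the IRI draft allows for host names.
idna_form = host.encode("idna").decode("ascii")

print(percent_form, HOSTNAME.fullmatch(percent_form) is not None)
print(idna_form, HOSTNAME.fullmatch(idna_form) is not None)
```

Under these assumptions, only the IDNA form passes the hostname check.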
So, I think an RSS or RDF tool would need to either:
- not check that the URIs are legal
or
- have an extended check that knows something about IDNs
and neither is particularly conformant. Personally I would prefer the
latter.
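
As a rough idea of what such an extended check might look like, here is a Python sketch. It is hypothetical: the function name idn_aware_host_ok and the simplified HOSTNAME pattern are mine, it looks only at the host part, and it treats the authority as a server hostname. It is not something any of the specs define.

```python
import re
from urllib.parse import urlsplit

# Simplified hostname pattern: labels of letters, digits and hyphens
# (an approximation, not the full RFC 2396 grammar).
HOSTNAME = re.compile(
    r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?"
    r"(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)*\.?"
)


def idn_aware_host_ok(iri):
    """Check the host of iri after converting it with IDNA ToASCII."""
    host = urlsplit(iri).hostname
    if not host:
        return False
    try:
        # Non-ASCII labels become xn-- labels; ASCII labels pass through.
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return HOSTNAME.fullmatch(ascii_host) is not None


print(idn_aware_host_ok("http://r\u00e9sum\u00e9.example.org"))
print(idn_aware_host_ok("http://r%C3%A9sum%C3%A9.example.org"))
```

A tool using a check like this would accept the IDN form written in regular characters and reject the %-escaped host, which is the opposite of what a strict %-encoding-only check would do.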
Jeremy
Martin Duerst wrote:
>
> At 03:45 05/01/08, Krall, Gary wrote:
> >
> >Chris:
> >
> >Just for clarification does your answer imply that an RSS reader would need
> >to support IDNA to make this work?
>
> If it does resolve IRIs, which I guess most RSS readers do, then yes.
> Otherwise no. RDF as such does not require resolution, and therefore
> does not require IDNA support.
>
> IDNA support is available in libraries (e.g. libidn or idnkit)
> that can easily be integrated into other software (but are a bit
> bulky because of the tables needed).
>
> Regards, Martin.
>
> >Thanks,
> >
> >Gary.
> >
> >-----Original Message-----
> >From: www-international-request@w3.org
> >[mailto:www-international-request@w3.org]On Behalf Of Chris Lilley
> >Sent: Friday, January 07, 2005 10:36 AM
> >To: Reto Bachmann-Gmuer
> >Cc: www-international@w3.org
> >Subject: Re: IRI and IDN in RDF
> >
> >
> >
> >On Friday, January 7, 2005, 7:23:22 PM, Reto wrote:
> >
> >
> >RBG> Hello
> >
> >RBG> I'm wondering how URLs based on IDN should be represented in RDF/XML:
> >RBG> - no particular encoding (= default of xml document)
> >RBG> - %... encoding
> >RBG> - punycode
> >
> >Since it is XML, the IRI can be expressed in regular characters (the
> >encoding of the document) and conversion to punycode, hex escaping etc
> >can be left to the URI resolver.
> >
> >
> >--
> > Chris Lilley mailto:chris@w3.org
> > Chair, W3C SVG Working Group
> > Member, W3C Technical Architecture Group
>
Received on Tuesday, 11 January 2005 16:20:38 UTC