- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 13 Jan 2005 11:55:31 +0000
- To: "Krall, Gary" <gkrall@verisign.com>
- CC: Martin Duerst <duerst@w3.org>, "'Chris Lilley'" <chris@w3.org>, Reto Bachmann-Gmuer <reto@gmuer.ch>, www-international@w3.org
Krall, Gary wrote:
> Jeremy:
>
> In your latter suggestion this would imply that IDN encoding/decoding is
> occurring within the client application correct?
>
> Gary.

My thoughts as to who does this checking are heavily influenced by my
work on the Jena Semantic Web Framework.
http://jena.sourceforge.net/

We have found that an overly tolerant approach to bad URIs causes
hard-to-support problems on output, and that it is best to check all
URIs for well-formedness on input.

Jena is used in both server and client side SW applications.

However, Jena, in keeping with the RDF Concepts wording, does only a
hypothetical check that the IRIs can be converted to ASCII URIs; we
don't actually perform the conversion. (Hmmmm... when processing
owl:imports the URIs have to be used as URLs. I bet we don't get that
right; I'll add it to our to-do list.)

Anyone know of an RDF file on a server with an IDN?

Jeremy

> -----Original Message-----
> From: Jeremy Carroll [mailto:jjc@hplb.hpl.hp.com]
> Sent: Tuesday, January 11, 2005 8:20 AM
> To: Martin Duerst
> Cc: Krall, Gary; 'Chris Lilley'; Reto Bachmann-Gmuer;
> www-international@w3.org
> Subject: Re: IRI and IDN in RDF
>
> I suspect that formally correct treatment and good practice diverge on this.
>
> The text of RDF Concepts and Abstract Syntax which expresses RDF's idea
> of an IRI is this:
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref
> [[
> A URI reference within an RDF graph (an RDF URI reference) is a Unicode
> string [UNICODE] that:
>
>     * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
>     * and would produce a valid URI character sequence (per RFC2396
> [URI], sections 2.1) representing an absolute URI with optional fragment
> identifier when subjected to the encoding described below.
>
> The encoding consists of:
>
>     1. encoding the Unicode string as UTF-8 [RFC-2279], giving a
> sequence of octet values.
>     2. %-escaping octets that do not correspond to permitted US-ASCII
> characters.
>
> The disallowed octets that must be %-escaped include all those that do
> not correspond to US-ASCII characters, and the excluded characters
> listed in Section 2.4 of [URI], except for the number sign (#), percent
> sign (%), and the square bracket characters re-allowed in [RFC-2732].
>
> Disallowed octets must be escaped with the URI escaping mechanism (that
> is, converted to %HH, where HH is the 2-digit hexadecimal numeral
> corresponding to the octet value).
> ]]
>
> my understanding is that the IDNs are not covered as such by the
> conversion stated, and any additional conversion required from them
> would be an extra (non-standard) feature.
>
> I guess from a W3C side we should be revving a lot of text in light of
> IRI and IDN as they come to fruition.
>
> e.g. the conversion in
> http://www.w3.org/International/iri-edit/draft-duerst-iri-10.txt
> [[
> http://résumé.example.org may be converted to
> http://xn--rsum-bpad.example.org instead of
> http://r%C3%A9sum%C3%A9.example.org.
> ]]
> and in fact, the first string is not an 'RDF URI reference' or an XLink
> href attribute value, or in the lexical space of xsd:anyURI, because
> http://r%C3%A9sum%C3%A9.example.org.
> is not a legal URI, and the provisions of those specs only allow for
> %-encoding.
>
> So, I think an RSS or RDF tool would need either to:
> - not check that the URIs were legal
> or
> - to have an extended check that knew something about IDNs
>
> and neither is particularly conformant. Personally I would prefer the
> latter.
>
>
> Jeremy
>
> Martin Duerst wrote:
>
>>At 03:45 05/01/08, Krall, Gary wrote:
>> >
>> >Chris:
>> >
>> >Just for clarification does your answer imply that an RSS reader would need
>> >to support IDNA to make this work?
>>
>>If it does resolve IRIs, which I guess most RSS readers do, then yes.
>>Otherwise no. RDF as such does not require resolution, and therefore
>>does not require IDNA support.
>>
>>IDNA support is available in libraries (e.g. libidn or idnkit)
>>that can easily be integrated into other software (but are a bit
>>bulky because of the tables needed).
>>
>>Regards, Martin.
>>
>> >Thanks,
>> >
>> >Gary.
>> >
>> >-----Original Message-----
>> >From: www-international-request@w3.org
>> >[mailto:www-international-request@w3.org]On Behalf Of Chris Lilley
>> >Sent: Friday, January 07, 2005 10:36 AM
>> >To: Reto Bachmann-Gmuer
>> >Cc: www-international@w3.org
>> >Subject: Re: IRI and IDN in RDF
>> >
>> >On Friday, January 7, 2005, 7:23:22 PM, Reto wrote:
>> >
>> >RBG> Hello
>> >
>> >RBG> I'm wondering how URLs based on IDN should be represented in RDF/XML:
>> >RBG> - no particular encoding (= default of xml document)
>> >RBG> - %... encoding
>> >RBG> - punycode
>> >
>> >Since it is XML, the IRI can be expressed in regular characters (the
>> >encoding of the document) and conversion to punycode, hex escaping etc
>> >can be left to the URI resolver.
>> >
>> >--
>> > Chris Lilley mailto:chris@w3.org
>> > Chair, W3C SVG Working Group
>> > Member, W3C Technical Architecture Group
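
For illustration only, the two conversions contrasted in this thread can
be sketched in Python. This is a rough sketch under stated assumptions,
not what Jena or any RDF tool actually does: the function names
rdf_uriref_to_ascii and host_to_idna are hypothetical, the excluded
character set is my reading of RFC 2396 section 2.4.3 with '#', '%',
'[' and ']' re-allowed as the quoted RDF Concepts text describes, and
the host conversion relies on Python's built-in "idna" codec, which
implements the 2003 IDNA rules.

```python
from urllib.parse import urlsplit

# Assumption: excluded US-ASCII characters per RFC 2396 section 2.4.3,
# minus '#', '%', '[' and ']' which the RDF Concepts text re-allows.
EXCLUDED = set(b' <>"{}|\\^`')


def rdf_uriref_to_ascii(ref: str) -> str:
    """Hypothetical helper: the RDF Concepts encoding.

    Step 1: encode the Unicode string as UTF-8.
    Step 2: %-escape every octet that is a control character, a
    non-US-ASCII octet, or one of the excluded characters.
    """
    out = []
    for octet in ref.encode("utf-8"):
        if octet < 0x21 or octet > 0x7E or octet in EXCLUDED:
            out.append(f"%{octet:02X}")
        else:
            out.append(chr(octet))
    return "".join(out)


def host_to_idna(ref: str) -> str:
    """Hypothetical helper: ASCII (punycode) form of the host part only."""
    host = urlsplit(ref).hostname or ""
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    iri = "http://résumé.example.org"
    # %-escaping alone leaves escapes inside the host name.
    print(rdf_uriref_to_ascii(iri))
    # An IDN-aware tool would convert the host instead.
    print(host_to_idna(iri))
```

The first function applies only the %-escaping that the RDF Concepts
wording allows, which is why it leaves escapes in the host; an extended
check "that knew something about IDNs" would have to treat the host
separately, as the second function does.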
Received on Thursday, 13 January 2005 11:56:00 UTC