- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Mon, 9 Jul 2007 00:05:44 -0400
- To: T.Heath <T.Heath@open.ac.uk>
- Cc: <semantic-web@w3.org>, <rdfapi-php-interest@lists.sourceforge.net>, <jena-dev@groups.yahoo.com>
- Message-Id: <648BAD39-CEEC-4C7A-9098-0551B2FF505F@gmail.com>
I think it's a bug in the implementations. I base this on http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

> 6.4 RDF URI References
>
> A URI reference within an RDF graph (an RDF URI reference) is a
> Unicode string [UNICODE] that:
>
> does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
> and would produce a valid URI character sequence (per RFC2396
> [URI], sections 2.1) representing an absolute URI with optional
> fragment identifier when subjected to the encoding described below.
>
> The encoding consists of:
>
> encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence
> of octet values.
> %-escaping octets that do not correspond to permitted US-ASCII
> characters.
>
> The disallowed octets that must be %-escaped include all those that
> do not correspond to US-ASCII characters, and the excluded
> characters listed in Section 2.4 of [URI], except for the number
> sign (#), percent sign (%), and the square bracket characters
> re-allowed in [RFC-2732].
>
> Disallowed octets must be escaped with the URI escaping mechanism
> (that is, converted to %HH, where HH is the 2-digit hexadecimal
> numeral corresponding to the octet value).
>
> Two RDF URI references are equal if and only if they compare as
> equal, character by character, as Unicode strings.

-Alan

On Jul 8, 2007, at 9:36 AM, T.Heath wrote:

> Hi all,
>
> I've come across an issue with SPARQL queries over graphs in which
> URIs vary in their use of %-encoding, and hope members of this list
> may be able to help out...
>
> Imagine you have two RDF graphs that reference the same URIs, except
> that in one graph special characters in the URIs are %-encoded, and in
> the second they are not. For example:
>
> <http://some.example/example,first> in graph1 vs.
> <http://some.example/example%2Cfirst> in graph2
>
> As far as I understand it (although I may be wrong) both these URIs
> are the "same", despite their different syntactic form.
> However, when performing SPARQL queries over the merge of the two
> graphs these two URIs are not treated as the same, therefore making
> joins of the data impossible (without pre-processing). I noticed this
> behaviour first in RAP, but we've been able to replicate the effect in
> Jena also.
>
> So, my question is: is this a bug in RAP, Jena, and presumably other
> frameworks, or are there cases in which this is actually the desired
> behaviour (i.e. it's a feature not a bug)? If the latter is true, does
> this suggest that as a community we need a convention that we will
> always mint and use URIs in which special chars are %-encoded (or the
> other way around) in order to avoid this kind of situation?
>
> Any thoughts/pointers/enlightenment much appreciated,
>
> Cheers,
>
> Tom.
>
> P.S. FWIW the Dbpedia community has recently settled on always using
> %-encoded URIs.
>
> --
> Tom Heath
> PhD Student
> Knowledge Media Institute
> The Open University
> Walton Hall
> Milton Keynes
> MK7 6AA
> United Kingdom
>
> Tel: +44 (0)1908 653565
> Fax: +44 (0)1908 653169
> Web/URI: http://kmi.open.ac.uk/people/tom/
> Jabber: t.heath%open.ac.uk@buddyspace.org
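The RDF Concepts rules quoted in this thread (UTF-8 encoding, %-escaping of disallowed octets, and character-by-character equality) can be sketched in a few lines of Python, together with the kind of pre-processing step Tom mentions. This is a minimal illustration under stated assumptions, not code from RAP, Jena or any other framework: every function name is hypothetical, the excluded-character set is transcribed from RFC 2396 section 2.4.3 minus the re-allowed `#`, `%`, `[` and `]`, and `apply_join_convention` stands in for whatever local convention a project might adopt.

```python
import re

# Hypothetical sketch of the RDF URI reference rules quoted above (RDF Concepts,
# section 6.4). None of these names come from RAP, Jena or any other library.

# Assumption: RFC 2396 (section 2.4.3) excluded characters, minus '#', '%', '['
# and ']', which RDF re-allows. Control characters are rejected separately.
EXCLUDED = set(' <>"{}|\\^`')


def has_control_chars(s: str) -> bool:
    """True if s contains #x00-#x1F or #x7F-#x9F (not allowed in RDF URI refs)."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in s)


def rdf_uri_encode(s: str) -> str:
    """Encode as UTF-8, then %HH-escape non-ASCII octets and excluded characters."""
    if has_control_chars(s):
        raise ValueError("RDF URI references may not contain control characters")
    out = []
    for byte in s.encode("utf-8"):
        if byte >= 0x80 or chr(byte) in EXCLUDED:
            out.append(f"%{byte:02X}")  # two-digit uppercase hex, e.g. %C3
        else:
            out.append(chr(byte))
    return "".join(out)


def rdf_uri_equal(a: str, b: str) -> bool:
    """Quoted equality rule: compare character by character as Unicode strings."""
    return a == b


def apply_join_convention(uri: str, decode_chars: str = ",") -> str:
    """Hypothetical pre-processing step (not a standard normalization): decode
    %HH escapes only for the ASCII characters listed in decode_chars."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in decode_chars else m.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)


graph1 = "http://some.example/example,first"
graph2 = "http://some.example/example%2Cfirst"

print(rdf_uri_equal(graph1, graph2))  # False: the strings differ
print(rdf_uri_encode(graph1))  # unchanged: ',' is not an excluded character
print(rdf_uri_encode("http://some.example/caf\u00e9 au lait"))
# http://some.example/caf%C3%A9%20au%20lait
print(rdf_uri_equal(apply_join_convention(graph2), graph1))  # True
```

Run as written, the sketch leaves `example,first` untouched during encoding (`,` is not in the excluded set), and the string comparison treats the two example URIs as different until a convention rewrites one spelling into the other. For comparison, RFC 3986 (section 6.2.2.2) treats only percent-encoded *unreserved* characters as equivalent to their decoded form; `,` is a reserved character, so decoding `%2C` is a local convention rather than a standard equivalence.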
Received on Monday, 9 July 2007 04:05:52 UTC