- From: T.Heath <T.Heath@open.ac.uk>
- Date: Sun, 8 Jul 2007 14:36:11 +0100
- To: <semantic-web@w3.org>
- Cc: <rdfapi-php-interest@lists.sourceforge.net>, <jena-dev@groups.yahoo.com>
Hi all,

I've come across an issue with SPARQL queries over graphs in which URIs vary in their use of %-encoding, and hope members of this list may be able to help out...

Imagine you have two RDF graphs that reference the same URIs, except that in one graph special characters in the URIs are %-encoded, and in the second they are not. For example:

<http://some.example/example,first> in graph1
vs.
<http://some.example/example%2Cfirst> in graph2

As far as I understand it (although I may be wrong), both of these URIs are the "same", despite their different syntactic forms. However, when performing SPARQL queries over the merge of the two graphs, these two URIs are not treated as the same, which makes joins of the data impossible (without pre-processing). I noticed this behaviour first in RAP, but we've been able to replicate the effect in Jena as well.

So, my question is: is this a bug in RAP, Jena, and presumably other frameworks, or are there cases in which this is actually the desired behaviour (i.e. it's a feature, not a bug)? If the latter is true, does this suggest that as a community we need a convention that we will always mint and use URIs in which special characters are %-encoded (or the other way around) in order to avoid this kind of situation?

Any thoughts/pointers/enlightenment much appreciated,

Cheers,

Tom.

P.S. FWIW the DBpedia community has recently settled on always using %-encoded URIs.

--
Tom Heath
PhD Student
Knowledge Media Institute
The Open University
Walton Hall
Milton Keynes MK7 6AA
United Kingdom
Tel: +44 (0)1908 653565
Fax: +44 (0)1908 653169
Web/URI: http://kmi.open.ac.uk/people/tom/
Jabber: t.heath%open.ac.uk@buddyspace.org
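
For illustration, here is a minimal Python sketch of the kind of pre-processing mentioned in the message: rewriting every URI to one agreed convention (here, always %-encoded) before the graphs are merged. The function name `to_encoded_form` is hypothetical and not part of RAP or Jena. The sketch assumes that only the path needs rewriting and that decoding and re-encoding reserved characters such as "," is acceptable for the data at hand; RFC 3986 does not guarantee that in general, so treat this as one possible convention rather than a general equivalence test.

```python
from urllib.parse import urlsplit, urlunsplit, quote, unquote


def to_encoded_form(uri: str) -> str:
    """Rewrite the path of `uri` so that special characters are %-encoded.

    Hypothetical helper: decodes the path first so that already-encoded
    input is not encoded twice, then re-encodes everything except
    unreserved characters and "/".
    """
    parts = urlsplit(uri)
    # Assumption: an encoded slash (%2F) does not occur in the data,
    # since decoding it would change the path structure.
    path = quote(unquote(parts.path), safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


graph1_uri = "http://some.example/example,first"
graph2_uri = "http://some.example/example%2Cfirst"

# Compared as plain strings, the two forms differ.
print(graph1_uri == graph2_uri)

# After rewriting both to the same convention, they match.
print(to_encoded_form(graph1_uri) == to_encoded_form(graph2_uri))
```

Applying such a rewrite to both graphs before loading them would let a join on these URIs succeed, at the cost of committing to a single spelling for every URI.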
Received on Sunday, 8 July 2007 13:36:28 UTC