%-Encoded (and Non-%-Encoded) URIs in SPARQL Queries

Hi all,

I've come across an issue with SPARQL queries over graphs in which URIs
vary in their use of %-encoding, and hope members of this list may be
able to help out...

Imagine you have two RDF graphs that reference the same URIs, except
that in one graph special characters in the URIs are %-encoded, and in
the second they are not. For example:

<http://some.example/example,first> in graph1 vs.
<http://some.example/example%2Cfirst> in graph2

As far as I understand it (although I may be wrong), both these URIs are
the "same", despite their different syntactic forms. However, when
performing SPARQL queries over the merge of the two graphs, these two
URIs are not treated as the same, which makes it impossible to join the
data without pre-processing. I first noticed this behaviour in RAP, and
we have since been able to reproduce it in Jena as well.
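A minimal sketch of why the join fails, assuming SPARQL compares IRIs the way RDF term equality is specified (character-by-character string comparison). It also applies the percent-encoding normalization from RFC 3986 (sections 6.2.2.1 and 6.2.2.2), which only decodes octets for "unreserved" characters. The helper name `normalize_percent_encoding` is hypothetical, not part of RAP or Jena.

```python
import re

# RFC 3986 section 2.3 "unreserved" characters: only these may be decoded
# during normalization without changing what the URI identifies.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Hypothetical helper: decode %XX triplets that encode unreserved
    characters and uppercase the hex digits of every other triplet."""
    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()
    return _PCT.sub(fix, uri)


graph1_uri = "http://some.example/example,first"
graph2_uri = "http://some.example/example%2Cfirst"

# Plain string comparison, roughly what RDF term equality amounts to.
print(graph1_uri == graph2_uri)  # False

# Still distinct after RFC 3986 normalization: "," is a reserved
# character (a sub-delim), so %2C is not decoded.
print(normalize_percent_encoding(graph1_uri)
      == normalize_percent_encoding(graph2_uri))  # False

# Encoding an unreserved character, by contrast, is normalized away.
print(normalize_percent_encoding("http://some.example/%7eanna")
      == "http://some.example/~anna")  # True
```

RFC 3986 (section 2.2) states that URIs differing only in whether a reserved character is replaced by its percent-encoded octet are not equivalent. That suggests the RAP/Jena behaviour is consistent with the specs rather than a bug, although a particular application may still choose to treat such URIs as co-referent.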

So, my question is: is this a bug in RAP, Jena, and presumably other
frameworks, or are there cases in which this is actually the desired
behaviour (i.e. it's a feature, not a bug)? If the latter is true, does
this suggest that, as a community, we need a convention of always
minting and using URIs in which special characters are %-encoded (or
the other way around) in order to avoid this kind of situation?
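If such a convention were adopted, the pre-processing step could look something like the sketch below. It assumes a hypothetical rule of "always %-encode every path character except unreserved ones and the '/' separators", and it leaves the query and fragment untouched for simplicity. This rule deliberately treats "," and "%2C" as the same, which goes beyond what RFC 3986 guarantees. The function name is illustrative only.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonicalize_path_encoding(uri: str) -> str:
    """Hypothetical convention: percent-encode every path character except
    unreserved ones and '/' separators. Query and fragment are left as-is."""
    parts = urlsplit(uri)
    # Split on "/" first so an encoded slash (%2F) inside a segment stays
    # encoded instead of turning into a new path separator.
    segments = [quote(unquote(seg), safe="") for seg in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Both spellings from the two graphs map to the same canonical form.
print(canonicalize_path_encoding("http://some.example/example,first"))
print(canonicalize_path_encoding("http://some.example/example%2Cfirst"))
```

Running every URI in both graphs through one function like this before merging would let the SPARQL joins succeed. However, it only helps if everyone publishing the data agrees on the same rule, which is why a community-wide convention matters.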

Any thoughts/pointers/enlightenment much appreciated,

Cheers,

Tom.

P.S. FWIW, the DBpedia community has recently settled on always using
%-encoded URIs.

-- 
Tom Heath
PhD Student
Knowledge Media Institute
The Open University
Walton Hall
Milton Keynes
MK7 6AA
United Kingdom

Tel: +44 (0)1908 653565
Fax: +44 (0)1908 653169
Web/URI: http://kmi.open.ac.uk/people/tom/
Jabber: t.heath%open.ac.uk@buddyspace.org  

Received on Sunday, 8 July 2007 13:36:28 UTC