Re: %-Encoded (and Non-%-Encoded) URIs in SPARQL Queries

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Mon, 9 Jul 2007 00:05:44 -0400
Message-Id: <648BAD39-CEEC-4C7A-9098-0551B2FF505F@gmail.com>
Cc: <semantic-web@w3.org>, <rdfapi-php-interest@lists.sourceforge.net>, <jena-dev@groups.yahoo.com>
To: T.Heath <T.Heath@open.ac.uk>
I think it's a bug in the implementations.  I base this on http:// 

> 6.4 RDF URI References
>
> A URI reference within an RDF graph (an RDF URI reference) is a
> Unicode string [UNICODE] that:
>
> * does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
> * and would produce a valid URI character sequence (per RFC2396
>   [URI], sections 2.1) representing an absolute URI with optional
>   fragment identifier when subjected to the encoding described below.
>
> The encoding consists of:
>
> 1. encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence
>    of octet values.
> 2. %-escaping octets that do not correspond to permitted US-ASCII
>    characters.
>
> The disallowed octets that must be %-escaped include all those that
> do not correspond to US-ASCII characters, and the excluded
> characters listed in Section 2.4 of [URI], except for the number
> sign (#), percent sign (%), and the square bracket characters
> re-allowed in [RFC-2732].
>
> Disallowed octets must be escaped with the URI escaping mechanism
> (that is, converted to %HH, where HH is the 2-digit hexadecimal
> numeral corresponding to the octet value).
>
> Two RDF URI references are equal if and only if they compare as
> equal, character by character, as Unicode strings.
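
To make the quoted rules concrete, here is a minimal Python sketch of the
encoding and of the equality test. The function names are hypothetical, and
the set of excluded characters is an assumed reading of RFC 2396 section
2.4.3 (minus "#", "%", "[" and "]"); none of this is taken from RAP, Jena,
or any other toolkit.

```python
# Assumption: excluded US-ASCII characters from RFC 2396 section 2.4.3
# (space, delims, unwise), minus '#', '%', '[' and ']' as the quote says.
EXCLUDED = set(b' <>"{}|\\^`')


def encode_rdf_uri_ref(ref: str) -> str:
    """Apply the section 6.4 encoding: UTF-8 encode, then %-escape disallowed octets."""
    out = []
    for octet in ref.encode("utf-8"):
        is_control = octet <= 0x1F or octet == 0x7F  # should not occur in a valid reference
        if octet >= 0x80 or is_control or octet in EXCLUDED:
            out.append("%{:02X}".format(octet))  # %HH, two hex digits
        else:
            out.append(chr(octet))  # permitted US-ASCII, including '%' and ','
    return "".join(out)


def same_rdf_uri_ref(a: str, b: str) -> bool:
    """Equality as quoted: character-by-character comparison of the Unicode strings."""
    return a == b


plain = "http://some.example/example,first"
escaped = "http://some.example/example%2Cfirst"

print(encode_rdf_uri_ref(plain))                         # http://some.example/example,first
print(encode_rdf_uri_ref(escaped))                       # http://some.example/example%2Cfirst
print(encode_rdf_uri_ref("http://some.example/caf\u00e9"))  # http://some.example/caf%C3%A9
print(same_rdf_uri_ref(plain, escaped))                  # False
```

Under these assumptions the comma is not a disallowed octet, so the encoding
leaves both of the example URIs quoted below unchanged, and the two still
differ when compared as strings.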


On Jul 8, 2007, at 9:36 AM, T.Heath wrote:

> Hi all,
> I've come across an issue with SPARQL queries over graphs in which
> URIs vary in their use of %-encoding, and hope members of this list
> may be able to help out...
> Imagine you have two RDF graphs that reference the same URIs, except
> that in one graph special characters in the URIs are %-encoded, and in
> the second they are not. For example:
> <http://some.example/example,first> in graph1 vs.
> <http://some.example/example%2Cfirst> in graph2
> As far as I understand it (although I may be wrong) both these URIs
> are the "same", despite their different syntactic form. However, when
> performing SPARQL queries over the merge of the two graphs these two
> URIs are not treated as the same, therefore making joins of the data
> impossible (without pre-processing). I noticed this behaviour first in
> RAP, but we've been able to replicate the effect in Jena also.
> So, my question is: is this a bug in RAP, Jena, and presumably other
> frameworks, or are there cases in which this is actually the desired
> behaviour (i.e. it's a feature not a bug)? If the latter is true, does
> this suggest that as a community we need a convention that we will
> always mint and use URIs in which special characters are %-encoded (or
> the other way around) in order to avoid this kind of situation?
> Any thoughts/pointers/enlightenment much appreciated,
> Cheers,
> Tom.
> P.S. FWIW the DBpedia community has recently settled on always using
> %-encoded URIs.
> -- 
> Tom Heath
> PhD Student
> Knowledge Media Institute
> The Open University
> Walton Hall
> Milton Keynes
> MK7 6AA
> United Kingdom
> Tel: +44 (0)1908 653565
> Fax: +44 (0)1908 653169
> Web/URI: http://kmi.open.ac.uk/people/tom/
> Jabber: t.heath%open.ac.uk@buddyspace.org

Received on Monday, 9 July 2007 04:05:52 UTC