Re: RDF 2.0 Wishlist - Legal RDF which I can't SPARQL

From: Mischa Tuffield <mischa.tuffield@garlik.com>
Date: Thu, 29 Jul 2010 15:05:52 +0100
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <29878BFE-E241-40B9-9F3D-382BD0690185@garlik.com>
To: Damian Steer <pldms@mac.com>
Hello, 

On 29 Jul 2010, at 13:51, Damian Steer wrote:

> 
> On 29 Jul 2010, at 12:20, Mischa Tuffield wrote:
> 
>> Hi All, 
>> 
>> I know this is a known problem, but I have been bitten by the fact that there are legal RDF documents which I can't query using the SPARQL query language. Perhaps this should be looked at in any future revision of RDF or SPARQL.
>> 
>> The issue arises because Turtle doesn't forbid the use of certain characters, for example the backtick " ` " (%60), whereas SPARQL does forbid it. This means that I can write legal Turtle and import it into my triplestore, but I won't ever be able to query that data via SPARQL.
>> 
>> For example, the following turtle is legal : 
>> 
>> <http://example.com/mylamefoafdocument`uri> a foaf:Document . 
>> <http://example.com/mylamefoafdocument`uri> foaf:primaryTopic  <http://example.com/mylamefoafdocument`uri#me> .
>> 
>> But I can't write the following SPARQL query:
>> 
>> SELECT * WHERE { <http://example.com/mylamefoafdocument`uri> ?p ?o}
>> 
>> I thought this was due to the fact that the RDF spec [1] was written before the RFC which defined URIs [2], but I can't find a link to an RDF spec which predates 1998.
> 
> RDF core was working in parallel with the IRI [1] work. URIRef [2] (as I understand it) was trying to anticipate what IRIs would be. URIRef and IRI are pretty close (but see below), and I think the general recommendation is that you should read 'URIRef' as 'IRI'.
> 
> SPARQL syntax is defined in terms of IRIs, although I'm not sure the syntax is identical (it uses ([^<>"{}|^`\]-[#x00-#x20])*), but it seems close enough.
> 
> Looking at the IRI spec, ` is not permitted; however, URIRef does allow it. 'Pretty close', but not close enough. Turtle, it seems, is in the URIRef camp.

Yeah, I follow, and thanks for the clarification, but as far as I can tell rdfxml (which is a rec) is in the URIRef space too - please correct me if I am wrong. That means I still have the same problem: I can import a URI using rdfxml and then not be able to query for it with SPARQL.
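
To make the mismatch concrete, the character class Damian quotes from the SPARQL grammar can be translated into a Python regular expression and run against the example URI. This is only a rough sketch: the names SPARQL_IRI_BODY, is_sparql_iri_body and offending_chars are made up for illustration, and it checks only the character class from the quoted production, not full IRI syntax.

```python
import re

# Character class from the SPARQL production quoted above:
#   ([^<>"{}|^`\]-[#x00-#x20])*
# i.e. any characters except < > " { } | ^ ` \ and U+0000..U+0020.
# (Hypothetical helper names; not part of any library.)
SPARQL_IRI_BODY = re.compile(r'[^<>"{}|^`\\\x00-\x20]*')
FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def is_sparql_iri_body(text: str) -> bool:
    """True if `text` may appear between '<' and '>' in a SPARQL query."""
    return SPARQL_IRI_BODY.fullmatch(text) is not None


def offending_chars(text: str) -> list:
    """Return the characters in `text` that the character class rejects."""
    return FORBIDDEN.findall(text)


uri = "http://example.com/mylamefoafdocument`uri"
print(is_sparql_iri_body(uri))   # the backtick makes this fail
print(offending_chars(uri))
```

A check like this could be run over the URIs in a store to find the ones that can never be written in a query.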

> 
> It also doesn't seem to be permitted in URIs, [3] which makes URIRef feel like it's outside the mainstream.

Agreed. 

> Personally I would follow IRI and fix turtle. Why should RDF have its own URL/URI/IRI-ish syntax?

Do you think that the same logic should be applied to rdfxml too? Otherwise there will be things you can write in turtle and not in rdfxml which you can subsequently sparql, which simply doesn't feel right to me.

I wonder if I should contact the SPARQL working group, as they are currently active, and see how they respond. I think it is unfortunate that you can write valid RDF which can't be queried in SPARQL.

> 
> As for "http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php'", that does work when encoded.

Yup, I am aware I could just encode the ` as %60.
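
For reference, a minimal sketch of that encoding step in Python, using urllib.parse.quote from the standard library. The function name encode_for_sparql and the choice of "safe" characters are assumptions made for illustration: the URI delimiters and "%" are left alone so that existing escapes are not double-encoded. Note that, as I understand it, RDF compares URI references character by character, so the encoded form is a different node from the unencoded one; the data would have to be loaded in the encoded form too for the query to match.

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus '%', so existing
# escapes such as %20 are not encoded a second time. This safe set is
# an assumption for illustration, not taken from any spec.
SAFE = ":/?#[]@!$&'()*+,;=%"


def encode_for_sparql(uri: str) -> str:
    """Percent-encode characters such as ` so the URI fits in <...>."""
    return quote(uri, safe=SAFE)


uri = "http://example.com/mylamefoafdocument`uri"
encoded = encode_for_sparql(uri)
print(encoded)

# Build the query from the thread using the encoded form.
query = "SELECT * WHERE { <%s> ?p ?o }" % encoded
print(query)
```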

> 
> Disclaimer: I may have got some or all of this wrong. Do not trust my assertions regarding the RFCs.
> 
> Damian

Thanks Damian, 

Mischa

> 
> [1] <http://www.ietf.org/rfc/rfc3987.txt>
> [2] <http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference>
> [3] <http://www.ietf.org/rfc/rfc2396.txt>
> 

___________________________________
Mischa Tuffield PhD
Email: mischa.tuffield@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 1-3 Halford Road, Richmond, TW10 6AW
+44(0)845 645 2824  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 29 July 2010 14:06:27 UTC
