- From: <hong.sun@agfa.com>
- Date: Tue, 11 Oct 2011 14:30:46 +0200
- To: public-rdf-dawg-comments@w3.org
- Cc: jos.deroo@agfa.com
- Message-ID: <OF6F8A5CDC.7C3824A3-ONC1257926.004485A8-C1257926.0044BC0E@agfa.com>
Dear All,

I have been testing the OPTIONAL feature of SPARQL and found that some OPTIONAL queries are evaluated differently by Virtuoso, Rasqal and ARQ 2.8.8. The difference mainly stems from the left-associative nature of OPTIONAL. I describe the problem below.

I used two public endpoints to test Virtuoso and Rasqal:

  DBpedia (for Virtuoso): http://dbpedia.org/sparql
  Redland (for Rasqal):   http://librdf.org/query

ARQ 2.8.8 was downloaded from SourceForge: https://sourceforge.net/projects/jena/files/ARQ/

The RDF graph I queried in Redland and ARQ 2.8.8 is http://jena.sourceforge.net/ARQ/Tutorial/vc-db-3.ttl:

  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .

  _:a a foaf:Person ; foaf:name "Matt Jones" .
  _:b a foaf:Person ; foaf:name "Sarah Jones" .
  _:c a foaf:Person ; vcard:FN "Becky Smith" .
  _:d a foaf:Person ; vcard:FN "John Smith" .

SPARQL Query1:

  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
  SELECT *
  WHERE {
    optional {?x vcard:FN ?name .}
    optional {?x foaf:nick ?nick .}
  }

The results from Redland are:

  ----------------------------------------------------------------
  | count | x                             | name          | nick |
  ================================================================
  | 1     | blank node r1317995959r5965r4 | "John Smith"  |      |
  | 2     | blank node r1317995959r5965r3 | "Becky Smith" |      |
  ----------------------------------------------------------------

The results from ARQ are:

  -------------------------------
  | x    | name          | nick |
  ===============================
  | _:b0 | "John Smith"  |      |
  | _:b1 | "Becky Smith" |      |
  -------------------------------

But if I switch the order of the OPTIONAL clauses, so that the first one matches nothing, the two applications deliver different results.
SPARQL Query2:

  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
  SELECT *
  WHERE {
    optional {?x foaf:nick ?nick .}
    optional {?x vcard:FN ?name .}
  }

The result from Redland is empty:

  Found 0 results

The results from ARQ are:

  -------------------------------
  | x    | nick | name          |
  ===============================
  | _:b0 |      | "John Smith"  |
  | _:b1 |      | "Becky Smith" |
  -------------------------------

This is because, when the first OPTIONAL clause matches nothing, the Redland endpoint passes the empty binding on to the second clause, whereas in ARQ the variables in the second clause remain free to bind.

The DBpedia endpoint behaves similarly to the Redland endpoint. As the DBpedia endpoint does not allow retrieving remote RDF, I used its default RDF graph. The queries I used are:

Query 1:

  select * where {
    optional {?person1 <http://xmlns.com/foaf/0.1/name> ?name1.}
    optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
  } limit 5

Query 2:

  select * where {
    optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
    optional {?person1 <http://xmlns.com/foaf/0.1/name> ?name1.}
  } limit 5

There is no triple with the property <http://xmlns.com/foaf/0.1/knows>, so Query 1 returns:

  person1                                               name1                 person2
  http://dbpedia.org/resource/Pancreatic_cancer         "Pancreas Cancer"@en
  http://dbpedia.org/resource/Hondamatic                "H2"@en
  http://dbpedia.org/resource/RNA_%28journal%29         "RNA"@en
  http://dbpedia.org/resource/August_30th_%28song%29    "August 30th"@en
  http://dbpedia.org/resource/Sculpture_%28magazine%29  "Sculpture"@en

But Query 2 does not return any result.

I understand that the issue described above could be solved by adding a non-optional statement that defines the binding scope of the variables appearing in the OPTIONAL blocks.
For example, one could state “?x a foaf:Person.” before the OPTIONAL clauses. But what should we do if the graph we are going to query looks like the one below (vc-db-incomplete.ttl)?

  _:a a foaf:Person ; foaf:name "Matt Jones" .
  _:b foaf:name "Sarah Jones" .
  _:c a foaf:Person ; vcard:FN "Becky Smith" .
  _:d vcard:FN "John Smith" .

If a user writes a query like Query 2, should we guarantee that different SPARQL applications adhere to the same standard and deliver the same interpretation? In my opinion, this is crucial if we want to link open data together, but how can we achieve it? Any suggestions? Many thanks in advance!

PS: According to the article “Semantics and Complexity of SPARQL”, using OPTIONAL without any restriction can lead to PSPACE complexity. It suggests using well-designed patterns for OPTIONAL to avoid the problems caused by using OPTIONAL incorrectly. But how should our SPARQL endpoints react to queries that are not well designed?

In my humble opinion this question is hard to answer, but it has a big impact on RDB to RDF mapping, for the following reasons.

If we intend to support such queries by strictly following the SPARQL specification and its left-associative semantics, then in many cases users do not actually mean for those queries to be interpreted the way the specification prescribes; they just want an ‘intuitive optional’. In addition, the translated SQL query would become extremely complex and hard to optimize because of the complexity introduced by OPTIONAL.

Can we then say that we do not support those queries? Even so, there are some rare cases in which we need such not-well-designed queries, such as querying the incomplete database I just mentioned. Moreover, even if we decide to reject not-well-designed queries, it is also difficult to define which kinds of queries count as not well designed.

I am quite lost as to what action we should take now. Could anyone help? Thanks in advance!
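To make the left-associative reading concrete, here is a minimal sketch in Python of how I understand the SPARQL algebra for a group consisting only of OPTIONAL clauses: evaluation starts from the empty group, which has exactly one solution with no bindings, and each OPTIONAL is applied as a left join to the result so far. The function names (match, compatible, left_join, eval_optionals) and the tuple-based triple store are my own illustrative assumptions and are not taken from any of the engines above; the rdf:type triples, FILTERs and blank node scoping are deliberately left out.

```python
# Illustrative sketch only: a toy evaluator for groups made of OPTIONAL
# triple patterns. None of these names come from ARQ, Rasqal or Virtuoso.

FOAF = "http://xmlns.com/foaf/0.1/"
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Name triples from vc-db-3.ttl (rdf:type triples omitted for brevity).
TRIPLES = [
    ("_:a", FOAF + "name", "Matt Jones"),
    ("_:b", FOAF + "name", "Sarah Jones"),
    ("_:c", VCARD + "FN", "Becky Smith"),
    ("_:d", VCARD + "FN", "John Smith"),
]


def match(pattern, triples):
    """Return all solutions (dicts) for one triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    solutions = []
    for triple in triples:
        mu = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mu.get(term, value) != value:
                    ok = False
                    break
                mu[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            solutions.append(mu)
    return solutions


def compatible(m1, m2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def left_join(left, right):
    """Extend each left solution with compatible right ones, else keep it."""
    result = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(extended if extended else [m1])
    return result


def eval_optionals(patterns, triples):
    """Evaluate { OPTIONAL p1 OPTIONAL p2 ... } left-associatively."""
    solutions = [{}]  # the empty group: one solution, no bindings
    for pattern in patterns:
        solutions = left_join(solutions, match(pattern, triples))
    return solutions


NICK = ("?x", FOAF + "nick", "?nick")
FN = ("?x", VCARD + "FN", "?name")

query1 = eval_optionals([FN, NICK], TRIPLES)  # order used in Query1
query2 = eval_optionals([NICK, FN], TRIPLES)  # order used in Query2

for row in query2:
    print(row)
```

Under these assumptions both orderings produce the same two rows (the "Becky Smith" and "John Smith" bindings with ?nick unbound), which matches the ARQ output shown above. The key step is that a left join against an OPTIONAL clause with no matches keeps the single empty solution rather than discarding it, so the next clause can still bind ?x.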
Kind Regards,

Hong

Hong Sun | Agfa HealthCare
Researcher | HE/Advanced Clinical Applications Research
T +32 3 444 8108 | F +32 3 444 8401

Agfa HealthCare NV, Moutstraat 100, 9000 Gent, Belgium
http://www.agfahealthcare.com
Click on link to read important disclaimer: http://www.agfahealthcare.com/maildisclaimer
Received on Tuesday, 18 October 2011 08:10:18 UTC