- From: <hong.sun@agfa.com>
- Date: Tue, 11 Oct 2011 14:30:46 +0200
- To: public-rdf-dawg-comments@w3.org
- Cc: jos.deroo@agfa.com
- Message-ID: <OF6F8A5CDC.7C3824A3-ONC1257926.004485A8-C1257926.0044BC0E@agfa.com>
Dear All,
I have been testing the OPTIONAL feature of SPARQL and discovered that
some OPTIONAL queries are evaluated differently by Virtuoso, Rasqal and
ARQ 2.8.8. The differences seem to stem from the left-associative
nature of OPTIONAL. I will try to describe the problem below.
I used two public endpoints to test Virtuoso and Rasqal:
DBpedia (for Virtuoso): http://dbpedia.org/sparql
Redland (for Rasqal): http://librdf.org/query
ARQ 2.8.8 is downloaded from SourceForge:
https://sourceforge.net/projects/jena/files/ARQ/
The RDF graph I queried in Redland and ARQ 2.8.8 is from
http://jena.sourceforge.net/ARQ/Tutorial/vc-db-3.ttl
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
_:a a foaf:Person ;
    foaf:name "Matt Jones" .
_:b a foaf:Person ;
    foaf:name "Sarah Jones" .
_:c a foaf:Person ;
    vcard:FN "Becky Smith" .
_:d a foaf:Person ;
    vcard:FN "John Smith" .
SPARQL Query1:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT *
WHERE {
  OPTIONAL { ?x vcard:FN ?name . }
  OPTIONAL { ?x foaf:nick ?nick . }
}
The result from Redland is:
----------------------------------------------------------------
| count | x                             | name          | nick |
================================================================
| 1     | blank node r1317995959r5965r4 | "John Smith"  |      |
| 2     | blank node r1317995959r5965r3 | "Becky Smith" |      |
----------------------------------------------------------------
The result from ARQ is:
-------------------------------
| x    | name          | nick |
===============================
| _:b0 | "John Smith"  |      |
| _:b1 | "Becky Smith" |      |
-------------------------------
But if I switch the order of the OPTIONAL clauses, so that the first one
has no match, the two applications deliver different results.
SPARQL Query2:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT *
WHERE {
  OPTIONAL { ?x foaf:nick ?nick . }
  OPTIONAL { ?x vcard:FN ?name . }
}
The result from Redland is empty:
Found 0 results
The result from ARQ is:
-------------------------------
| x    | nick | name          |
===============================
| _:b0 |      | "John Smith"  |
| _:b1 |      | "Becky Smith" |
-------------------------------
This appears to be because, when the first OPTIONAL clause has no match,
the Redland endpoint passes an empty result on to the second clause, so
nothing can match there either; in ARQ, the variables in the second
clause remain free to bind when the first OPTIONAL clause has no match.
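To make the two orderings concrete, here is a minimal Python sketch of how the SPARQL algebra evaluates a group that contains only OPTIONAL clauses, as far as I understand the specification: the group becomes LeftJoin(LeftJoin(Z, A), B), where Z is the empty pattern and has exactly one solution with no bindings. The helper names (match, left_join, eval_optionals) and the string encoding of the triples are assumptions made for illustration; this is not how any of the tested engines is implemented.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]

# The vc-db-3.ttl data, written as plain string triples.
GRAPH: List[Triple] = [
    ("_:a", "rdf:type", "foaf:Person"), ("_:a", "foaf:name", "Matt Jones"),
    ("_:b", "rdf:type", "foaf:Person"), ("_:b", "foaf:name", "Sarah Jones"),
    ("_:c", "rdf:type", "foaf:Person"), ("_:c", "vcard:FN", "Becky Smith"),
    ("_:d", "rdf:type", "foaf:Person"), ("_:d", "vcard:FN", "John Smith"),
]


def match(pattern: Triple, graph: List[Triple]) -> List[Solution]:
    """Solutions of one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in graph:
        solution: Solution = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if solution.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(solution)
    return solutions


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == val for var, val in a.items() if var in b)


def left_join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Keep every left solution, extended by compatible right solutions if any."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged if merged else [l])
    return result


def eval_optionals(patterns: List[Triple], graph: List[Triple]) -> List[Solution]:
    """Evaluate { OPTIONAL {p1} OPTIONAL {p2} ... } left-associatively."""
    solutions: List[Solution] = [{}]  # empty group: one solution, no bindings
    for pattern in patterns:
        solutions = left_join(solutions, match(pattern, graph))
    return solutions


FN = ("?x", "vcard:FN", "?name")
NICK = ("?x", "foaf:nick", "?nick")

query1 = eval_optionals([FN, NICK], GRAPH)
query2 = eval_optionals([NICK, FN], GRAPH)
print(query1)
print(query2)
```

Under these assumptions both orderings yield the two vcard:FN rows with ?nick unbound: in Query2 the first LeftJoin keeps the single empty solution, which is compatible with every match of the second clause.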
The DBpedia endpoint behaves similarly to the Redland endpoint.
As the DBpedia endpoint does not allow retrieving remote RDF, I used its
default RDF graph.
The queries I used are:
Query 1:
select * where {
  optional { ?person1 <http://xmlns.com/foaf/0.1/name> ?name1 . }
  optional { ?person1 <http://xmlns.com/foaf/0.1/knows> ?person2 . }
}
limit 5
Query 2:
select * where {
  optional { ?person1 <http://xmlns.com/foaf/0.1/knows> ?person2 . }
  optional { ?person1 <http://xmlns.com/foaf/0.1/name> ?name1 . }
}
limit 5
There is no triple with the property <http://xmlns.com/foaf/0.1/knows>,
so Query 1 returns:
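-----------------------------------------------------------------------------------------
| person1                                              | name1                | person2 |
=========================================================================================
| http://dbpedia.org/resource/Pancreatic_cancer        | "Pancreas Cancer"@en |         |
| http://dbpedia.org/resource/Hondamatic               | "H2"@en              |         |
| http://dbpedia.org/resource/RNA_%28journal%29        | "RNA"@en             |         |
| http://dbpedia.org/resource/August_30th_%28song%29   | "August 30th"@en     |         |
| http://dbpedia.org/resource/Sculpture_%28magazine%29 | "Sculpture"@en       |         |
-----------------------------------------------------------------------------------------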
But Query 2 does not return any result.
I understand that the issue above can be avoided by adding a non-optional
statement that fixes the binding scope of the variables appearing in the
OPTIONAL blocks, for example “?x a foaf:Person .” before the OPTIONAL
clauses. But what should we do if the graph we are going to query looks
like the one below (vc-db-incomplete.ttl):
_:a a foaf:Person ;
    foaf:name "Matt Jones" .
_:b foaf:name "Sarah Jones" .
_:c a foaf:Person ;
    vcard:FN "Becky Smith" .
_:d vcard:FN "John Smith" .
If a user writes a query like Query 2, how can we guarantee that
different SPARQL implementations adhere to the same standard and deliver
the same interpretation? In my opinion, this is crucial if we want to
link open data together, but how can we achieve it? Any suggestions? Many
thanks in advance!
PS: according to the article “Semantics and Complexity of SPARQL”, using
OPTIONAL without any restriction can lead to PSPACE complexity. It
suggests using well-designed patterns to avoid the problems caused by
using OPTIONAL incorrectly. But how should our SPARQL endpoints react to
queries that are not well designed? In my humble opinion this question is
hard to answer, but it has a big impact on RDB-to-RDF mapping, because:
If we support such queries by strictly following the SPARQL
specification and its left-associative rule, then in many cases users do
not actually want those queries interpreted the way the specification
prescribes; they just want an ‘intuitive optional’. In addition, the
translated SQL query would become extremely complex and hard to optimize
because of the complexity introduced by OPTIONAL.
Can we then say that we do not support those queries? There are still
some rare cases where we need such not-well-designed queries, like
querying the incomplete data I just mentioned. In addition, even if we
decide to reject not-well-designed queries, it is also difficult to
define which queries count as not well designed.
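For the restricted case of patterns built only from AND and OPT, the article's condition can be checked mechanically. The following is a minimal Python sketch, assuming the definition as I read it in the article: for every sub-pattern (P1 OPT P2), any variable that occurs in P2 and also outside the sub-pattern must occur in P1 as well. The class names BGP, And and Opt and the function is_well_designed are hypothetical, and UNION and FILTER are not handled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class BGP:
    """A basic graph pattern; the default is the empty group {}."""
    triples: Tuple[Triple, ...] = ()


@dataclass(frozen=True)
class And:
    left: "Pattern"
    right: "Pattern"


@dataclass(frozen=True)
class Opt:
    left: "Pattern"
    right: "Pattern"


Pattern = Union[BGP, And, Opt]


def variables(p: Pattern) -> FrozenSet[str]:
    """All variables (terms starting with '?') occurring in a pattern."""
    if isinstance(p, BGP):
        return frozenset(
            term for triple in p.triples for term in triple if term.startswith("?")
        )
    return variables(p.left) | variables(p.right)


def is_well_designed(p: Pattern, outside: FrozenSet[str] = frozenset()) -> bool:
    """`outside` holds the variables of the whole query that occur outside p."""
    if isinstance(p, BGP):
        return True
    left_vars, right_vars = variables(p.left), variables(p.right)
    # A variable of the optional side that is also used elsewhere in the
    # query must already occur on the mandatory (left) side.
    if isinstance(p, Opt) and (right_vars & outside) - left_vars:
        return False
    return (is_well_designed(p.left, outside | right_vars)
            and is_well_designed(p.right, outside | left_vars))


NICK = BGP((("?x", "foaf:nick", "?nick"),))
FN = BGP((("?x", "vcard:FN", "?name"),))
PERSON = BGP((("?x", "rdf:type", "foaf:Person"),))

# Query 2: ({} OPT nick) OPT FN
query2 = Opt(Opt(BGP(), NICK), FN)
# The same query with "?x a foaf:Person ." stated before the OPTIONAL clauses.
anchored = Opt(Opt(PERSON, NICK), FN)

print(is_well_designed(query2))
print(is_well_designed(anchored))
```

In this sketch Query 2 fails the check because ?x occurs in the first OPTIONAL clause and in the second one but not in the empty group to its left, while the variant with the non-optional statement passes.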
I am quite lost as to what action we should take now. Could anyone help?
Thanks in advance!
Kind Regards,
Hong
Hong Sun | Agfa HealthCare
Researcher | HE/Advanced Clinical Applications Research
T +32 3 444 8108 | F +32 3 444 8401
Agfa HealthCare NV, Moutstraat 100, 9000 Gent, Belgium
http://www.agfahealthcare.com
Click on link to read important disclaimer:
http://www.agfahealthcare.com/maildisclaimer
Received on Tuesday, 18 October 2011 08:10:18 UTC