
Implementation issues of OPTIONAL

From: <hong.sun@agfa.com>
Date: Tue, 11 Oct 2011 14:30:46 +0200
To: public-rdf-dawg-comments@w3.org
Cc: jos.deroo@agfa.com
Message-ID: <OF6F8A5CDC.7C3824A3-ONC1257926.004485A8-C1257926.0044BC0E@agfa.com>
Dear All,

I have been testing the OPTIONAL feature of SPARQL and found that some
OPTIONAL queries are evaluated differently by Virtuoso, Rasqal and
ARQ 2.8.8. The problem is mainly caused by the left-associative nature of
OPTIONAL; I describe it in detail below.

I used two public endpoints to test Virtuoso and Rasqal:
DBpedia (for Virtuoso): http://dbpedia.org/sparql
Redland (for Rasqal): http://librdf.org/query

ARQ 2.8.8 was downloaded from SourceForge:
https://sourceforge.net/projects/jena/files/ARQ/

The RDF graph I queried in Redland and ARQ 2.8.8 is from
http://jena.sourceforge.net/ARQ/Tutorial/vc-db-3.ttl


@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#> .

_:a a foaf:Person ;
    foaf:name   "Matt Jones" .

_:b a foaf:Person ;
    foaf:name   "Sarah Jones" .

_:c a foaf:Person ;
    vcard:FN    "Becky Smith" .

_:d a foaf:Person ;
    vcard:FN    "John Smith" .

 
SPARQL Query1:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
prefix vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT *
WHERE { 
optional {?x vcard:FN ?name .}
optional {?x foaf:nick ?nick .} 
}

The result from Redland is:

----------------------------------------------------------------
| count | x                             | name          | nick |
================================================================
| 1     | blank node r1317995959r5965r4 | "John Smith"  |      |
| 2     | blank node r1317995959r5965r3 | "Becky Smith" |      |
----------------------------------------------------------------


The result from ARQ is:

-------------------------------
| x    | name          | nick |
===============================
| _:b0 | "John Smith"  |      |
| _:b1 | "Becky Smith" |      |
-------------------------------

But if I switch the order of the OPTIONAL patterns, so that the first one
matches nothing, the two applications deliver different results.

SPARQL Query2:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
prefix vcard:      <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT *
WHERE { 
optional {?x foaf:nick ?nick .} 
optional {?x vcard:FN ?name .}
}

The result from Redland is empty:
        Found 0 results
The result from ARQ is:
-------------------------------
| x    | nick | name          |
===============================
| _:b0 |      | "John Smith"  |
| _:b1 |      | "Becky Smith" |
-------------------------------

This is because, when the first OPTIONAL pattern matches nothing, the
Redland endpoint passes the empty binding on to the second pattern,
whereas in ARQ the variables of the second pattern are still free to bind.
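
As a cross-check, the two orderings can be replayed against the algebra in
the SPARQL specification with a small Python sketch. It is only an
illustration under stated assumptions, not the code of any of the engines
above: the empty group pattern is taken to produce one empty solution, each
OPTIONAL is applied as a left join without a filter, and the helper names
(match, left_join, evaluate_optionals) are made up for this example.

```python
# Sketch of left-associative OPTIONAL evaluation over the vc-db-3.ttl data.
# Assumption: triples are plain tuples and variables are strings starting with "?".

GRAPH = [
    ("_:a", "rdf:type", "foaf:Person"), ("_:a", "foaf:name", "Matt Jones"),
    ("_:b", "rdf:type", "foaf:Person"), ("_:b", "foaf:name", "Sarah Jones"),
    ("_:c", "rdf:type", "foaf:Person"), ("_:c", "vcard:FN", "Becky Smith"),
    ("_:d", "rdf:type", "foaf:Person"), ("_:d", "vcard:FN", "John Smith"),
]


def match(pattern, graph):
    """Return every solution mapping (a dict) for one triple pattern."""
    solutions = []
    for triple in graph:
        mu = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind consistently.
                if mu.get(term, value) != value:
                    ok = False
                    break
                mu[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            solutions.append(mu)
    return solutions


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1 if v in m2)


def left_join(left, right):
    """Extend each left mapping with compatible right ones, else keep it as is."""
    result = []
    for m1 in left:
        merged = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(merged if merged else [m1])
    return result


def evaluate_optionals(patterns, graph):
    """Evaluate { OPTIONAL {p1} OPTIONAL {p2} ... } from left to right."""
    solutions = [{}]  # assumption: the empty group yields one empty mapping
    for pattern in patterns:
        solutions = left_join(solutions, match(pattern, graph))
    return solutions


FN = ("?x", "vcard:FN", "?name")
NICK = ("?x", "foaf:nick", "?nick")

query1 = evaluate_optionals([FN, NICK], GRAPH)  # order used in Query1
query2 = evaluate_optionals([NICK, FN], GRAPH)  # order used in Query2

for row in query2:
    print(row)
```

Under these assumptions both orderings give the same two rows (Becky Smith
and John Smith, with ?nick unbound), which matches the ARQ output above.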

The DBpedia endpoint behaves similarly to the Redland endpoint.
As the DBpedia endpoint does not allow retrieving remote RDF, I used its
default RDF graph.

The queries I used are:
Query 1:
select * where {
optional {?person1  <http://xmlns.com/foaf/0.1/name> ?name1.}
optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
}
limit 5

Query 2:
select * where {
optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
optional {?person1  <http://xmlns.com/foaf/0.1/name> ?name1.}
}
limit 5

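There is no triple with the property <http://xmlns.com/foaf/0.1/knows>,
so Query 1 returns:

-----------------------------------------------------------------------------------------
| person1                                              | name1                | person2 |
=========================================================================================
| http://dbpedia.org/resource/Pancreatic_cancer        | "Pancreas Cancer"@en |         |
| http://dbpedia.org/resource/Hondamatic               | "H2"@en              |         |
| http://dbpedia.org/resource/RNA_%28journal%29        | "RNA"@en             |         |
| http://dbpedia.org/resource/August_30th_%28song%29   | "August 30th"@en     |         |
| http://dbpedia.org/resource/Sculpture_%28magazine%29 | "Sculpture"@en       |         |
-----------------------------------------------------------------------------------------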


But Query 2 does not return any result.



I understand that the issue described above can be avoided by adding a
non-optional pattern that fixes the binding scope of the variables used in
the OPTIONAL blocks, for example “?x a foaf:Person.” before the OPTIONAL
patterns. But what should we do if the graph we are going to query looks
like the one below (vc-db-incomplete.ttl)?

_:a a foaf:Person ;
    foaf:name   "Matt Jones" .

_:b    foaf:name   "Sarah Jones" .

_:c a foaf:Person ;
    vcard:FN    "Becky Smith" .

_:d    vcard:FN    "John Smith" .


If a user writes a query like Query 2, how can we guarantee that different
SPARQL applications adhere to the same standard and deliver the same
interpretation? In my opinion, this is crucial if we want to link open
data together, but how can we achieve it? Any suggestions? Many thanks in
advance!


PS: according to the article “Semantics and Complexity of SPARQL”, using
OPTIONAL without any restriction can lead to PSPACE complexity. It suggests
using well-designed patterns for OPTIONAL to avoid the trouble caused by
using OPTIONAL incorrectly. But how should our SPARQL endpoints react to
queries that are not well designed? In my humble opinion, this question is
hard to answer, but it has a big impact on RDB-to-RDF mapping because:

If we support such queries by strictly following the SPARQL specification
and its left-associative reading of OPTIONAL, then in many cases users do
not actually want their queries interpreted the way the specification
prescribes; they just want an ‘intuitive optional’. In addition, the
translated SQL query becomes extremely complex and hard to optimize
because of the complexity that OPTIONAL introduces.

Can we then say that we do not support those queries? Even so, there are
some rare cases where we need such not-well-designed queries, such as
querying the incomplete database I just mentioned. In addition, even if we
decide to reject not-well-designed queries, it is also difficult to
define which queries count as not well designed.
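
For patterns built only from AND and OPTIONAL, the article mentioned above
does give a syntactic test, as far as I understand it: a pattern is well
designed if, for every sub-pattern (P1 OPT P2), each variable of P2 that
also occurs outside that sub-pattern occurs in P1 as well. The Python
sketch below applies this test. The tuple encoding ("BGP", "AND", "OPT")
and the function names are assumptions made for this illustration; FILTER
and UNION are not covered.

```python
# Sketch of a well-designedness check for AND/OPT patterns.
# Encoding (hypothetical): ("BGP", [triples]), ("AND", left, right), ("OPT", left, right).


def vars_of(pattern):
    """Collect the variables (terms starting with "?") used in a pattern."""
    if pattern[0] == "BGP":
        return {t for triple in pattern[1] for t in triple if t.startswith("?")}
    return vars_of(pattern[1]) | vars_of(pattern[2])


def well_designed(pattern, outside=frozenset()):
    """Check the condition for every OPT node; `outside` holds the variables
    that occur elsewhere in the whole query, outside the current sub-pattern."""
    if pattern[0] == "BGP":
        return True
    op, left, right = pattern
    if op == "OPT":
        # Variables of the optional part that are used outside this node
        # must also appear in the mandatory (left) part.
        leaked = (vars_of(right) & outside) - vars_of(left)
        if leaked:
            return False
    return (well_designed(left, outside | vars_of(right))
            and well_designed(right, outside | vars_of(left)))


EMPTY = ("BGP", [])
NICK = ("BGP", [("?x", "foaf:nick", "?nick")])
FN = ("BGP", [("?x", "vcard:FN", "?name")])
PERSON = ("BGP", [("?x", "rdf:type", "foaf:Person")])

# Query 2 as written: { OPTIONAL { nick } OPTIONAL { FN } }
QUERY_2 = ("OPT", ("OPT", EMPTY, NICK), FN)
# Same query with "?x a foaf:Person ." placed before the OPTIONAL patterns.
ANCHORED = ("OPT", ("OPT", PERSON, NICK), FN)

print(well_designed(QUERY_2))
print(well_designed(ANCHORED))
```

With this reading, Query 2 is not well designed because ?x is shared by
both OPTIONAL blocks without appearing in a mandatory part, while the
variant anchored by “?x a foaf:Person.” passes the test.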

I am quite lost as to what action we should take now. Could anyone help?
Thanks in advance!

Kind Regards,
Hong

Hong Sun | Agfa HealthCare
Researcher | HE/Advanced Clinical Applications Research
T  +32 3 444 8108 | F  +32 3 444 8401

Agfa HealthCare NV, Moutstraat 100, 9000 Gent, Belgium
http://www.agfahealthcare.com

Click on link to read important disclaimer: 
http://www.agfahealthcare.com/maildisclaimer 
Received on Tuesday, 18 October 2011 08:10:18 GMT
