Re: Implementation issues of OPTIONAL from Lee Feigenbaum on 2011-10-24 (public-rdf-dawg-comments@w3.org from October 2011)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Mon, 24 Oct 2011 09:03:22 -0400
To: hong.sun@agfa.com
CC: public-rdf-dawg-comments@w3.org, jos.deroo@agfa.com
Message-ID: <4EA5621A.7000909@thefigtrees.net>

Hi Hong,

The key issue is how a variable is handled when it is not matched in the 
first OPTIONAL clause, and used a second time in a later OPTIONAL 
clause. This is part of SPARQL 1.0 and has not changed in SPARQL 1.1.

In SPARQL, there is no "NULL": an unmatched variable is simply not 
bound to any value and remains available to be bound in a later clause.

A pattern such as:

{
   ?x a foaf:Person .
   OPTIONAL { ?x foaf:name ?name }
   OPTIONAL { ?x vcard:FN ?name }
}

...will bind ?name to the value for foaf:name if there is one, and to 
the value for vcard:FN only if no foaf:name was matched in the first 
OPTIONAL. This provides a way to access optional information 
preferentially.
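
The preferential binding can be illustrated with a minimal Python sketch of the left-join step behind OPTIONAL. This is not how any particular engine is implemented; the helper names `compatible` and `left_join` are made up for illustration, solution mappings are modelled as plain dicts, and the `_:e` entries are hypothetical data added only to show a resource that has both properties. The other data comes from the vc-db-3.ttl graph quoted below.

```python
# A solution mapping is a dict; an unbound variable is simply absent (no NULL).

def compatible(m1, m2):
    """Two mappings are compatible if they agree on every variable they share."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def left_join(left, right):
    """OPTIONAL: merge each left mapping with every compatible right mapping,
    and keep the left mapping unchanged when no right mapping is compatible."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out

# ?x a foaf:Person  (_:e is hypothetical, not in the original graph)
persons = [{"x": n} for n in ("_:a", "_:b", "_:c", "_:d", "_:e")]

# ?x foaf:name ?name
foaf_name = [
    {"x": "_:a", "name": "Matt Jones"},
    {"x": "_:b", "name": "Sarah Jones"},
    {"x": "_:e", "name": "Eve Example"},   # hypothetical
]

# ?x vcard:FN ?name
vcard_fn = [
    {"x": "_:c", "name": "Becky Smith"},
    {"x": "_:d", "name": "John Smith"},
    {"x": "_:e", "name": "E. Example"},    # hypothetical
]

# { ?x a foaf:Person . OPTIONAL { foaf:name } OPTIONAL { vcard:FN } }
result = left_join(left_join(persons, foaf_name), vcard_fn)

for row in result:
    print(row)
```

In this sketch, `_:c` and `_:d` leave the first OPTIONAL with ?name unbound, so the second OPTIONAL is free to bind it. For `_:e`, ?name is already bound by foaf:name, the vcard:FN mapping is incompatible with it, and the foaf:name value is kept.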

In your second query, the results you are receiving from ARQ are the 
correct results. The Working Group produces detailed test suites to help 
implementers create multiple interoperable implementations of the SPARQL 
language. If you find that a particular implementation does not behave 
in a way you expect, you might consider contacting the implementer to 
determine why.
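
To see why two rows are the expected answer for the second query, here is a rough Python sketch of the evaluation, again with made-up helper names (`compatible`, `left_join`) and dicts standing in for solution mappings. It assumes the reading of the SPARQL algebra in which a group that begins with OPTIONAL is left-joined onto the empty group pattern, which has exactly one solution: the empty mapping.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every variable they share."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def left_join(left, right):
    """OPTIONAL: merge each left mapping with every compatible right mapping,
    and keep the left mapping unchanged when no right mapping is compatible."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged if merged else [l])
    return out

# The empty group pattern yields a single empty mapping (binds nothing).
empty_group = [{}]

# ?x foaf:nick ?nick : the graph contains no foaf:nick triples.
foaf_nick = []

# ?x vcard:FN ?name : two matches in vc-db-3.ttl.
vcard_fn = [
    {"x": "_:c", "name": "Becky Smith"},
    {"x": "_:d", "name": "John Smith"},
]

# Query 2: { OPTIONAL { foaf:nick } OPTIONAL { vcard:FN } }
after_first = left_join(empty_group, foaf_nick)   # the empty mapping survives
query2 = left_join(after_first, vcard_fn)

# Query 1: { OPTIONAL { vcard:FN } OPTIONAL { foaf:nick } }
query1 = left_join(left_join(empty_group, vcard_fn), foaf_nick)

print(after_first)
print(query2)
print(query1)
```

Under these assumptions the failed first OPTIONAL leaves the empty mapping in place rather than discarding it. Because the empty mapping is compatible with every mapping, the second OPTIONAL can still bind ?x and ?name, and both orderings produce the same two solutions.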

We would be grateful if you would acknowledge that your comment has been 
answered by sending a reply to this mailing list.

Lee
On behalf of the SPARQL WG

On 10/11/2011 8:30 AM, hong.sun@agfa.com wrote:
> Dear All,
>
> I have been testing the OPTIONAL function of SPARQL and discovered that
> some optional queries are evaluated differently by Virtuoso, Rasqal
> and ARQ 2.8.8. The problem is mainly caused by the left-associative
> characteristic of OPTIONAL; I will try to describe my problems
> below.
>
> I used two public endpoints to test Virtuoso and Rasqal:
> DBpedia (for Virtuoso): http://dbpedia.org/sparql
> Redland (for Rasqal): http://librdf.org/query
>
> ARQ 2.8.8 was downloaded from SourceForge:
> https://sourceforge.net/projects/jena/files/ARQ/
>
> The RDF graph I queried in Redland and ARQ 2.8.8 is from
> http://jena.sourceforge.net/ARQ/Tutorial/vc-db-3.ttl
>
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
> @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .
>
> _:a a foaf:Person ;
>     foaf:name "Matt Jones" .
>
> _:b a foaf:Person ;
>     foaf:name "Sarah Jones" .
>
> _:c a foaf:Person ;
>     vcard:FN "Becky Smith" .
>
> _:d a foaf:Person ;
>     vcard:FN "John Smith" .
>
>
> SPARQL Query1:
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
> SELECT *
> WHERE {
> optional {?x vcard:FN ?name .}
> optional {?x foaf:nick ?nick .}
> }
>
> Results from Redland are:
>
> ----------------------------------------------------------------
> | count | x                             | name          | nick |
> ================================================================
> | 1     | blank node r1317995959r5965r4 | "John Smith"  |      |
> | 2     | blank node r1317995959r5965r3 | "Becky Smith" |      |
> ----------------------------------------------------------------
>
>
>
> Results from ARQ are:
>
> -------------------------------
> | x    | name          | nick |
> ===============================
> | _:b0 | "John Smith"  |      |
> | _:b1 | "Becky Smith" |      |
> -------------------------------
>
> But if I switch the order of the optional statements, so that the first
> one is not bound, then the two applications deliver different results.
>
> SPARQL Query2:
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
> SELECT *
> WHERE {
> optional {?x foaf:nick ?nick .}
> optional {?x vcard:FN ?name .}
> }
>
> Results from Redland are empty:
> Found 0 results
>
> Results from ARQ are:
> -------------------------------
> | x    | nick | name          |
> ===============================
> | _:b0 |      | "John Smith"  |
> | _:b1 |      | "Becky Smith" |
> -------------------------------
>
> This is because, if the first optional statement is not bound, the empty
> binding is passed to the second statement in the Redland endpoint; while in
> ARQ, the variables in the second statement are free to bind if the first
> optional is not bound.
>
> The DBpedia endpoint behaves similarly to the Redland endpoint.
> As the DBpedia endpoint does not allow retrieving remote RDF, I used its
> default RDF graph.
>
> The Query I used is:
> Query 1:
> select * where {
> optional {?person1 <http://xmlns.com/foaf/0.1/name> ?name1.}
> optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
> }
> limit 5
>
> Query 2:
> select * where {
> optional {?person1 <http://xmlns.com/foaf/0.1/knows> ?person2.}
> optional {?person1 <http://xmlns.com/foaf/0.1/name> ?name1.}
> }
> limit 5
>
> There is no triple with the property
> <http://xmlns.com/foaf/0.1/knows>, so Query 1 returns:
>
> | person1                                              | name1                | person2 |
> | http://dbpedia.org/resource/Pancreatic_cancer        | "Pancreas Cancer"@en |         |
> | http://dbpedia.org/resource/Hondamatic               | "H2"@en              |         |
> | http://dbpedia.org/resource/RNA_%28journal%29        | "RNA"@en             |         |
> | http://dbpedia.org/resource/August_30th_%28song%29   | "August 30th"@en     |         |
> | http://dbpedia.org/resource/Sculpture_%28magazine%29 | "Sculpture"@en       |         |
>
>
>
> But Query 2 does not return any result.
>
>
>
> I understand that the issue above could be solved by adding a
> non-optional statement that defines the binding scope of the variables
> appearing in the optional block, such as stating “?x a foaf:Person.”
> before the optional statements. But what should we do if the graph we
> are querying looks like the one below (vc-db-incomplete.ttl):
>
> _:a a foaf:Person ;
>     foaf:name "Matt Jones" .
>
> _:b foaf:name "Sarah Jones" .
>
> _:c a foaf:Person ;
>     vcard:FN "Becky Smith" .
>
> _:d vcard:FN "John Smith" .
>
>
> If a user writes a query like Query 2, can we guarantee that different
> SPARQL applications adhere to the same standard and deliver the same
> interpretation? In my opinion, this is crucial if we want to link open
> data together, but how can we achieve this? Any suggestions? Many
> thanks in advance!
>
>
> PS: according to the article “Semantics and Complexity of SPARQL”, using
> OPTIONAL without any restriction may lead to PSPACE complexity. It
> suggests using well-designed patterns for OPTIONAL to avoid the
> trouble caused by using OPTIONAL incorrectly. But how should our
> SPARQL endpoints react to queries that are not well designed? In my
> humble opinion, this question is hard to answer, but it has a big
> impact on RDB-to-RDF mapping because:
>
> If we support such queries by strictly following the SPARQL
> specification and its left-associative rule, then in many cases users
> do not intend their queries to be interpreted the way the
> specification requires; they just want an ‘intuitive optional’. In
> addition, the translated SQL query would become extremely complex and
> hard to optimize because of the complexity introduced by OPTIONAL.
>
> Then can we say we do not support those queries? Even so, there are
> some rare cases where we need such not-well-designed queries, such as
> querying the incomplete database I just mentioned. In addition, even if
> we decide to reject not-well-designed queries, it is also difficult to
> define which queries are not well designed.
>
> I am quite lost as to what actions we should take now. Could anyone
> help? Thanks in advance!
>
> Kind Regards,
> Hong
>
> Hong Sun | Agfa HealthCare
> Researcher | HE/Advanced Clinical Applications Research
> T +32 3 444 8108 | F +32 3 444 8401
>
> Agfa HealthCare NV, Moutstraat 100, 9000 Gent, Belgium
> http://www.agfahealthcare.com
Received on Monday, 24 October 2011 13:04:05 UTC