Making querying of annotations optional

tl;dr
=====

Querying annotated triples seems to always require its own BGP expression, regardless of whether the annotated triple exists in the graph. It seems that an annotation can never be just an optional addition to a query for plain triples.

Asked concretely: is there a way to make the ?src information optional in the query using annotation syntax? E.g. something like

 SELECT ?x ?age ?src
 WHERE { ?x foaf:age ?age  OPTIONAL {| :src ?src |} . }





QUERYING IN RDF*
================

In a paper on RDF* and SPARQL* [0] the following example data is given:

 :bob foaf:name "Bob" .
 <<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                      dct:source <http://example.net/listing.html> .

Note that this is RDF*, not RDF-star, and the statement ':bob foaf:age 23' is considered to be true in the graph, i.e. stated.

Then the following query is presented:

 SELECT ?x ?age ?src 
 WHERE  { <<?x foaf:age ?age>> dct:source ?src . }

Since ?src is explicitly asked for, the query seems sensible. But what if one doesn’t care about the source? What if one doesn’t care whether a source annotation is provided at all? What if one isn’t even aware of the possibility that an annotation might have been added? It seems that a query for people's age that isn’t aware of that peculiarity will not return Bob’s age.
IIUC the following query

 SELECT ?x ?age
 WHERE  { ?x foaf:age ?age . }

will not return any results, although Bob’s age is considered to be "in the graph". Conversely, the query over embedded triples wouldn’t find any person’s age that is not annotated, i.e. that is stated in a plain triple.

One virtue of an annotation facility over regular n-ary relations is that an annotation doesn’t require changing the existing data - no need to add blank nodes and branches, or even complete reformulations. That is great when authoring, and it also means that existing queries will continue to work.

However, the presented approach means that to query for facts that MIGHT also be annotated, one has to duplicate the query effort. With the very plausible intent to gather all people’s age, annotated or not, one will have to write:

 SELECT ?x ?age 
 WHERE  {
          { ?x foaf:age ?age . }
          UNION
          { <<?x foaf:age ?age>> ?a ?b . }
        }

Although the embedded triple is considered to be true in the graph, it is a special kind of triple and not captured by a normal query. What if queries get more complicated, and it is unforeseeable whether some statement might be annotated or not? IMO from a usability perspective this is highly problematic.
Caveat: I have no way to test if the above is actually correct. I hope it is - if not, please tell me, and have mercy.


QUERYING IN RDF-star
====================

But that was RDF*, water under the bridge. How is RDF-star doing? Better, in some way. Because embedded triples are now triple terms and not stated anymore - which most of the time is bad [1] - the above problem can’t happen anymore - which is good. A query for people’s age will find people’s age, and a query for annotations on people’s age will ask for those annotations irrespective of that age actually being a fact in the graph. So the disconnect has its advantages. The tediousness of course remains, because it’s still two different queries - one for facts, one for annotations on reifications (no example here, because the mail is long already).
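
For what it's worth, a rough sketch of those two queries, assuming the rdf:reifies expansion shown in the data further down and reusing the dct:source property from the RDF* example (prefix declarations omitted, as elsewhere in this mail; untested):

 # facts only
 SELECT ?x ?age
 WHERE  { ?x foaf:age ?age . }

 # annotations only, whether or not the age is asserted
 SELECT ?x ?age ?src
 WHERE  { ?r rdf:reifies <<( ?x foaf:age ?age )>> ;
             dct:source ?src . }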

Does the annotation syntax solve this problem in SPARQL-star? 
Let’s take the following graph:

 :Alice foaf:age 24 .
 :Bob foaf:age :42 {| :src :Carol |} . 

The following query will get all people’s age, annotated or not

 SELECT ?x ?age 
 WHERE  { ?x foaf:age ?age . }

because the actual data queried is 

 :Alice foaf:age 24 .
 :Bob foaf:age :42 .
 _:r rdf:reifies <<( :Bob foaf:age :42 )>> ;
     :src :Carol . 

That is better than RDF* above, but it doesn’t cater for what we are actually concerned with: annotating statements.

Can the annotation syntax be used to query for facts, and return annotations optionally? I’m not sure. What will the following query return:

 SELECT ?x ?age ?src
 WHERE { ?x foaf:age ?age {| :src ?src |} . }

It will only return ' :Bob, 42, :Carol '.

To get Alice and Bob, one would again have to use a UNION or OPTIONAL, e.g.

 SELECT ?x ?age ?src
 WHERE { 
         { ?x foaf:age ?age . }
         UNION
         { ?x foaf:age ?age {| :src ?src |} . }
       }
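
The OPTIONAL variant mentioned above might be sketched as follows. This assumes that an annotation pattern inside a group expands the same way it does in data, i.e. to the plain triple pattern plus a reifier pattern via rdf:reifies; it is untested:

 SELECT ?x ?age ?src
 WHERE {
         ?x foaf:age ?age .
         OPTIONAL { ?x foaf:age ?age {| :src ?src |} . }
       }

Under that assumption each asserted age is returned, with ?src bound only where a :src annotation exists. The UNION form, by contrast, matches Bob in both branches, so he shows up twice, once with ?src unbound. Either way the triple pattern has to be repeated.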

And again, like with RDF*, IMO from a usability perspective this is highly problematic.

Is there a way to make the ?src information optional in the query using annotation syntax? E.g. something like

 SELECT ?x ?age ?src
 WHERE { ?x foaf:age ?age  OPTIONAL {| :src ?src |} . }


Best,
Thomas


[0] Olaf Hartig: Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF, June 2017, http://olafhartig.de/files/Hartig_AMW2017_RDFStar.pdf
[1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Aug/0032.html

Received on Thursday, 8 August 2024 16:03:07 UTC