Re: Extending RDF stores to support RDF*/PG versus RDF*/SA from Andy Seaborne on 2019-09-26 (public-rdf-star@w3.org from September 2019)

From: Andy Seaborne <andy@apache.org>
Date: Thu, 26 Sep 2019 12:38:55 +0100
To: public-rdf-star@w3.org
Message-ID: <1b441c17-7521-1152-4737-e52dfba7c42d@apache.org>

A lightweight SA implementation might be able to use reification; 
lookup by reification subject is then possible.
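
As a rough illustration of the reification idea, here is a minimal 
sketch in Python. It assumes a toy in-memory store; the class name 
ReifiedStore, its methods, the "_:stmtN" node labels and the extra 
triple-to-node map are all hypothetical and not taken from any real 
RDF store. Each <<s p o>> becomes a statement node described with the 
standard rdf:subject / rdf:predicate / rdf:object properties, so the 
pattern <<:s1234 ?p ?o>> ?ap ?ao starts from an ordinary 
(predicate, object) lookup on rdf:subject.

```python
from collections import defaultdict

# Standard RDF reification vocabulary, written as prefixed names for brevity.
TYPE, STATEMENT = "rdf:type", "rdf:Statement"
SUBJ, PRED, OBJ = "rdf:subject", "rdf:predicate", "rdf:object"
REIFICATION_PREDICATES = {TYPE, SUBJ, PRED, OBJ}


class ReifiedStore:
    """Hypothetical toy store: <<s p o>> annotations kept as reification triples."""

    def __init__(self):
        self.triples = set()
        self.by_s = defaultdict(set)   # subject -> {(predicate, object)}
        self.by_po = defaultdict(set)  # (predicate, object) -> {subject}
        self.nodes = {}                # (s, p, o) -> statement node (assumed shortcut)

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_s[s].add((p, o))
        self.by_po[(p, o)].add(s)

    def annotate(self, triple, ap, ao):
        """Record <<triple>> ap ao without asserting the triple itself (SA)."""
        node = self.nodes.get(triple)
        if node is None:
            node = f"_:stmt{len(self.nodes)}"
            self.nodes[triple] = node
            s, p, o = triple
            self.add(node, TYPE, STATEMENT)
            self.add(node, SUBJ, s)
            self.add(node, PRED, p)
            self.add(node, OBJ, o)
        self.add(node, ap, ao)

    def annotations_about(self, s):
        """Answer <<s ?p ?o>> ?ap ?ao via the lookup on rdf:subject."""
        for node in self.by_po.get((SUBJ, s), ()):
            po = self.by_s[node]
            p = next(v for k, v in po if k == PRED)
            o = next(v for k, v in po if k == OBJ)
            for ap, ao in po:
                # Sketch-level simplification: reification predicates are
                # never reported as annotations.
                if ap not in REIFICATION_PREDICATES:
                    yield (s, p, o), ap, ao


store = ReifiedStore()
store.annotate((":s1234", ":p", ":o"), ":source", ":doc1")
found = list(store.annotations_about(":s1234"))
```

The annotated triple is never added to the graph here, which is the SA 
behaviour; the cost is four extra triples per annotated statement.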



PG presumably would maintain the integrity constraint that deleting the 
target triple also deletes the uses of <<>> for that triple.

If so, that makes deleting more complicated (who deletes data anyway? :-)

But it also makes changing data different.

To change the object value of a triple, the operations are "delete old 
triple, add new triple".  But if the 'delete' step does the data 
management, then the <<>> triples are also deleted, and they are not 
restored when the 'add' happens.

So there is added complexity here, both for the store and for libraries 
built over the store.
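
A small Python sketch of that update problem, under stated assumptions: 
PGStore and its method names are hypothetical, annotations are only one 
level deep, and carrying annotations over to the changed triple is just 
one possible policy (whether they should move at all is an application 
decision).

```python
class PGStore:
    """Hypothetical toy PG-style store: annotations exist only while their target does."""

    def __init__(self):
        self.triples = set()
        self.annotations = {}  # target triple -> {(ap, ao)}

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def annotate(self, triple, ap, ao):
        if triple not in self.triples:
            raise KeyError("PG: the annotated triple must be in the graph")
        self.annotations.setdefault(triple, set()).add((ap, ao))

    def delete(self, s, p, o):
        self.triples.discard((s, p, o))
        # Integrity constraint: deleting the target drops its <<>> uses too.
        self.annotations.pop((s, p, o), None)

    def naive_update_object(self, s, p, old_o, new_o):
        """'delete old triple, add new triple': annotations are lost."""
        self.delete(s, p, old_o)
        self.add(s, p, new_o)

    def update_object_keeping_annotations(self, s, p, old_o, new_o):
        """Assumed policy: save the annotations, then re-attach them."""
        saved = set(self.annotations.get((s, p, old_o), ()))
        self.delete(s, p, old_o)
        self.add(s, p, new_o)
        for ap, ao in saved:
            self.annotate((s, p, new_o), ap, ao)


naive = PGStore()
naive.add(":s", ":p", ":old")
naive.annotate((":s", ":p", ":old"), ":source", ":doc1")
naive.naive_update_object(":s", ":p", ":old", ":new")

careful = PGStore()
careful.add(":s", ":p", ":old")
careful.annotate((":s", ":p", ":old"), ":source", ":doc1")
careful.update_object_keeping_annotations(":s", ":p", ":old", ":new")
```

After the naive update the annotation is gone; the second variant keeps 
it, but only because the library explicitly saved and re-attached it.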

     Andy

On 9/21/19 1:20 PM, Richard Cyganiak wrote:
> One more data point on the PG versus SA debate: Extending an RDF store with an efficient implementation of RDF* is likely to be easier for the PG version than for the SA version.
> 
> Let me explain that thesis.
> 
> Consider this SPARQL* triple pattern:
> 
>      <<:s1234 ?p ?o>> ?ap ?ao
> 
> It asks to retrieve all annotations on properties about subject :s1234.
> 
> How would an RDF* store answer this efficiently?
> 
> Existing RDF stores typically only provide fast access by triple pattern. That is, they have the index data structures necessary to quickly answer patterns like this:
> 
>      :s1234 ?p ?o
> 
> But they are not good at answering patterns like this:
> 
>      ?s ?p ?o
>      FILTER contains(str(?s), "s123")
> 
> because that would require scanning through all triples in the store and checking each subject for the substring "s123". This is slow.
> 
> So, back to our SPARQL* pattern:
> 
>      <<:s1234 ?p ?o>> ?ap ?ao
> 
> Can an RDF* store answer this pattern efficiently?
> 
> In the PG version of RDF*, the answer is obviously “yes”. If there are any triples of the form
> 
>      <<:s1234 ?p ?o>> ?ap ?ao
> 
> then there would also be a triple of the form
> 
>      :s1234 ?p ?o
> 
> in the graph, and looking for such triples is just bread and butter for an RDF store. Having found all matching triples, the store just needs to check for triples having one of these triples as the subject, which again is just a basic triple pattern lookup.
> 
> In the SA version of RDF*, the answer is: No, unless there is an RDF*-specific additional index structure in place. Because in the SA version, the shortcut of first looking for triples of the form
> 
>      :s1234 ?p ?o
> 
> is not available, as in SA, <<t>> can exist without t being in the graph.
> 
> So, looking for the pattern
> 
>      <<:s1234 ?p ?o>> ?ap ?ao
> 
> boils down to
> 
>      ?t ?ap ?ao
>      FILTER (isTriple(?t) && (subject(?t) = :s1234))
> 
> and without special index structures, this is going to require a full scan of the graph.
> 
> In summary, extending an RDF store with an efficient implementation of RDF* is likely to be easier for the PG version than for the SA version, because the PG version can use the existing triple-lookup indexes when searching for nested triples, while an SA version would require new indexes for efficient lookup of nested triples.
> 
> Richard
>
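
The two lookup strategies described in the quoted message can be 
sketched in Python as follows. This is an illustration only: the Graph 
class, the function names and the single subject index are assumptions, 
and a nested triple is modelled as a plain tuple used as a subject term.

```python
from collections import defaultdict


class Graph:
    """Hypothetical toy graph with one index: subject term -> {(predicate, object)}."""

    def __init__(self):
        self.by_subject = defaultdict(set)

    def add(self, s, p, o):
        # s may be a nested triple, represented here as a 3-tuple.
        self.by_subject[s].add((p, o))


def lookup_pg(g, s):
    """<<s ?p ?o>> ?ap ?ao, assuming PG: every annotated triple is asserted."""
    results = []
    for p, o in g.by_subject.get(s, ()):                # indexed: s ?p ?o
        for ap, ao in g.by_subject.get((s, p, o), ()):  # indexed: <<s p o>> ?ap ?ao
            results.append(((s, p, o), ap, ao))
    return results


def lookup_sa(g, s):
    """Same pattern under SA with no extra index: scan every subject term."""
    results = []
    for t, pairs in g.by_subject.items():
        # Mirrors FILTER (isTriple(?t) && (subject(?t) = s))
        if isinstance(t, tuple) and t[0] == s:
            for ap, ao in pairs:
                results.append((t, ap, ao))
    return results


g = Graph()
g.add(":s1234", ":p", ":o")                              # asserted triple
g.add((":s1234", ":p", ":o"), ":source", ":doc1")        # annotation on it
g.add((":s1234", ":q", ":x"), ":source", ":doc2")        # SA only: target not asserted

pg_hits = lookup_pg(g, ":s1234")
sa_hits = sorted(lookup_sa(g, ":s1234"))
```

The PG-style lookup only ever touches the index but misses the 
annotation whose target triple is not asserted; the scan finds both, 
at the cost of visiting every subject term in the graph.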

Received on Thursday, 26 September 2019 11:39:20 UTC