Extending RDF stores to support RDF*/PG versus RDF*/SA

One more data point on the PG versus SA debate: Extending an RDF store with an efficient implementation of RDF* is likely to be easier for the PG version than for the SA version.

Let me explain that thesis.

Consider this SPARQL* triple pattern:

    <<:s1234 ?p ?o>> ?ap ?ao

It asks for all annotations on triples whose subject is :s1234.

How would an RDF* store answer this efficiently?

Existing RDF stores typically only provide fast access by triple pattern. That is, they have the index data structures necessary to quickly answer patterns like this:

    :s1234 ?p ?o

But they are not good at answering patterns like this:

    ?s ?p ?o
    FILTER contains(str(?s), "s123")

because that would require scanning all triples in the store and checking each subject for the substring "s123". This is slow.
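
To make the difference concrete, here is a toy sketch in Python. It is not how any particular RDF store works; the `by_subject` dictionary is a hypothetical stand-in for the triple indexes a real store maintains, and the data is made up for illustration.

```python
from collections import defaultdict

# Made-up example data; each triple is a (subject, predicate, object) tuple.
triples = [
    (":s1234", ":name", '"Alice"'),
    (":s1234", ":age", '"42"'),
    (":s1235", ":name", '"Bob"'),
    (":s9999", ":name", '"Carol"'),
]

# Hypothetical subject index, standing in for a real store's triple indexes.
by_subject = defaultdict(list)
for t in triples:
    by_subject[t[0]].append(t)

def match_subject(s):
    """Answer `s ?p ?o` with a single index lookup."""
    return by_subject.get(s, [])

def match_subject_substring(fragment):
    """Answer `?s ?p ?o FILTER contains(str(?s), fragment)`.

    No index helps here, so every triple in the store is visited.
    """
    return [t for t in triples if fragment in t[0]]
```

The first function touches only the matching entries; the second has to visit every triple, no matter how few of them match.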

So, back to our SPARQL* pattern:

    <<:s1234 ?p ?o>> ?ap ?ao

Can an RDF* store answer this pattern efficiently?

In the PG version of RDF*, the answer is obviously “yes”. If there are any triples of the form

    <<:s1234 ?p ?o>> ?ap ?ao

then there would also be a triple of the form

    :s1234 ?p ?o

in the graph, and looking for such triples is just bread and butter for an RDF store. Having found all matching triples, the store just needs to check for triples having one of these triples as the subject, which again is just a basic triple pattern lookup.
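
The two-step lookup can be sketched roughly as follows. This is only an illustration under some assumptions: a nested triple is modelled as a Python tuple used in subject position, `by_subject` is a hypothetical subject index, and the example data is invented.

```python
from collections import defaultdict

# A nested triple <<s p o>> is modelled as a tuple in subject position.
base = (":s1234", ":worksFor", ":acme")
other = (":s5678", ":worksFor", ":acme")

graph = [
    base,                          # PG: the annotated triple is itself asserted
    other,
    (base, ":since", '"2015"'),
    (base, ":source", ":hr_db"),
    (other, ":since", '"2018"'),
]

# Hypothetical subject index; keys are IRIs or nested triples.
by_subject = defaultdict(list)
for t in graph:
    by_subject[t[0]].append(t)

def annotations_pg(s):
    """Answer `<<s ?p ?o>> ?ap ?ao` using only the ordinary subject index."""
    results = []
    for inner in by_subject.get(s, []):            # step 1: s ?p ?o
        results.extend(by_subject.get(inner, []))  # step 2: <<inner>> ?ap ?ao
    return results
```

Both steps are plain subject lookups, so nothing beyond the existing index is needed, provided the store can use a nested triple as a lookup key.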

In the SA version of RDF*, the answer is: No, unless there is an RDF*-specific additional index structure in place. This is because in the SA version, the shortcut of first looking for triples of the form

    :s1234 ?p ?o

is not available, as in SA, <<t>> can exist without t being in the graph.

So, looking for the pattern

    <<:s1234 ?p ?o>> ?ap ?ao

boils down to

    ?t ?ap ?ao
    FILTER (isTriple(?t) && (subject(?t) = :s1234))

and without special index structures, this is going to require a full scan of the graph.
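
As a rough sketch of the SA situation, again modelling nested triples as Python tuples and using invented data: the first function below is the full scan, and the second uses a hypothetical extra index keyed by the subject of the nested triple. The name `by_nested_subject` and the shape of that index are my assumptions, one possible design among several.

```python
from collections import defaultdict

# SA: the nested triple is used as a subject but never asserted itself.
nested = (":s1234", ":worksFor", ":acme")

graph = [
    (nested, ":claimedBy", ":bob"),
    ((":s5678", ":worksFor", ":acme"), ":claimedBy", ":carol"),
    (":s1234", ":name", '"Alice"'),
]

def is_triple(term):
    """Toy counterpart of isTriple(): nested triples are tuples here."""
    return isinstance(term, tuple)

def annotations_sa_scan(s):
    """Answer `<<s ?p ?o>> ?ap ?ao` by visiting every triple in the graph."""
    return [t for t in graph if is_triple(t[0]) and t[0][0] == s]

# Hypothetical RDF*-specific index: subject of the nested triple -> annotations.
by_nested_subject = defaultdict(list)
for t in graph:
    if is_triple(t[0]):
        by_nested_subject[t[0][0]].append(t)

def annotations_sa_indexed(s):
    """Same answer as the scan, but via the additional index."""
    return by_nested_subject.get(s, [])
```

Both functions return the same result; the point is only that the second one depends on an index that an existing RDF store would not already have.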

In summary, extending an RDF store with an efficient implementation of RDF* is likely to be easier for the PG version than for the SA version, because the PG version can use the existing triple-lookup indexes when searching for nested triples, while an SA version would require new indexes for efficient lookup of nested triples.

Richard

Received on Saturday, 21 September 2019 12:21:20 UTC