Re: RDF* vs RDF vs named graphs from Olaf Hartig on 2020-12-05 (public-rdf-star@w3.org from December 2020)

From: Olaf Hartig <olaf.hartig@liu.se>
Date: Sat, 5 Dec 2020 15:26:02 +0100
To: public-rdf-star@w3.org, Holger Knublauch <holger@topquadrant.com>
Message-ID: <bc93107a-dc39-4768-bc33-86472495568e.maildroid@localhost>

Thanks, Holger and Pavel, for your valuable input as implementers.

I believe you both also gave an indirect answer to a question that Peter asked during yesterday's call: can embedded triples help to improve query execution performance (or something along these lines)? I will try to put this into the perspective of some of the things discussed in the call.

To store RDF* data and, in particular, to query it efficiently, one needs specific data structures.

The same holds for standard RDF reification data. Just storing it and treating it as ordinary RDF triples is totally inefficient because it would require four physical joins, which may be avoided by using special data structures for the reification triples. 
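
As a rough illustration of where the joins come from (a sketch only, not taken from any actual system): each reified statement is spread over four triples (rdf:type, rdf:subject, rdf:predicate, rdf:object), so a store that treats them as ordinary triples has to combine the matches for all four patterns before it can reach the annotations. The example data and the function names below are made up for this sketch.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, STATEMENT = RDF + "type", RDF + "Statement"
SUBJ, PRED, OBJ = RDF + "subject", RDF + "predicate", RDF + "object"
REIFICATION_VOCAB = {TYPE, SUBJ, PRED, OBJ}


def subjects_with(triples, predicate, obj):
    """Scan the triple set for one pattern (?x predicate obj)."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def reification_nodes(triples, s, p, o):
    """Intersect the matches of the four reification patterns for (s, p, o)."""
    return (
        subjects_with(triples, TYPE, STATEMENT)
        & subjects_with(triples, SUBJ, s)
        & subjects_with(triples, PRED, p)
        & subjects_with(triples, OBJ, o)
    )


def annotations(triples, s, p, o):
    """Collect (property, value) pairs attached to the reification nodes,
    skipping triples that use the reification vocabulary itself."""
    nodes = reification_nodes(triples, s, p, o)
    return {(ap, ao) for (n, ap, ao) in triples
            if n in nodes and ap not in REIFICATION_VOCAB}


# Hypothetical example data: one asserted triple, reified once and annotated.
EX = "http://example.org/"
graph = {
    (EX + "alice", EX + "knows", EX + "bob"),
    ("_:r1", TYPE, STATEMENT),
    ("_:r1", SUBJ, EX + "alice"),
    ("_:r1", PRED, EX + "knows"),
    ("_:r1", OBJ, EX + "bob"),
    ("_:r1", EX + "since", "2019"),
}
print(annotations(graph, EX + "alice", EX + "knows", EX + "bob"))
```

A real store would use indexes rather than full scans, but the pattern-by-pattern combination is the part that a dedicated data structure can avoid.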

However, both Andy and Pavel made clear in yesterday's call that, in their experience, there are too many additional complications in dealing with RDF reification data (e.g., parsing) that make treating RDF reification triples in a special way a big engineering effort without much success in terms of performance improvements.

I think that embedded triples, as an additional element of the abstract syntax, provide a way forward. They are an explicit notion for which special data structures may be built and, at the same time, they are also explicitly represented by corresponding constructs (<<...>>) in the concrete serialization formats.

In this sense, the extension of the abstract syntax of RDF with embedded triples is an important feature of RDF*, even if we were to agree that RDF* is (or should be) just syntactic sugar for standard RDF reification.

Regarding the syntactic sugar question, I believe that it may be confusing or misleading to some. From an implementer's perspective, "syntactic sugar" may be interpreted as something that would be replaced during parsing and not considered internally in a system. However, I guess that this is not the intention of the question. Instead, I assume that the question is more about the semantics of embedded triples than about requiring systems to actually convert RDF* graphs into RDF graphs.

One more response to Holger's email: I see such "long URIs" primarily as an implementation approach that a system may choose to represent embedded triples internally (for instance, if the introduction of a separate new Java class "breaks too much existing code"). 
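
To make the "long URI" idea concrete, here is a minimal sketch of how a triple could be packed into a single URI and unpacked again. The "urn:x-triple:" scheme and the function names are invented for this example and are not TopQuadrant's actual encoding; the sketch also assumes that all three terms are IRIs and ignores literals and blank nodes.

```python
from urllib.parse import quote, unquote

# Hypothetical URI scheme, chosen only for this illustration.
PREFIX = "urn:x-triple:"


def encode_triple(s, p, o):
    """Percent-encode each term (including ':' and '/') and join with ':'."""
    return PREFIX + ":".join(quote(term, safe="") for term in (s, p, o))


def decode_triple(uri):
    """Recover the (s, p, o) terms from a URI produced by encode_triple."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a triple URI: " + uri)
    parts = uri[len(PREFIX):].split(":")
    if len(parts) != 3:
        raise ValueError("malformed triple URI: " + uri)
    return tuple(unquote(part) for part in parts)


triple = ("http://example.org/alice",
          "http://example.org/knows",
          "http://example.org/bob")
long_uri = encode_triple(*triple)
print(long_uri)
print(decode_triple(long_uri) == triple)
```

Because the result is an ordinary URI, existing code that only knows about URI nodes keeps working, at the price of long strings.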

Best regards, 
Olaf 


-----Original Message-----
From: Holger Knublauch <holger@topquadrant.com>
To: public-rdf-star@w3.org
Sent: Sat, 05 Dec 2020 2:30
Subject: Re: RDF* vs RDF vs named graphs

On 12/4/2020 8:33 PM, Pavel Klinov wrote:

> Disclaimer: YMMV, other vendors may have a different experience. I'm 
> not really willing to debate whose experience is the right or most 
> representative, just responding to Pierre-Antoine's call for vendor 
> comments.

As a vendor report from TopQuadrant, we had introduced reification based 
on long URIs a few versions ago, and several customers have it in 
routine use. We have introduced a light-weight SHACL extension to allow 
validation and to drive input forms, see 
http://datashapes.org/reification.html#reifiableBy - this solution seems 
popular and efficient for its use cases.

One customer (at least) had asked for backward-compatible converters on 
data import and export between the long URIs and rdf:Statements. I think 
this is understandable at least until some Turtle* syntax becomes widely 
accepted that would hide those complexities. And these customers have 
existing external systems that rely on standard reification. Note that 
there is a potential mismatch in that standard reification allows the 
same triple to be reified multiple times.

Technically, it would not be a complete show stopper for us to treat 
RDF* as a syntactic sugar for rdf:Statements, yet we would almost 
certainly require graph-level optimizations and special indexes. In 
particular it needs to be efficient to ask "get me all reifications for 
a given triple" for display purposes on a form, and to perform clean up 
if an asserted triple gets deleted. With the long-URI approach this can 
be answered in constant O(1) time, while a naive rdf:Statement 
implementation would require joins. And rdf:Statements may cause a 
maintenance nightmare (partial adds/deletes etc) and may cause 
misleading matches in the graph (e.g. incoming references that are 
really on the reification layer). The long URIs are however currently 
very memory hungry, so we are sitting at an unsatisfactory in-between state.
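
A minimal sketch of the two operations mentioned above, assuming annotations are kept in a hash map keyed by the triple itself (a long URI would serve equally well as the key). The class and method names are hypothetical, and the constant-time claim only holds on average for hash lookups.

```python
from collections import defaultdict


class AnnotatedStore:
    """Toy store: asserted triples plus annotations keyed by triple."""

    def __init__(self):
        self.asserted = set()
        self.annotations = defaultdict(set)

    def add(self, triple):
        self.asserted.add(triple)

    def annotate(self, triple, prop, value):
        self.annotations[triple].add((prop, value))

    def annotations_for(self, triple):
        # One hash lookup instead of joining over reification triples.
        return set(self.annotations.get(triple, ()))

    def delete(self, triple):
        # Clean-up: dropping the triple also drops its annotations.
        self.asserted.discard(triple)
        self.annotations.pop(triple, None)


store = AnnotatedStore()
t = ("ex:alice", "ex:knows", "ex:bob")
store.add(t)
store.annotate(t, "ex:since", "2019")
print(store.annotations_for(t))
store.delete(t)
print(store.annotations_for(t))
```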

Introducing a new node type for triples is not viable because it breaks 
too much existing code. From what I can tell this option seems to be on 
its way out.

Holger
Received on Saturday, 5 December 2020 14:26:26 UTC