PG mode and SA mode from Olaf Hartig on 2019-09-18 (public-rdf-star@w3.org from September 2019)

From: Olaf Hartig <olaf.hartig@liu.se>
Date: Wed, 18 Sep 2019 18:53:47 +0000
To: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <2775084.r9NbrRIqqg@porty3>

Dear all,

There has been some confusion about the modes I have mentioned in some of the 
other threads (PG mode and SA mode). My aim with this email is to clear up 
this confusion and to start a discussion of these modes. More precisely, in 
this email I will describe again what I mean by these modes and, thereafter, 
ask you to voice your opinions about the modes (whether you prefer one over 
the other, whether both should be available or whether it would be better to 
continue with only one of them, etc.). In the following examples, to avoid 
getting distracted or confused by details of some serialization syntax I am 
going to use the abstract syntax of the RDF* data model and the SPARQL* 
language---i.e., nested triples and nested triple patterns---rather than a 
concrete serialization format such as Turtle*.

Now, the reason that made me come up with the idea of having two separate 
modes in which RDF*/SPARQL* can be used is the following question: Given a 
nested RDF* triple, say

 t = ( (s,p,o), p2, o2 ),

is this nested triple meant only to make an assertion about the triple 
t'=(s,p,o) that it contains (i.e., without asserting t'), or is it meant to 
also assert t'? Since both of these options may have merits, I 
think it is worth considering which of them should be used for the RDF*/
SPARQL* approach or, if possible, whether it is desirable to enable users to 
choose which one they use. So, let's have a more detailed look at the two 
options.

I start with the second option, which is what we may call "using RDF*/SPARQL* 
in Property Graph mode" (PG mode, for short). Again, when using this mode, the 
aforementioned nested RDF* triple t is meant both to make an assertion about 
the triple (s,p,o) and to also assert that triple, (s,p,o), at the same time. 
To make the implications of this option a bit clearer, consider the 
following triple pattern, which is not itself nested (and which may be used 
in the WHERE clause of a SPARQL* query):

 tp = ( s, p, ?v ) .

When using RDF*/SPARQL* in PG mode, evaluating this triple pattern over the 
RDF* graph G that contains (only) the aforementioned nested RDF* triple t 
results in a single solution* mapping m = {?v -> o} that maps variable ?v to 
o.

At this point it may be important to emphasize that the papers that I have 
published so far about the RDF*/SPARQL* approach assume this PG mode, but 
without explicitly calling it that. The reason for having made this 
choice (rather than the alternative, which I now call SA mode as described 
below) is that my initial perspective on RDF*/SPARQL* has been influenced by 
discussions with triplestore vendors who were interested in a practical, 
reification-like feature to capture and to query statement-level annotations. 
The general intention was that this feature would be used in the way people 
use the notion of edge properties in Property Graph databases (if you are not 
familiar with Property Graphs: an "edge property" is a key-value pair 
associated with an edge in such a graph). Then, the implicit notion of using 
PG mode followed from this intention because, in a Property Graph, to assign 
edge properties to an edge, the edge must exist in the graph.

Coming back now to the aforementioned two options regarding what the nested 
triple t could be meant to assert, there can be use cases for which the first 
option is more suitable than the second one. Adopting the first option is what 
we may call "using RDF*/SPARQL* in separate-assertions mode" (SA mode). To 
reiterate, when using this mode, the aforementioned nested RDF* triple t is 
meant to only make an assertion about the triple t'=(s,p,o) without asserting 
t'. If we now look again at the aforementioned triple pattern tp and use RDF*/
SPARQL* in SA mode, evaluating tp over the same RDF* graph G as mentioned 
before (G consists only of the nested triple t) results in no solution mapping 
at all---whereas in PG mode the result contains the aforementioned mapping m = 
{?v -> o}.
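
To make the contrast concrete, here is a minimal Python sketch of evaluating 
a single triple pattern in either mode. It is only an illustration under my 
own assumptions, not part of any specification: terms are plain strings, an 
RDF* triple is a 3-tuple whose subject or object may itself be a 3-tuple, 
variables are strings starting with "?", and the function name evaluate and 
the mode labels "PG" and "SA" are made up for this example. Nested triple 
patterns that contain variables are not handled.

```python
def evaluate(pattern, graph, mode):
    """Evaluate one triple pattern over a set of RDF* triples.

    mode is "PG" or "SA" (labels used only in this sketch).
    Returns a list of solution mappings, each a dict from variable to term.
    """
    candidates = set(graph)
    if mode == "PG":
        # Assumption of this sketch: in PG mode every triple nested inside
        # a triple of the graph is treated as asserted as well.
        pending = list(graph)
        while pending:
            subject, _, obj = pending.pop()
            for term in (subject, obj):
                if isinstance(term, tuple) and term not in candidates:
                    candidates.add(term)
                    pending.append(term)

    solutions = []
    for triple in candidates:
        mapping = {}
        for p_term, t_term in zip(pattern, triple):
            if isinstance(p_term, str) and p_term.startswith("?"):
                # A variable must be bound consistently within the pattern.
                if mapping.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            solutions.append(mapping)
    return solutions


t = (("s", "p", "o"), "p2", "o2")   # the nested triple t from above
G = {t}                             # G consists only of t
tp = ("s", "p", "?v")               # the triple pattern tp from above

print(evaluate(tp, G, "PG"))  # [{'?v': 'o'}]
print(evaluate(tp, G, "SA"))  # []
```

The only difference between the two modes in this sketch is the set of 
triples that the pattern is matched against.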

The difference between PG mode and SA mode can also be observed if we consider 
how RDF* graphs may be converted into standard RDF graphs based on the RDF 
reification vocabulary: If we assume RDF* is used in SA mode, the set 
consisting of the following five RDF triples captures the same information as 
our example RDF* triple t (where b is a fresh blank node):

(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)

In contrast, if RDF* is used in PG mode, our example RDF* triple t would have 
to be converted into the following set of RDF triples, which contains one 
additional triple (namely, the last one in the following list):

(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)
(s, p, o)
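
The two conversions listed above can be sketched in Python as follows. This 
is a rough illustration under assumptions of mine rather than a normative 
mapping: triples are tuples of strings (possibly nested), blank nodes are 
generated as "_:b0", "_:b1", and so on, the function name to_reification is 
hypothetical, and reusing one blank node for repeated occurrences of the 
same nested triple is a design choice of this sketch.

```python
import itertools


def to_reification(graph, mode):
    """Convert a set of RDF* triples into a set of plain RDF triples that
    use the RDF reification vocabulary. mode is "SA" or "PG"."""
    fresh = (f"_:b{i}" for i in itertools.count())
    out = set()
    statement_node = {}  # nested triple -> blank node that stands for it

    def term_for(term):
        if not isinstance(term, tuple):
            return term                      # ordinary term, keep as is
        if term in statement_node:
            return statement_node[term]      # already converted
        b = next(fresh)
        statement_node[term] = b
        s, p, o = term
        s, o = term_for(s), term_for(o)      # handle deeper nesting
        out.update({
            (b, "rdf:type", "rdf:Statement"),
            (b, "rdf:subject", s),
            (b, "rdf:predicate", p),
            (b, "rdf:object", o),
        })
        if mode == "PG":
            out.add((s, p, o))               # PG mode also asserts the triple
        return b

    for s, p, o in graph:
        out.add((term_for(s), p, term_for(o)))
    return out


t = (("s", "p", "o"), "p2", "o2")
sa = to_reification({t}, "SA")
pg = to_reification({t}, "PG")
print(len(sa), len(pg))   # 5 6
print(pg - sa)            # {('s', 'p', 'o')}
```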

I hope that these examples clarify now what I mean by using the RDF*/SPARQL* 
approach in PG mode versus using it in SA mode.

My main aim of introducing the notion of these modes was to give names to the 
two possible options as a basis for a discussion about which of them should be 
selected for the specification of the RDF*/SPARQL* approach.

Then, on top of that, and as a possible alternative to making the strict 
decision of selecting only one of them for the spec, I thought it might be 
useful to cover both of these options in the spec; essentially allowing 
system/tool developers to decide which of them they support (and indicating so 
in their documentation) and, in turn, enabling users to employ the systems/
tools that support the mode that is suitable for their use case. However, 
baking such a choice into a specification might be a bad idea, and I am 
interested in people's opinions about this.

By the way, yet another alternative would be to combine the ideas of both 
modes and enable users to make explicit for each nested triple whether this 
nested triple is meant to represent an annotation of an asserted triple (like 
in PG mode) or a statement about a non-asserted triple (like in SA mode). 
While this alternative would render the notion of having separate modes 
obsolete, it would require extending the notation in the abstract syntax (and 
then also in concrete user-focused syntaxes). So, perhaps we should not go 
there at this point and, instead, first try to get a better understanding of 
the pros and cons of the two separate modes.
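
Purely to illustrate what such a per-triple choice could look like, here is 
a small Python sketch. Everything in it is hypothetical: the wrapper class 
Quoted, its asserted flag, and the function asserted_triples are invented 
for this example and do not correspond to any proposed notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quoted:
    """Hypothetical wrapper for a nested triple.

    asserted=True  : annotation of an asserted triple (like in PG mode)
    asserted=False : statement about a non-asserted triple (like in SA mode)
    """
    triple: tuple
    asserted: bool


def asserted_triples(graph):
    """Return the triples of the graph plus every nested triple that is
    explicitly flagged as asserted."""
    result = set()
    pending = list(graph)
    while pending:
        triple = pending.pop()
        result.add(triple)
        subject, _, obj = triple
        for term in (subject, obj):
            if (isinstance(term, Quoted) and term.asserted
                    and term.triple not in result):
                pending.append(term.triple)
    return result


annotation = (Quoted(("s", "p", "o"), asserted=True), "p2", "o2")
claim = (Quoted(("s", "p", "o3"), asserted=False), "p2", "o2")
G = {annotation, claim}

print(("s", "p", "o") in asserted_triples(G))    # True
print(("s", "p", "o3") in asserted_triples(G))   # False
```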

Therefore, I would like to get your opinions on the following questions.
What do you think are the merits of the PG mode versus the SA mode, and vice 
versa?
Do you have a clear preference for one mode over the other?
What do you think about introducing both modes in a specification of the RDF*/
SPARQL* approach?

Thanks,
Olaf
Received on Wednesday, 18 September 2019 18:54:39 UTC