- From: Olaf Hartig <olaf.hartig@liu.se>
- Date: Wed, 18 Sep 2019 18:53:47 +0000
- To: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Dear all,
There has been some confusion about the modes I have mentioned in some of the
other threads (PG mode and SA mode). My aim with this email is to clear up
this confusion and to start a discussion of these modes. More precisely, in
this email I will describe again what I mean by these modes and, thereafter,
ask you to voice your opinions about the modes (whether you prefer one over
the other, whether both should be possible to use or it would be better to
continue only with one of them, etc.). In the following examples, to avoid
getting distracted or confused by details of some serialization syntax, I am
going to use the abstract syntax of the RDF* data model and the SPARQL*
language---i.e., nested triples and nested triple patterns---rather than a
concrete serialization format such as Turtle*.
Now, what led me to the idea of having two separate modes in which
RDF*/SPARQL* can be used is the following question: Given a nested RDF*
triple, say
t = ( (s,p,o), p2, o2 ),
is this nested triple meant to only make an assertion about the triple
t'=(s,p,o) that it contains (i.e., without asserting t') or is it meant to
additionally assert t'? Since both of these options may have merits, I
think it is worth considering which of them should be used for the RDF*/
SPARQL* approach or, if possible, whether it is desirable to enable users to
choose which one they use. So, let's have a more detailed look at the two
options.
I start with the second option, which is what we may call "using RDF*/SPARQL*
in Property Graph mode" (PG mode, for short). When using this mode, the
aforementioned nested RDF* triple t is meant both to make an assertion about
the triple (s,p,o) and to assert that triple at the same time.
To make the implications of this option a bit clearer, consider the
following triple pattern (which may be used in the WHERE clause of a
SPARQL* query):
tp = ( s, p, ?v ) .
When using RDF*/SPARQL* in PG mode, evaluating this triple pattern over the
RDF* graph G that contains (only) the aforementioned nested RDF* triple t
results in a single solution* mapping m = {?v -> o} that maps variable ?v to
o.
At this point it may be important to emphasize that the papers that I have
published so far about the RDF*/SPARQL* approach assume this PG mode, but
without explicitly calling it that. The reason for having made this
choice (rather than the alternative, which I now call SA mode as described
below) is that my initial perspective on RDF*/SPARQL* has been influenced by
discussions with triplestore vendors who were interested in a practical,
reification-like feature to capture and to query statement-level annotations.
The general intention was that this feature would be used in the way people
use the notion of edge properties in Property Graph databases (if you are not
familiar with Property Graphs: an "edge property" is a key-value pair
associated with an edge in such a graph). Then, the implicit notion of using
PG mode followed from this intention because, in a Property Graph, to assign
edge properties to an edge, the edge must exist in the graph.
Coming back now to the aforementioned two options regarding what the nested
triple t could be meant to assert, there can be use cases for which the first
option is more suitable than the second one. Adopting the first option is what
we may call "using RDF*/SPARQL* in separate-assertions mode" (SA mode). To
reiterate, when using this mode, the aforementioned nested RDF* triple t is
meant to only make an assertion about the triple t'=(s,p,o) without asserting
t'. If we now look again at the aforementioned triple pattern tp and use RDF*/
SPARQL* in SA mode, evaluating tp over the same RDF* graph G as mentioned
before (G consists only of the nested triple t) results in no solution mapping
at all---whereas in PG mode the result contains the aforementioned mapping m =
{?v -> o}.
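The difference in query results can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a definition of the SPARQL* semantics: terms are plain strings, a nested triple is a tuple whose subject or object is itself a tuple, variables are strings starting with "?", and the pattern itself is not nested. The function names (unfold, matchable_triples, match, evaluate) are hypothetical, and applying the PG-mode unfolding recursively to deeper nesting is an assumption of this sketch.

```python
def is_triple(term):
    """A term is a (nested) triple if it is a 3-tuple."""
    return isinstance(term, tuple) and len(term) == 3


def unfold(triple):
    """Yield the triple itself and every triple nested in its subject or object."""
    yield triple
    for term in (triple[0], triple[2]):
        if is_triple(term):
            yield from unfold(term)


def matchable_triples(graph, mode):
    """Return the triples that a non-nested triple pattern may match in the given mode."""
    if mode == "PG":
        # PG mode: nested triples are asserted as well.
        return {u for t in graph for u in unfold(t)}
    if mode == "SA":
        # SA mode: only the triples that are directly in the graph are asserted.
        return set(graph)
    raise ValueError("mode must be 'PG' or 'SA'")


def match(pattern, triple):
    """Return a solution mapping (dict) if the pattern matches the triple, else None."""
    mapping = {}
    for pattern_term, term in zip(pattern, triple):
        if isinstance(pattern_term, str) and pattern_term.startswith("?"):
            # A variable must be bound consistently within one pattern.
            if mapping.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return mapping


def evaluate(pattern, graph, mode):
    """Evaluate a single non-nested triple pattern over the graph."""
    results = []
    for triple in matchable_triples(graph, mode):
        mapping = match(pattern, triple)
        if mapping is not None:
            results.append(mapping)
    return results


t = (("s", "p", "o"), "p2", "o2")   # the nested triple t
G = {t}                              # G contains only t
tp = ("s", "p", "?v")                # the triple pattern tp

pg_result = evaluate(tp, G, "PG")
sa_result = evaluate(tp, G, "SA")
```

Under these assumptions, pg_result holds the single mapping of ?v to o, and sa_result is empty.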
The difference between PG mode and SA mode can also be observed if we consider
how RDF* graphs may be converted into standard RDF graphs based on the RDF
reification vocabulary: If we assume RDF* is used in SA mode, the set
consisting of the following five RDF triples captures the same information as
our example RDF* triple t (where b is a fresh blank node):
(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)
In contrast, if RDF* is used in PG mode, our example RDF* triple t would have
to be converted into the following set of RDF triples, which contains one
additional triple (namely, the last one in the following list):
(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)
(s, p, o)
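The two conversions can be sketched in Python as follows. This is a rough illustration rather than a normative mapping: the function name to_reified_rdf and the blank node label "_:b" are hypothetical, terms are plain strings, and the sketch handles only the shape of the example, that is, a single triple nested in the subject position.

```python
def to_reified_rdf(nested_triple, mode, blank_node="_:b"):
    """Convert one RDF* triple of the form ((s, p, o), p2, o2) into RDF triples.

    The blank node label is assumed to be fresh, i.e., not used elsewhere.
    """
    if mode not in ("SA", "PG"):
        raise ValueError("mode must be 'SA' or 'PG'")

    (s, p, o), p2, o2 = nested_triple

    # Both modes describe the nested triple via the RDF reification vocabulary
    # and attach the annotation (p2, o2) to the reifying blank node.
    triples = [
        (blank_node, "rdf:type", "rdf:Statement"),
        (blank_node, "rdf:subject", s),
        (blank_node, "rdf:predicate", p),
        (blank_node, "rdf:object", o),
        (blank_node, p2, o2),
    ]

    # Only PG mode additionally asserts the nested triple itself.
    if mode == "PG":
        triples.append((s, p, o))

    return triples


t = (("s", "p", "o"), "p2", "o2")
sa_triples = to_reified_rdf(t, "SA")   # five triples
pg_triples = to_reified_rdf(t, "PG")   # the same five triples plus (s, p, o)
```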
I hope that these examples now clarify what I mean by using the RDF*/SPARQL*
approach in PG mode versus using it in SA mode.
My main aim in introducing the notion of these modes was to give names to the
two possible options as a basis for a discussion about which of them should be
selected for the specification of the RDF*/SPARQL* approach.
Then, on top of that, and as a possible alternative to making the strict
decision of selecting only one of them for the spec, I thought it might be
useful to cover both of these options in the spec; essentially allowing
system/tool developers to decide which of them they support (and indicating so
in their documentation) and, in turn, enabling users to employ the systems/
tools that support the mode that is suitable for their use case. However,
baking such a choice into a specification might be a bad idea, and I am
interested in people's opinions about this.
By the way, yet another alternative would be to combine the ideas of both
modes and enable users to make explicit for each nested triple whether this
nested triple is meant to represent an annotation of an asserted triple (like
in PG mode) or a statement about a non-asserted triple (like in SA mode).
While this alternative would render the notion of having separate modes
obsolete, it would require extending the notation in the abstract syntax (and
then also in concrete user-focused syntaxes). So, perhaps we should not go
there at this point and, instead, first try to get a better understanding of
the pros and cons of the two separate modes.
Therefore, I would like to get your opinions on the following questions.
What do you think are the merits of the PG mode versus the SA mode, and vice
versa?
Do you have a clear preference for one mode over the other?
What do you think about introducing both modes in a specification of the RDF*/
SPARQL* approach?
Thanks,
Olaf
Received on Wednesday, 18 September 2019 18:54:39 UTC