Re: PG mode and SA mode from Jeen Broekstra on 2019-09-19 (public-rdf-star@w3.org from September 2019)

From: Jeen Broekstra <jeen.broekstra@gmail.com>
Date: Thu, 19 Sep 2019 13:59:53 +1000
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <CANyF_kHU1RgB=OdgtUmAuC0P=82gFnoki5nOVakoE9r08cEVxw@mail.gmail.com>
Thanks for this clear summary Olaf, it's definitely helped me get back up
to speed with the thrust of the discussion so far.

I don't think having these two separate modes baked into the language
specification is a good idea. Apart from anything else, you would get
problems in interoperability between tools. You would have to force all
tooling to support both possible interpretations and, in addition, add a
feature to all possible syntax carriers that makes it explicit which
interpretation is meant for any given document - and even then there is a
lot of room for user error.

Far simpler to stick to a single interpretation. From my perspective, SA
seems the more logical choice here, as it doesn't rule out the alternative:
using SA, you can express both intentions, with the only minor downside
being that you have to write down the triple twice. But IMO that's hardly a
massive problem, it's what compression algorithms are good for.
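
To make that concrete, here is a minimal Python sketch of the difference. It
is only an illustration under my own assumptions, not part of any RDF*/SPARQL*
tool: terms are plain strings, triples are tuples, a nested triple is a tuple
used as subject or object, and the helper names (is_var, asserted, match) are
made up for this example. It only handles a single, flat triple pattern.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (assumption)."""
    return isinstance(term, str) and term.startswith("?")


def asserted(graph, pg_mode):
    """Return the triples treated as asserted.

    SA mode: only the triples that are literally in the graph.
    PG mode: additionally every triple nested inside another triple.
    """
    result = set(graph)
    if pg_mode:
        stack = list(graph)
        while stack:
            s, _, o = stack.pop()
            for term in (s, o):
                if isinstance(term, tuple):
                    result.add(term)
                    stack.append(term)
    return result


def match(pattern, graph, pg_mode):
    """Evaluate one triple pattern; return a list of solution mappings."""
    solutions = []
    for triple in asserted(graph, pg_mode):
        binding = {}
        for pat_term, term in zip(pattern, triple):
            if is_var(pat_term):
                # A variable must bind consistently within one triple.
                if binding.setdefault(pat_term, term) != term:
                    break
            elif pat_term != term:
                break
        else:
            solutions.append(binding)
    return solutions


t = (("s", "p", "o"), "p2", "o2")   # the nested triple from Olaf's mail
tp = ("s", "p", "?v")               # the triple pattern from Olaf's mail

pg_solutions = match(tp, {t}, pg_mode=True)     # [{'?v': 'o'}]
sa_solutions = match(tp, {t}, pg_mode=False)    # []

# SA mode with the inner triple written down a second time:
sa_twice_solutions = match(tp, {t, ("s", "p", "o")}, pg_mode=False)  # [{'?v': 'o'}]
```

So under SA you get the PG-style result by also stating (s,p,o) itself, and
you keep the option of leaving it out when you only want to talk about the
triple.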

Cheers,

Jeen

On Thu, Sep 19, 2019 at 4:55 AM Olaf Hartig <olaf.hartig@liu.se> wrote:

> Dear all,
>
> There has been some confusion about the modes I have mentioned in some of
> the other threads (PG mode and SA mode). My aim with this email is to
> clear up this confusion and to start a discussion of these modes. More
> precisely, in this email I will describe again what I mean by these modes
> and, thereafter, ask you to voice your opinions about the modes (whether
> you prefer one over the other, whether both should be possible to use or
> it would be better to continue only with one of them, etc). In the
> following examples, to avoid getting distracted or confused by details of
> some serialization syntax I am going to use the abstract syntax of the
> RDF* data model and the SPARQL* language---i.e., nested triples and nested
> triple patterns---rather than a concrete serialization format such as
> Turtle*.
>
> Now, the reason that made me come up with the idea of having two separate
> modes in which RDF*/SPARQL* can be used is the following question: Given a
> nested RDF* triple, say
>
>  t = ( (s,p,o), p2, o2 ),
>
> is this nested triple meant to only make an assertion about the triple
> t'=(s,p,o) that it contains (i.e., without asserting t') or is it meant to
> additionally also assert t'? Since both of these options may have merits,
> I think it is worth considering which of them should be used for the
> RDF*/SPARQL* approach or, if possible, whether it is desirable to enable
> users to choose which one they use. So, let's have a more detailed look at
> the two options.
>
> I start with the second option, which is what we may call "using
> RDF*/SPARQL* in Property Graph mode" (PG mode, for short). Again, when
> using this mode, the aforementioned nested RDF* triple t is meant both to
> make an assertion about the triple (s,p,o) and to also assert that triple,
> (s,p,o), at the same time. To make the implications of this option a bit
> more clear, consider the following nested triple pattern (which may be
> used in the WHERE clause of a SPARQL* query):
>
>  tp = ( s, p, ?v ) .
>
> When using RDF*/SPARQL* in PG mode, evaluating this triple pattern over
> the RDF* graph G that contains (only) the aforementioned nested RDF*
> triple t results in a single solution* mapping m = {?v -> o} that maps
> variable ?v to o.
>
> At this point it may be important to emphasize that the papers that I have
> published so far about the RDF*/SPARQL* approach assume this PG mode, but
> without explicitly calling it that. The reason for having made this
> choice (rather than the alternative, which I now call SA mode as described
> below) is that my initial perspective on RDF*/SPARQL* has been influenced
> by discussions with triplestore vendors who were interested in a
> practical, reification-like feature to capture and to query
> statement-level annotations. The general intention was that this feature
> would be used in a way like people use the notion of edge properties in
> Property Graph databases (if you are not familiar with Property Graphs: an
> "edge property" is a key-value pair associated with an edge in such a
> graph). Then, the implicit notion of using PG mode followed from this
> intention because, in a Property Graph, to assign edge properties to an
> edge, the edge must exist in the graph.
>
> Coming back now to the aforementioned two options regarding what the
> nested triple t could be meant to assert, there can be use cases for which
> the first option is more suitable than the second one. Adopting the first
> option is what we may call "using RDF*/SPARQL* in separate-assertions
> mode" (SA mode). To reiterate, when using this mode, the aforementioned
> nested RDF* triple t is meant to only make an assertion about the triple
> t'=(s,p,o) without asserting t'. If we now look again at the
> aforementioned triple pattern tp and use RDF*/SPARQL* in SA mode,
> evaluating tp over the same RDF* graph G as mentioned before (G consists
> only of the nested triple t) results in no solution mapping at
> all---whereas in PG mode the result contains the aforementioned mapping
> m = {?v -> o}.
>
> The difference between PG mode and SA mode can also be observed if we
> consider how RDF* graphs may be converted into standard RDF graphs based
> on the RDF reification vocabulary: If we assume RDF* is used in SA mode,
> the set consisting of the following five RDF triples captures the
> information as captured by our example RDF* triple t (where b is a fresh
> blank node):
>
> (b, rdf:type, rdf:Statement)
> (b, rdf:subject, s)
> (b, rdf:predicate, p)
> (b, rdf:object, o)
> (b, p2, o2)
>
> In contrast, if RDF* is used in PG mode, our example RDF* triple t would
> have to be converted into the following set of RDF triples, which contains
> one additional triple (namely, the last one in the following list):
>
> (b, rdf:type, rdf:Statement)
> (b, rdf:subject, s)
> (b, rdf:predicate, p)
> (b, rdf:object, o)
> (b, p2, o2)
> (s, p, o)
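
The two conversions above can be sketched in a few lines of Python. This is a
rough illustration under assumptions of my own, not a reference algorithm:
terms are plain strings, triples are tuples, blank nodes are labels of the
form "_:bN", one blank node is reused per distinct nested triple, and the
function name to_rdf is hypothetical.

```python
def to_rdf(graph, pg_mode):
    """Convert nested triples to plain RDF triples via rdf:Statement reification."""
    out = set()
    bnodes = {}  # nested triple -> blank node label (assumed: one per distinct triple)

    def node_for(term):
        # Ordinary terms are kept; nested triples are replaced by a blank node.
        if not isinstance(term, tuple):
            return term
        if term not in bnodes:
            b = f"_:b{len(bnodes)}"
            bnodes[term] = b
            s, p, o = term
            s_node, o_node = node_for(s), node_for(o)
            out.update({
                (b, "rdf:type", "rdf:Statement"),
                (b, "rdf:subject", s_node),
                (b, "rdf:predicate", p),
                (b, "rdf:object", o_node),
            })
            if pg_mode:
                # PG mode: the annotated triple is asserted as well.
                out.add((s_node, p, o_node))
        return bnodes[term]

    for s, p, o in graph:
        out.add((node_for(s), p, node_for(o)))
    return out


t = (("s", "p", "o"), "p2", "o2")
sa_triples = to_rdf({t}, pg_mode=False)   # the five triples of the SA list
pg_triples = to_rdf({t}, pg_mode=True)    # the same five plus (s, p, o)
extra = pg_triples - sa_triples           # {('s', 'p', 'o')}
```

The only difference between the two outputs is the single extra triple
(s, p, o), which is exactly the difference between the two lists quoted above.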
>
> I hope that these examples clarify now what I mean by using the
> RDF*/SPARQL* approach in PG mode versus using it in SA mode.
>
> My main aim of introducing the notion of these modes was to give names to
> the two possible options as a basis for a discussion about which of them
> should be selected for the specification of the RDF*/SPARQL* approach.
>
> Then, on top of that, and as a possible alternative to making the strict
> decision of selecting only one of them for the spec, I thought it might be
> useful to cover both of these options in the spec; essentially allowing
> system/tool developers to decide which of them they support (and
> indicating so in their documentation) and, in turn, enabling users to
> employ the systems/tools that support the mode that is suitable for their
> use case. However, baking such a choice into a specification might be a
> bad idea, and I am interested in people's opinions about this.
>
> By the way, yet another alternative would be to combine the ideas of both
> modes and enable users to make explicit for each nested triple whether
> this nested triple is meant to represent an annotation of an asserted
> triple (like in PG mode) or a statement about a non-asserted triple (like
> in SA mode). While this alternative would render the notion of having
> separate modes obsolete, it would require extending the notation in the
> abstract syntax (and then also in concrete user-focused syntaxes). So,
> perhaps we should not go there at this point and, instead, first try to
> get a better understanding of the pros and cons of the two separate modes.
>
> Therefore, I would like to get your opinions on the following questions.
> What do you think are the merits of the PG mode versus the SA mode, and
> vice versa?
> Do you have a clear preference for one mode over the other?
> What do you think about introducing both modes in a specification of the
> RDF*/SPARQL* approach?
>
> Thanks,
> Olaf
>
>
Received on Thursday, 19 September 2019 04:00:28 UTC