Re: PG mode and SA mode from Richard Cyganiak on 2019-09-19 (public-rdf-star@w3.org from September 2019)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 19 Sep 2019 16:28:28 +0100
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-Id: <D4F15317-A4F2-4A50-B2A3-C5071EEC342B@cyganiak.de>

Olaf,

Thank you for this summary.

Having two modes is not a practical solution in my opinion. How would interoperability be achieved if a receiver of a document or query does not know which mode is intended? The mode would always have to be communicated, either in the document or query itself, or in every protocol used to send or receive these artefacts. If the mode marker becomes detached or is overlooked, communication will fail. If some tool vendors decide to only support one mode, communication will fail. It is not a good option.

So I will consider PG and SA as different options for what the RDF*/Turtle*/SPARQL* languages could be once they are fully specified, rather than as different modes that implementations of these languages could operate in.

The SA option appeals more to me than the PG option, assuming a suitable syntax.

A very quick sketch for such a syntax:

    # assert a triple -- basic Turtle -- this produces one triple
    :s :p :o.

    # annotate the triple :s :p :o without asserting it -- this produces one triple
    <<:s :p :o>> :a :b.

    # assert a triple and annotate it -- this produces two triples
    :s :p :o [[ :a :b ]].
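
To make the triple counts concrete, here is a rough Python sketch of what each of the three forms would expand to. It is only an illustration: the function names are made up for this email, triples are modelled as plain tuples, and the [[ ... ]] notation is just the sketch above, not anything that has been specified.

```python
# Triples are plain tuples; a nested triple has a tuple in subject position.
# All function names here are hypothetical, chosen for this sketch only.

def assert_triple(s, p, o):
    """:s :p :o.  ->  one triple."""
    return {(s, p, o)}

def annotate_only(s, p, o, a, b):
    """<<:s :p :o>> :a :b.  ->  one nested triple; :s :p :o is NOT asserted."""
    return {((s, p, o), a, b)}

def assert_and_annotate(s, p, o, a, b):
    """:s :p :o [[ :a :b ]].  ->  two triples."""
    return assert_triple(s, p, o) | annotate_only(s, p, o, a, b)

g1 = assert_triple(":s", ":p", ":o")
g2 = annotate_only(":s", ":p", ":o", ":a", ":b")
g3 = assert_and_annotate(":s", ":p", ":o", ":a", ":b")
print(len(g1), len(g2), len(g3))  # 1 1 2
```

In this reading the third form is simply shorthand for the first two together.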

Best,
Richard


> On 18 Sep 2019, at 19:53, Olaf Hartig <olaf.hartig@liu.se> wrote:
> 
> Dear all,
> 
> There has been some confusion about the modes I have mentioned in some of the 
> other threads (PG mode and SA mode). My aim with this email is to clear up 
> this confusion and to start a discussion of these modes. More precisely, in 
> this email I will describe again what I mean by these modes and, thereafter, 
> ask you to voice your opinions about the modes (whether you prefer one over 
> the other, whether both should be possible to use or it would be better to 
> continue only with one of them, etc). In the following examples, to avoid 
> getting distracted or confused by details of some serialization syntax I am 
> going to use the abstract syntax of the RDF* data model and the SPARQL* 
> language---i.e., nested triples and nested triple patterns---rather than a 
> concrete serialization format such as Turtle*.
> 
> Now, the reason that made me come up with the idea of having two separate 
> modes in which RDF*/SPARQL* can be used is the following question: Given a 
> nested RDF* triple, say
> 
> t = ( (s,p,o), p2, o2 ),
> 
> is this nested triple meant to only make an assertion about the triple 
> t'=(s,p,o) that it contains (i.e., without asserting t') or is it meant to 
> additionally also assert t'? Since both of these options may have merits, I 
> think it is worth considering which of them should be used for the RDF*/
> SPARQL* approach or, if possible, whether it is desirable to enable users to 
> choose which one they use. So, let's have a more detailed look at the two 
> options.
> 
> I start with the second option, which is what we may call "using RDF*/SPARQL* 
> in Property Graph mode" (PG mode, for short). Again, when using this mode, the 
> aforementioned nested RDF* triple t is meant both to make an assertion about 
> the triple (s,p,o) and to also assert that triple, (s,p,o), at the same time. 
> To make the implications of this option a bit clearer, consider the 
> following triple pattern (which may be used in the WHERE clause of a 
> SPARQL* query):
> 
> tp = ( s, p, ?v ) .
> 
> When using RDF*/SPARQL* in PG mode, evaluating this triple pattern over the 
> RDF* graph G that contains (only) the aforementioned nested RDF* triple t 
> results in a single solution mapping m = {?v -> o} that maps variable ?v to 
> o.
> 
> At this point it may be important to emphasize that the papers that I have 
> published so far about the RDF*/SPARQL* approach assume this PG mode, but 
> without explicitly calling it that. The reason for having made this 
> choice (rather than the alternative, which I now call SA mode as described 
> below) is that my initial perspective on RDF*/SPARQL* has been influenced by 
> discussions with triplestore vendors who were interested in a practical, 
> reification-like feature to capture and to query statement-level annotations. 
> The general intention was that this feature would be used in the way people 
> use the notion of edge properties in Property Graph databases (if you are not 
> familiar with Property Graphs: an "edge property" is a key-value pair 
> associated with an edge in such a graph). Then, the implicit notion of using 
> PG mode followed from this intention because, in a Property Graph, to assign 
> edge properties to an edge, the edge must exist in the graph.
> 
> Coming back now to the aforementioned two options regarding what the nested 
> triple t could be meant to assert, there can be use cases for which the first 
> option is more suitable than the second one. Adopting the first option is what 
> we may call "using RDF*/SPARQL* in separate-assertions mode" (SA mode). To 
> reiterate, when using this mode, the aforementioned nested RDF* triple t is 
> meant to only make an assertion about the triple t'=(s,p,o) without asserting 
> t'. If we now look again at the aforementioned triple pattern tp and use RDF*/
> SPARQL* in SA mode, evaluating tp over the same RDF* graph G as mentioned 
> before (G consists only of the nested triple t) results in no solution mapping 
> at all---whereas in PG mode the result contains the aforementioned mapping m = 
> {?v -> o}.
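
As a sanity check on the two readings, here is a small Python sketch that evaluates tp over G in both modes. Everything in it is an assumption made for illustration: the names asserted_triples and match are hypothetical, variables are strings starting with "?", and triples embedded in object position or nested more deeply are treated the same way as the subject-position case from the example.

```python
def is_triple(term):
    return isinstance(term, tuple) and len(term) == 3

def asserted_triples(graph, mode):
    """Triples that count as asserted when matching a pattern."""
    result = set(graph)
    if mode == "PG":
        # PG mode: every embedded triple is asserted as well (assumption:
        # this applies to subject and object position, at any depth).
        stack = list(graph)
        while stack:
            s, _, o = stack.pop()
            for term in (s, o):
                if is_triple(term):
                    result.add(term)
                    stack.append(term)
    return result

def match(pattern, graph, mode):
    """Solution mappings for one triple pattern over the asserted triples."""
    solutions = []
    for triple in asserted_triples(graph, mode):
        mapping = {}
        for pat, term in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                # A variable must bind consistently within one triple.
                if mapping.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            solutions.append(mapping)
    return solutions

t = (("s", "p", "o"), "p2", "o2")   # the nested triple t
G = {t}                             # G contains only t
tp = ("s", "p", "?v")               # the triple pattern tp

print(match(tp, G, "PG"))  # [{'?v': 'o'}]
print(match(tp, G, "SA"))  # []
```

The same graph and the same pattern give different answers depending on the mode, which is the interoperability concern in a nutshell.
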
> 
> The difference between PG mode and SA mode can also be observed if we consider 
> how RDF* graphs may be converted into standard RDF graphs based on the RDF 
> reification vocabulary: If we assume RDF* is used in SA mode, the set 
> consisting of the following five RDF triples captures the same information 
> as our example RDF* triple t (where b is a fresh blank node):
> 
> (b, rdf:type, rdf:Statement)
> (b, rdf:subject, s)
> (b, rdf:predicate, p)
> (b, rdf:object, o)
> (b, p2, o2)
> 
> In contrast, if RDF* is used in PG mode, our example RDF* triple t would have 
> to be converted into the following set of RDF triples, which contains one 
> additional triple (namely, the last one in the following list):
> 
> (b, rdf:type, rdf:Statement)
> (b, rdf:subject, s)
> (b, rdf:predicate, p)
> (b, rdf:object, o)
> (b, p2, o2)
> (s, p, o)
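
The two conversions can be written down as a short Python sketch. It only handles the single case from the example (one triple nested in subject position), the function name to_reification is hypothetical, and the string "_:b" merely stands in for a fresh blank node.

```python
def to_reification(t, mode, b="_:b"):
    """Convert one nested triple ((s, p, o), p2, o2) to standard RDF triples.

    Assumption: `b` is a placeholder for a fresh blank node; a real
    converter would have to generate a new one per nested triple.
    """
    (s, p, o), p2, o2 = t
    triples = [
        (b, "rdf:type", "rdf:Statement"),
        (b, "rdf:subject", s),
        (b, "rdf:predicate", p),
        (b, "rdf:object", o),
        (b, p2, o2),
    ]
    if mode == "PG":
        # PG mode additionally asserts the embedded triple itself.
        triples.append((s, p, o))
    return triples

t = (("s", "p", "o"), "p2", "o2")
sa = to_reification(t, "SA")
pg = to_reification(t, "PG")
print(len(sa), len(pg))  # 5 6
```

The only difference between the two outputs is the final (s, p, o) triple.
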
> 
> I hope that these examples now clarify what I mean by using the RDF*/SPARQL* 
> approach in PG mode versus using it in SA mode.
> 
> My main aim in introducing the notion of these modes was to give names to the 
> two possible options as a basis for a discussion about which of them should be 
> selected for the specification of the RDF*/SPARQL* approach.
> 
> Then, on top of that, and as a possible alternative to making the strict 
> decision of selecting only one of them for the spec, I thought it might be 
> useful to cover both of these options in the spec; essentially allowing 
> system/tool developers to decide which of them they support (and indicating so 
> in their documentation) and, in turn, enabling users to employ the systems/
> tools that support the mode that is suitable for their use case. However, 
> baking such a choice into a specification might be a bad idea, and I am 
> interested in people's opinions about this.
> 
> By the way, yet another alternative would be to combine the ideas of both 
> modes and enable users to make explicit for each nested triple whether this 
> nested triple is meant to represent an annotation of an asserted triple (like 
> in PG mode) or a statement about a non-asserted triple (like in SA mode). 
> While this alternative would render the notion of having separate modes 
> obsolete, it would require extending the notation in the abstract syntax (and 
> then also in concrete user-focused syntaxes). So, perhaps we should not go 
> there at this point and, instead, first try to get a better understanding of 
> the pros and cons of the two separate modes.
> 
> Therefore, I would like to get your opinions on the following questions.
> What do you think are the merits of the PG mode versus the SA mode, and vice 
> versa?
> Do you have a clear preference for one mode over the other?
> What do you think about introducing both modes in a specification of the RDF*/
> SPARQL* approach?
> 
> Thanks,
> Olaf
>
Received on Thursday, 19 September 2019 15:29:16 UTC