- From: Olaf Hartig <olaf.hartig@liu.se>
- Date: Wed, 18 Sep 2019 18:53:47 +0000
- To: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Dear all,

There has been some confusion about the modes I have mentioned in some of the other threads (PG mode and SA mode). My aim with this email is to clear up this confusion and to start a discussion of these modes. More precisely, in this email I will describe again what I mean by these modes and, thereafter, ask you to voice your opinions about the modes (whether you prefer one over the other, whether both should be possible to use or it would be better to continue only with one of them, etc).

In the following examples, to avoid getting distracted or confused by details of some serialization syntax, I am going to use the abstract syntax of the RDF* data model and the SPARQL* language (i.e., nested triples and nested triple patterns) rather than a concrete serialization format such as Turtle*.

Now, the reason that made me come up with the idea of having two separate modes in which RDF*/SPARQL* can be used is the following question: Given a nested RDF* triple, say t = ( (s,p,o), p2, o2 ), is this nested triple meant to only make an assertion about the triple t'=(s,p,o) that it contains (i.e., without asserting t'), or is it meant to additionally assert t'? Since both of these options may have merits, I think it is worth considering which of them should be used for the RDF*/SPARQL* approach or, if possible, whether it is desirable to enable users to choose which one they use.

So, let's have a more detailed look at the two options. I start with the second option, which is what we may call "using RDF*/SPARQL* in Property Graph mode" (PG mode, for short). Again, when using this mode, the aforementioned nested RDF* triple t is meant both to make an assertion about the triple (s,p,o) and, at the same time, to assert that triple (s,p,o) itself.

To make the implications of this option a bit clearer, consider the following triple pattern (which may be used in the WHERE clause of a SPARQL* query):

tp = ( s, p, ?v ) .

When using RDF*/SPARQL* in PG mode, evaluating this triple pattern over the RDF* graph G that contains (only) the aforementioned nested RDF* triple t results in a single solution* mapping m = {?v -> o} that maps variable ?v to o.

At this point it may be important to emphasize that the papers that I have published so far about the RDF*/SPARQL* approach assume this PG mode, but without explicitly calling it that. The reason for having made this choice (rather than the alternative, which I now call SA mode as described below) is that my initial perspective on RDF*/SPARQL* has been influenced by discussions with triplestore vendors who were interested in a practical, reification-like feature to capture and to query statement-level annotations. The general intention was that this feature would be used in the way people use the notion of edge properties in Property Graph databases (if you are not familiar with Property Graphs: an "edge property" is a key-value pair associated with an edge in such a graph). Then, the implicit notion of using PG mode followed from this intention because, in a Property Graph, to assign edge properties to an edge, the edge must exist in the graph.

Coming back now to the aforementioned two options regarding what the nested triple t could be meant to assert, there can be use cases for which the first option is more suitable than the second one. Adopting the first option is what we may call "using RDF*/SPARQL* in separate-assertions mode" (SA mode). To reiterate, when using this mode, the aforementioned nested RDF* triple t is meant to only make an assertion about the triple t'=(s,p,o) without asserting t'.

If we now look again at the aforementioned triple pattern tp and use RDF*/SPARQL* in SA mode, evaluating tp over the same RDF* graph G as mentioned before (G consists only of the nested triple t) results in no solution mapping at all, whereas in PG mode the result contains the aforementioned mapping m = {?v -> o}.

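
The difference in evaluating tp over G under the two modes can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the RDF*/SPARQL* definitions: terms are plain strings, variables are strings starting with "?", nested triples are nested tuples, and the helper names (asserted_triples, match, evaluate) are hypothetical. It also assumes that PG mode asserts nested triples recursively, which goes beyond the single level of nesting discussed here.

```python
def asserted_triples(graph, pg_mode):
    """Return the triples that a triple pattern is matched against.

    SA mode: only the triples that are elements of the graph.
    PG mode (assumed recursive): also every triple nested inside them.
    """
    visible = set(graph)
    if pg_mode:
        stack = list(graph)
        while stack:
            s, _, o = stack.pop()
            for term in (s, o):
                if isinstance(term, tuple) and term not in visible:
                    visible.add(term)
                    stack.append(term)
    return visible


def match(pattern, triple, binding=None):
    """Match a (possibly nested) pattern against a triple.

    Returns a dict of variable bindings, or None if there is no match.
    """
    binding = dict(binding or {})
    for p_term, t_term in zip(pattern, triple):
        if isinstance(p_term, tuple):
            # Nested pattern: the data term must be a nested triple too.
            if not isinstance(t_term, tuple):
                return None
            binding = match(p_term, t_term, binding)
            if binding is None:
                return None
        elif p_term.startswith("?"):
            # Variable: bind it, or check consistency with an earlier binding.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def evaluate(pattern, graph, pg_mode):
    """Collect all solution mappings for one pattern over the graph."""
    candidates = (match(pattern, t) for t in asserted_triples(graph, pg_mode))
    return [m for m in candidates if m is not None]


t = (("s", "p", "o"), "p2", "o2")  # the nested triple t from the text
G = {t}                            # G contains only t
tp = ("s", "p", "?v")              # the triple pattern tp

pg_result = evaluate(tp, G, pg_mode=True)   # [{'?v': 'o'}]
sa_result = evaluate(tp, G, pg_mode=False)  # []
```

The only place where the two modes differ in this sketch is asserted_triples; the matching itself is identical in both.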
The difference between PG mode and SA mode can also be observed if we consider how RDF* graphs may be converted into standard RDF graphs based on the RDF reification vocabulary: If we assume RDF* is used in SA mode, the set consisting of the following five RDF triples captures the information as captured by our example RDF* triple t (where b is a fresh blank node):

(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)

In contrast, if RDF* is used in PG mode, our example RDF* triple t would have to be converted into the following set of RDF triples, which contains one additional triple (namely, the last one in the following list):

(b, rdf:type, rdf:Statement)
(b, rdf:subject, s)
(b, rdf:predicate, p)
(b, rdf:object, o)
(b, p2, o2)
(s, p, o)

I hope that these examples now clarify what I mean by using the RDF*/SPARQL* approach in PG mode versus using it in SA mode.

My main aim in introducing the notion of these modes was to give names to the two possible options as a basis for a discussion about which of them should be selected for the specification of the RDF*/SPARQL* approach. Then, on top of that, and as a possible alternative to making the strict decision of selecting only one of them for the spec, I thought it might be useful to cover both of these options in the spec; essentially allowing system/tool developers to decide which of them they support (and indicating so in their documentation) and, in turn, enabling users to employ the systems/tools that support the mode that is suitable for their use case. However, baking such a choice into a specification might be a bad idea, and I am interested in people's opinions about this.

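
The reification-based conversion described above can be sketched as follows. As before, this is only an illustration under stated assumptions: terms are plain strings, nested triples are nested tuples, blank nodes are written as "_:bN", and the function name reify is hypothetical. As a simplification, every occurrence of a nested triple gets its own fresh blank node.

```python
from itertools import count


def reify(graph, pg_mode):
    """Convert a set of (possibly nested) triples into plain RDF triples
    using the RDF reification vocabulary.

    SA mode: emit only the reification triples and the annotation triple.
    PG mode: additionally emit each nested triple as an asserted triple.
    """
    fresh = count()  # generator of blank node numbers
    out = set()

    def term_for(term):
        # Ordinary terms are kept as they are.
        if not isinstance(term, tuple):
            return term
        # A nested triple is replaced by a fresh blank node b that is
        # described with the reification vocabulary.
        s, p, o = (term_for(x) for x in term)
        b = f"_:b{next(fresh)}"
        out.update({
            (b, "rdf:type", "rdf:Statement"),
            (b, "rdf:subject", s),
            (b, "rdf:predicate", p),
            (b, "rdf:object", o),
        })
        if pg_mode:
            out.add((s, p, o))  # the one extra triple in PG mode
        return b

    for s, p, o in graph:
        out.add((term_for(s), p, term_for(o)))
    return out


t = (("s", "p", "o"), "p2", "o2")  # the nested triple t from the text
sa_triples = reify({t}, pg_mode=False)  # the five triples listed for SA mode
pg_triples = reify({t}, pg_mode=True)   # the same five plus (s, p, o)
extra = pg_triples - sa_triples
```

For the example triple t, the two outputs differ only in the asserted triple (s, p, o), which matches the two lists above.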
By the way, yet another alternative would be to combine the ideas of both modes and enable users to make explicit for each nested triple whether this nested triple is meant to represent an annotation of an asserted triple (like in PG mode) or a statement about a non-asserted triple (like in SA mode). While this alternative would render the notion of having separate modes obsolete, it would require extending the notation in the abstract syntax (and then also in concrete user-focused syntaxes). So, perhaps we should not go there at this point and, instead, first try to get a better understanding of the pros and cons of the two separate modes.

Therefore, I would like to get your opinions on the following questions.

What do you think are the merits of the PG mode versus the SA mode, and vice versa?

Do you have a clear preference for one mode over the other?

What do you think about introducing both modes in a specification of the RDF*/SPARQL* approach?

Thanks,
Olaf

Received on Wednesday, 18 September 2019 18:54:39 UTC