Re: PG mode and SA mode from Jem Rayfield on 2019-09-20 (public-rdf-star@w3.org from September 2019)

From: Jem Rayfield <jem.rayfield@ontotext.com>
Date: Fri, 20 Sep 2019 10:48:23 +0100
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>, Jeen Broekstra <jeen.broekstra@gmail.com>
Message-ID: <CAPPubhWiofZs97DR68RtZ8C5fXtaubaUz8ZgroKZRQBuYEbyog@mail.gmail.com>
Hello Olaf,

I don't like the idea of modes. This should be closed off, IMO.

SA mode with a suitable syntax extension (something akin to Richard's
proposal) feels like the best route forward.

GraphDB (Ontotext) will be working with Jeen (RDF4J), so I am pleased that
his thoughts lean towards SA.

Cheers
Jem

On Fri, Sep 20, 2019 at 10:09 AM Olaf Hartig <olaf.hartig@liu.se> wrote:

> Hi,
>
> Jeen, good to know that the summary was clear ;-)
>
> I note that your response to the questions put up in my email goes in
> the same direction as Pat's response.
>
> It would be great if we could also get some feedback from vendors who
> have already implemented RDF*/SPARQL* or are considering doing so
> (Bryan? Steve? Jem?)
>
> Thanks,
> Olaf
>
>
> On Thu, 2019-09-19 at 13:59 +1000, Jeen Broekstra wrote:
> > Thanks for this clear summary Olaf, it's definitely helped me get
> > back up to speed with the thrust of the discussion so far.
> >
> > I don't think having these two separate modes baked into the language
> > specification is a good idea. Apart from anything else, you would get
> > problems in interoperability between tools. You would have to force
> > all tooling to support both possible interpretations and in addition
> > add a feature to all possible syntax carriers that makes it explicit
> > which interpretation is meant for any given document - and even then
> > there is a lot of room for user error.
> >
> > Far simpler to stick to a single interpretation. From my perspective,
> > SA seems the more logical choice here, as it doesn't rule out the
> > alternative: using SA, you can express both intentions, with the only
> > minor downside being that you have to write down the triple twice.
> > But IMO that's hardly a massive problem, it's what compression
> > algorithms are good for.
> >
> > Cheers,
> >
> > Jeen
> >
> > On Thu, Sep 19, 2019 at 4:55 AM Olaf Hartig <olaf.hartig@liu.se>
> > wrote:
> > > Dear all,
> > >
> > > There has been some confusion about the modes I have mentioned in
> > > some of the other threads (PG mode and SA mode). My aim with this
> > > email is to clear up this confusion and to start a discussion of
> > > these modes. More precisely, in this email I will describe again
> > > what I mean by these modes and, thereafter, ask you to voice your
> > > opinions about the modes (whether you prefer one over the other,
> > > whether both should be possible to use or it would be better to
> > > continue only with one of them, etc). In the following examples, to
> > > avoid getting distracted or confused by details of some
> > > serialization syntax I am going to use the abstract syntax of the
> > > RDF* data model and the SPARQL* language---i.e., nested triples and
> > > nested triple patterns---rather than a concrete serialization
> > > format such as Turtle*.
> > >
> > > Now, the reason that made me come up with the idea of having two
> > > separate modes in which RDF*/SPARQL* can be used is the following
> > > question: Given a nested RDF* triple, say
> > >
> > >  t = ( (s,p,o), p2, o2 ),
> > >
> > > is this nested triple meant to only make an assertion about the
> > > triple t'=(s,p,o) that it contains (i.e., without asserting t') or
> > > is it meant to additionally also assert t'? Since both of these
> > > options may have merits, I think it is worth considering which of
> > > them should be used for the RDF*/SPARQL* approach or, if possible,
> > > whether it is desirable to enable users to choose which one they
> > > use. So, let's have a more detailed look at the two options.
> > >
> > > I start with the second option, which is what we may call "using
> > > RDF*/SPARQL* in Property Graph mode" (PG mode, for short). Again,
> > > when using this mode, the aforementioned nested RDF* triple t is
> > > meant both to make an assertion about the triple (s,p,o) and to
> > > also assert that triple, (s,p,o), at the same time. To make the
> > > implications of this option a bit more clear, consider the
> > > following nested triple pattern (which may be used in the WHERE
> > > clause of a SPARQL* query):
> > >
> > >  tp = ( s, p, ?v ) .
> > >
> > > When using RDF*/SPARQL* in PG mode, evaluating this triple pattern
> > > over the RDF* graph G that contains (only) the aforementioned
> > > nested RDF* triple t results in a single solution* mapping
> > > m = {?v -> o} that maps variable ?v to o.
> > >
> > > At this point it may be important to emphasize that the papers that
> > > I have published so far about the RDF*/SPARQL* approach assume this
> > > PG mode, but without explicitly calling it like that. The reason
> > > for having made this choice (rather than the alternative, which I
> > > now call SA mode as described below) is that my initial perspective
> > > on RDF*/SPARQL* has been influenced by discussions with triplestore
> > > vendors who were interested in a practical, reification-like
> > > feature to capture and to query statement-level annotations. The
> > > general intention was that this feature would be used in a way like
> > > people use the notion of edge properties in Property Graph
> > > databases (if you are not familiar with Property Graphs: an "edge
> > > property" is a key-value pair associated with an edge in such a
> > > graph). Then, the implicit notion of using PG mode followed from
> > > this intention because, in a Property Graph, to assign edge
> > > properties to an edge, the edge must exist in the graph.
> > >
> > > Coming back now to the aforementioned two options regarding what
> > > the nested triple t could be meant to assert, there can be use
> > > cases for which the first option is more suitable than the second
> > > one. Adopting the first option is what we may call "using
> > > RDF*/SPARQL* in separate-assertions mode" (SA mode). To reiterate,
> > > when using this mode, the aforementioned nested RDF* triple t is
> > > meant to only make an assertion about the triple t'=(s,p,o) without
> > > asserting t'. If we now look again at the aforementioned triple
> > > pattern tp and use RDF*/SPARQL* in SA mode, evaluating tp over the
> > > same RDF* graph G as mentioned before (G consists only of the
> > > nested triple t) results in no solution mapping at all---whereas in
> > > PG mode the result contains the aforementioned mapping
> > > m = {?v -> o}.
> > >
> > > The difference between PG mode and SA mode can also be observed if
> > > we consider how RDF* graphs may be converted into standard RDF
> > > graphs based on the RDF reification vocabulary: If we assume RDF*
> > > is used in SA mode, the set consisting of the following five RDF
> > > triples captures the information as captured by our example RDF*
> > > triple t (where b is a fresh blank node):
> > >
> > > (b, rdf:type, rdf:Statement)
> > > (b, rdf:subject, s)
> > > (b, rdf:predicate, p)
> > > (b, rdf:object, o)
> > > (b, p2, o2)
> > >
> > > In contrast, if RDF* is used in PG mode, our example RDF* triple t
> > > would have to be converted into the following set of RDF triples,
> > > which contains one additional triple (namely, the last one in the
> > > following list):
> > >
> > > (b, rdf:type, rdf:Statement)
> > > (b, rdf:subject, s)
> > > (b, rdf:predicate, p)
> > > (b, rdf:object, o)
> > > (b, p2, o2)
> > > (s, p, o)
> > >
> > > I hope that these examples clarify now what I mean by using the
> > > RDF*/SPARQL* approach in PG mode versus using it in SA mode.
> > >
> > > My main aim of introducing the notion of these modes was to give
> > > names to the two possible options as a basis for a discussion about
> > > which of them should be selected for the specification of the
> > > RDF*/SPARQL* approach.
> > >
> > > Then, on top of that, and as a possible alternative to making the
> > > strict decision of selecting only one of them for the spec, I
> > > thought it might be useful to cover both of these options in the
> > > spec; essentially allowing system/tool developers to decide which
> > > of them they support (and indicating so in their documentation)
> > > and, in turn, enabling users to employ the systems/tools that
> > > support the mode that is suitable for their use case. However,
> > > baking such a choice into a specification might be a bad idea, and
> > > I am interested in people's opinions about this.
> > >
> > > By the way, yet another alternative would be to combine the ideas
> > > of both modes and enable users to make explicit for each nested
> > > triple whether this nested triple is meant to represent an
> > > annotation of an asserted triple (like in PG mode) or a statement
> > > about a non-asserted triple (like in SA mode). While this
> > > alternative would render the notion of having separate modes
> > > obsolete, it would require extending the notation in the abstract
> > > syntax (and then also in concrete user-focused syntaxes). So,
> > > perhaps we should not go there at this point and, instead, first
> > > try to get a better understanding of the pros and cons of the two
> > > separate modes.
> > >
> > > Therefore, I would like to get your opinions on the following
> > > questions.
> > > What do you think are the merits of the PG mode versus the SA mode,
> > > and vice versa?
> > > Do you have a clear preference for one mode over the other?
> > > What do you think about introducing both modes in a specification
> > > of the RDF*/SPARQL* approach?
> > >
> > > Thanks,
> > > Olaf
> > >
>
>
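
For readers who want to try the triple pattern example from Olaf's
message, here is a minimal Python sketch of evaluating tp = (s, p, ?v)
over G = {t} in both modes. It is only an illustration, not how any
RDF*/SPARQL* engine is implemented. The tuple encoding of triples, the
"?"-prefix convention for variables, and the function names (embedded,
asserted_triples, match, evaluate) are all assumptions made for this
sketch. It also assumes that a nested triple may appear in subject or
object position, and it does not handle variables inside nested triple
patterns.

```python
def embedded(triple):
    """Yield every triple nested (at any depth) inside the given triple."""
    for term in (triple[0], triple[2]):
        if isinstance(term, tuple):
            yield term
            yield from embedded(term)


def asserted_triples(graph, mode):
    """Return the set of triples that count as asserted under the mode."""
    asserted = set(graph)
    if mode == "PG":
        # PG mode: a nested triple also asserts the triple it contains.
        for triple in graph:
            asserted.update(embedded(triple))
    return asserted


def match(pattern, triple):
    """Return a solution mapping if the triple matches, else None."""
    binding = {}
    for p_term, t_term in zip(pattern, triple):
        if isinstance(p_term, str) and p_term.startswith("?"):
            # A variable must bind to the same term everywhere it occurs.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def evaluate(pattern, graph, mode):
    """Evaluate one triple pattern over the graph in 'PG' or 'SA' mode."""
    results = []
    for triple in asserted_triples(graph, mode):
        mapping = match(pattern, triple)
        if mapping is not None:
            results.append(mapping)
    return results


t = (("s", "p", "o"), "p2", "o2")   # the nested triple t from the email
G = {t}                             # the graph containing only t
tp = ("s", "p", "?v")               # the triple pattern tp

print(evaluate(tp, G, "PG"))  # [{'?v': 'o'}]
print(evaluate(tp, G, "SA"))  # []
```

The only place the two modes differ in this sketch is asserted_triples,
which mirrors the point made in the thread: under SA, the same PG result
can be obtained by adding (s, p, o) to G explicitly.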

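The conversion to the RDF reification vocabulary described in Olaf's
message can be sketched in the same spirit. This is a minimal Python
illustration under stated assumptions: triples are plain tuples of
strings, the function name unfold and the blank node label "_:b" are
made up for this sketch, and only one level of nesting in subject
position is handled (the shape of the example triple t).

```python
def unfold(t, mode, b="_:b"):
    """Convert the nested triple t into standard RDF triples.

    t has the shape ((s, p, o), p2, o2); b stands in for a fresh blank
    node. mode is 'SA' or 'PG'.
    """
    (s, p, o), p2, o2 = t
    triples = [
        (b, "rdf:type", "rdf:Statement"),
        (b, "rdf:subject", s),
        (b, "rdf:predicate", p),
        (b, "rdf:object", o),
        (b, p2, o2),
    ]
    if mode == "PG":
        # PG mode additionally asserts the contained triple itself.
        triples.append((s, p, o))
    return triples


t = (("s", "p", "o"), "p2", "o2")
for triple in unfold(t, "PG"):
    print(triple)
```

In SA mode the function returns the five triples listed in the message;
in PG mode it returns the same five plus (s, p, o) as the sixth.
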
-- 
Jem Rayfield
Chief Architect
Ontotext AD
Received on Friday, 20 September 2019 09:51:18 UTC