Re: PG mode and SA mode from Steve sarsfield on 2019-09-22 (public-rdf-star@w3.org from September 2019)

From: Steve sarsfield <steve.sarsfield@cambridgesemantics.com>
Date: Sun, 22 Sep 2019 19:57:26 -0400
To: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <CAL3k4tU4vFxD1KbaE8VKe284YiZUsJE+Zd2SjMB15cgBdcwR9A@mail.gmail.com>
Cambridge Semantics AnzoGraph signed up enthusiastically to RDF* PG and has
been shipping it for over a year. We did this because it fixes a CRITICAL
flaw in RDF/SPARQL 1.1 in representing real-world data. It is my belief
(and my colleagues agree) that RDF/SPARQL has done poorly in the market
compared to LPGs, primarily due to this flaw.

In an LPG, one can easily say "Jack likes Jill score=9 since=2017-12-15".
In RDF/SPARQL 1.1 that requires reification, which is super-cumbersome and
creates huge performance issues for a parallel/distributed database system
like ours. Your RDF* specification eliminates the cumbersome aspects of RDF
1.1 and SPARQL 1.1 reification. We are grateful!
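
To make the comparison concrete, here is a rough sketch of the "Jack likes
Jill" statement in the three representations. It is an illustration only:
the `ex:` names and the blank node label `_:b0` are made up for the example,
and plain Python tuples stand in for triples.

```python
from datetime import date

# Labeled property graph: one edge that carries its own key-value properties.
lpg_edge = {
    "source": "Jack",
    "label": "likes",
    "target": "Jill",
    "properties": {"score": 9, "since": date(2017, 12, 15)},
}

# RDF 1.1 with the standard reification vocabulary: the base triple, four
# reification triples, and one further triple per annotation.
base = ("ex:Jack", "ex:likes", "ex:Jill")  # hypothetical identifiers
stmt = "_:b0"                              # hypothetical blank node label
rdf_reified = [
    base,
    (stmt, "rdf:type", "rdf:Statement"),
    (stmt, "rdf:subject", base[0]),
    (stmt, "rdf:predicate", base[1]),
    (stmt, "rdf:object", base[2]),
    (stmt, "ex:score", 9),
    (stmt, "ex:since", date(2017, 12, 15)),
]

# RDF*: the base triple itself is the subject of each annotation.
rdf_star = [
    (base, "ex:score", 9),
    (base, "ex:since", date(2017, 12, 15)),
]

print(len(rdf_reified), len(rdf_star))  # prints: 7 2
```

Whether the two RDF* triples also assert the base triple is exactly the PG
mode versus SA mode question discussed in the quoted thread below.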

The original RDF/SPARQL flaw was obvious to us and to our original prospects.
We had planned an extension that wasn't as syntactically or semantically
elegant. We dropped those plans when we saw your syntax.

In my humble opinion, if RDF* was part of the original SPARQL
specification, we wouldn't see all these proprietary LPG systems. The world
would have focused on RDF/SPARQL in the same manner it did for
relational/SQL many decades ago.

The proposed SA mode, based on what I've read, really adds another layer
of reasoning whose primary effect will be to create confusion. Further, it
creates a notion that cannot be represented by LPGs and thus widens the gap
rather than closing it. I suspect it can be better addressed by an OWL
inferencing mechanism. Due to the ambiguities introduced by an SA mode, we
suggest this be deferred for future discussion.

Just getting a system as robust as basic LPGs has taken the RDF/SPARQL
world, so far, about 18 years (seven years since 1.1). If we focus on PG
mode, I believe we should be able to get the W3C to standardize far more
quickly.

Steve Sarsfield
Cambridge Semantics

On Fri, Sep 20, 2019 at 5:51 AM Jem Rayfield <jem.rayfield@ontotext.com>
wrote:

> Hello Olaf,
>
> I don't like the idea of modes. This should be closed off, imo.
>
> SA mode with a suitable syntax extension (something akin to Richard's
> proposal) feels like the best route forward.
>
> GraphDB (Ontotext) will be working with Jeen (RDF4J) so I am pleased that
> his thoughts lean towards SA.
>
> Cheers
> Jem
>
> On Fri, Sep 20, 2019 at 10:09 AM Olaf Hartig <olaf.hartig@liu.se> wrote:
>
>> Hi,
>>
>> Jeen, good to know that the summary was clear ;-)
>>
>> I note that your response to the questions put up in my email goes in
>> the same direction as Pat's response.
>>
>> It would be great if we could also get some feedback from vendors who
>> have already implemented RDF*/SPARQL* or are considering doing so
>> (Bryan? Steve? Jem?)
>>
>> Thanks,
>> Olaf
>>
>>
>> On Thu, 2019-09-19 at 13:59 +1000, Jeen Broekstra wrote:
>> > Thanks for this clear summary Olaf, it's definitely helped me get
>> > back up to speed with the thrust of the discussion so far.
>> >
>> > I don't think having these two separate modes baked into the language
>> > specification is a good idea. Apart from anything else, you would get
>> > problems in interoperability between tools. You would have to force
>> > all tooling to support both possible interpretations and in addition
>> > add a feature to all possible syntax carriers that makes it explicit
>> > which interpretation is meant for any given document - and even then
>> > there is a lot of room for user error.
>> >
>> > Far simpler to stick to a single interpretation. From my perspective,
>> > SA seems the more logical choice here, as it doesn't rule out the
>> > alternative: using SA, you can express both intentions, with the only
>> > minor downside being that you have to write down the triple twice.
>> > But IMO that's hardly a massive problem, it's what compression
>> > algorithms are good for.
>> >
>> > Cheers,
>> >
>> > Jeen
>> >
>> > On Thu, Sep 19, 2019 at 4:55 AM Olaf Hartig <olaf.hartig@liu.se>
>> > wrote:
>> > > Dear all,
>> > >
>> > > There has been some confusion about the modes I have mentioned in
>> > > some of the other threads (PG mode and SA mode). My aim with this
>> > > email is to clear up this confusion and to start a discussion of
>> > > these modes. More precisely, in this email I will describe again
>> > > what I mean by these modes and, thereafter, ask you to voice your
>> > > opinions about the modes (whether you prefer one over the other,
>> > > whether both should be possible to use or it would be better to
>> > > continue only with one of them, etc.). In the following examples,
>> > > to avoid getting distracted or confused by details of some
>> > > serialization syntax, I am going to use the abstract syntax of the
>> > > RDF* data model and the SPARQL* language---i.e., nested triples
>> > > and nested triple patterns---rather than a concrete serialization
>> > > format such as Turtle*.
>> > >
>> > > Now, the reason that made me come up with the idea of having two
>> > > separate modes in which RDF*/SPARQL* can be used is the following
>> > > question: Given a nested RDF* triple, say
>> > >
>> > >  t = ( (s,p,o), p2, o2 ),
>> > >
>> > > is this nested triple meant to only make an assertion about the
>> > > triple t'=(s,p,o) that it contains (i.e., without asserting t') or
>> > > is it meant to additionally also assert t'? Since both of these
>> > > options may have merits, I think it is worth considering which of
>> > > them should be used for the RDF*/SPARQL* approach or, if possible,
>> > > whether it is desirable to enable users to choose which one they
>> > > use. So, let's have a more detailed look at the two options.
>> > >
>> > > I start with the second option, which is what we may call "using
>> > > RDF*/SPARQL* in Property Graph mode" (PG mode, for short). Again,
>> > > when using this mode, the aforementioned nested RDF* triple t is
>> > > meant both to make an assertion about the triple (s,p,o) and to
>> > > also assert that triple, (s,p,o), at the same time. To make the
>> > > implications of this option a bit more clear, consider the
>> > > following triple pattern (which may be used in the WHERE clause of
>> > > a SPARQL* query):
>> > >
>> > >  tp = ( s, p, ?v ) .
>> > >
>> > > When using RDF*/SPARQL* in PG mode, evaluating this triple pattern
>> > > over the RDF* graph G that contains (only) the aforementioned
>> > > nested RDF* triple t results in a single solution* mapping
>> > > m = {?v -> o} that maps variable ?v to o.
>> > >
>> > > At this point it may be important to emphasize that the papers
>> > > that I have published so far about the RDF*/SPARQL* approach
>> > > assume this PG mode, but without explicitly calling it that. The
>> > > reason for having made this choice (rather than the alternative,
>> > > which I now call SA mode as described below) is that my initial
>> > > perspective on RDF*/SPARQL* has been influenced by discussions
>> > > with triplestore vendors who were interested in a practical,
>> > > reification-like feature to capture and to query statement-level
>> > > annotations. The general intention was that this feature would be
>> > > used much as people use the notion of edge properties in Property
>> > > Graph databases (if you are not familiar with Property Graphs: an
>> > > "edge property" is a key-value pair associated with an edge in
>> > > such a graph). Then, the implicit notion of using PG mode followed
>> > > from this intention because, in a Property Graph, to assign edge
>> > > properties to an edge, the edge must exist in the graph.
>> > >
>> > > Coming back now to the aforementioned two options regarding what
>> > > the nested triple t could be meant to assert, there can be use
>> > > cases for which the first option is more suitable than the second
>> > > one. Adopting the first option is what we may call "using
>> > > RDF*/SPARQL* in separate-assertions mode" (SA mode). To reiterate,
>> > > when using this mode, the aforementioned nested RDF* triple t is
>> > > meant to only make an assertion about the triple t'=(s,p,o)
>> > > without asserting t'. If we now look again at the aforementioned
>> > > triple pattern tp and use RDF*/SPARQL* in SA mode, evaluating tp
>> > > over the same RDF* graph G as mentioned before (G consists only of
>> > > the nested triple t) results in no solution mapping at
>> > > all---whereas in PG mode the result contains the aforementioned
>> > > mapping m = {?v -> o}.
>> > >
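The two evaluations described above can be sketched in a few lines of
Python. This is an illustrative sketch only, not part of any RDF*/SPARQL*
implementation: the function names `asserted_triples` and `match` are
hypothetical, terms are plain strings, nested triples are Python tuples,
and only flat (non-nested) triple patterns are handled. The sketch also
assumes that in PG mode a nested triple is asserted whether it occurs in
subject or object position.

```python
def asserted_triples(graph, pg_mode):
    """Return the set of triples that count as asserted in the given mode."""
    asserted = set(graph)
    if pg_mode:
        # PG mode: every triple nested inside an asserted triple is
        # asserted as well (applied recursively).
        pending = list(graph)
        while pending:
            s, _, o = pending.pop()
            for term in (s, o):
                if isinstance(term, tuple) and term not in asserted:
                    asserted.add(term)
                    pending.append(term)
    return asserted


def match(pattern, graph, pg_mode):
    """Evaluate one flat triple pattern; variables are strings like '?v'."""
    solutions = []
    for triple in asserted_triples(graph, pg_mode):
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if isinstance(p_term, str) and p_term.startswith("?"):
                # Bind the variable, or check consistency if already bound.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            solutions.append(binding)
    return solutions


# G contains only the nested triple t = ((s, p, o), p2, o2).
G = {(("s", "p", "o"), "p2", "o2")}
tp = ("s", "p", "?v")

print(match(tp, G, pg_mode=True))   # [{'?v': 'o'}]
print(match(tp, G, pg_mode=False))  # []
```
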
>> > > The difference between PG mode and SA mode can also be observed if
>> > > we consider how RDF* graphs may be converted into standard RDF
>> > > graphs based on the RDF reification vocabulary: If we assume RDF*
>> > > is used in SA mode, the set consisting of the following five RDF
>> > > triples captures the information as captured by our example RDF*
>> > > triple t (where b is a fresh blank node):
>> > >
>> > > (b, rdf:type, rdf:Statement)
>> > > (b, rdf:subject, s)
>> > > (b, rdf:predicate, p)
>> > > (b, rdf:object, o)
>> > > (b, p2, o2)
>> > >
>> > > In contrast, if RDF* is used in PG mode, our example RDF* triple t
>> > > would have to be converted into the following set of RDF triples,
>> > > which contains one additional triple (namely, the last one in the
>> > > following list):
>> > >
>> > > (b, rdf:type, rdf:Statement)
>> > > (b, rdf:subject, s)
>> > > (b, rdf:predicate, p)
>> > > (b, rdf:object, o)
>> > > (b, p2, o2)
>> > > (s, p, o)
>> > >
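The two conversions listed above differ only in the final triple, which a
small Python sketch can make explicit. The function name `to_reified_rdf`
and the blank node label `_:b` are hypothetical, and the sketch assumes a
single level of nesting with the nested triple in subject position, as in
the example triple t.

```python
def to_reified_rdf(nested_triple, pg_mode, blank_node="_:b"):
    """Convert one nested triple ((s, p, o), p2, o2) into plain RDF triples."""
    (s, p, o), p2, o2 = nested_triple
    b = blank_node  # stands in for a fresh blank node
    triples = [
        (b, "rdf:type", "rdf:Statement"),
        (b, "rdf:subject", s),
        (b, "rdf:predicate", p),
        (b, "rdf:object", o),
        (b, p2, o2),
    ]
    if pg_mode:
        # PG mode additionally asserts the nested triple itself.
        triples.append((s, p, o))
    return triples


t = (("s", "p", "o"), "p2", "o2")
sa_triples = to_reified_rdf(t, pg_mode=False)
pg_triples = to_reified_rdf(t, pg_mode=True)

print(len(sa_triples), len(pg_triples))  # prints: 5 6
```
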
>> > > I hope that these examples clarify now what I mean by using the
>> > > RDF*/SPARQL* approach in PG mode versus using it in SA mode.
>> > >
>> > > My main aim of introducing the notion of these modes was to give
>> > > names to the two possible options as a basis for a discussion
>> > > about which of them should be selected for the specification of
>> > > the RDF*/SPARQL* approach.
>> > >
>> > > Then, on top of that, and as a possible alternative to making the
>> > > strict decision of selecting only one of them for the spec, I
>> > > thought it might be useful to cover both of these options in the
>> > > spec; essentially allowing system/tool developers to decide which
>> > > of them they support (and indicating so in their documentation)
>> > > and, in turn, enabling users to employ the systems/tools that
>> > > support the mode that is suitable for their use case. However,
>> > > baking such a choice into a specification might be a bad idea, and
>> > > I am interested in people's opinions about this.
>> > >
>> > > By the way, yet another alternative would be to combine the ideas
>> > > of both modes and enable users to make explicit for each nested
>> > > triple whether this nested triple is meant to represent an
>> > > annotation of an asserted triple (like in PG mode) or a statement
>> > > about a non-asserted triple (like in SA mode). While this
>> > > alternative would render the notion of having separate modes
>> > > obsolete, it would require extending the notation in the abstract
>> > > syntax (and then also in concrete user-focused syntaxes). So,
>> > > perhaps we should not go there at this point and, instead, first
>> > > try to get a better understanding of the pros and cons of the two
>> > > separate modes.
>> > >
>> > > Therefore, I would like to get your opinions on the following
>> > > questions.
>> > > What do you think are the merits of the PG mode versus the SA
>> > > mode, and vice versa?
>> > > Do you have a clear preference for one mode over the other?
>> > > What do you think about introducing both modes in a specification
>> > > of the RDF*/SPARQL* approach?
>> > >
>> > > Thanks,
>> > > Olaf
>> > >
>>
>>
>
> --
> Jem Rayfield
> Chief Architect
> Ontotext AD
>
Received on Monday, 23 September 2019 00:02:14 UTC