Re: A few thoughts on RDF-star, Reification, and Labeled Property Graphs

Bryan,

> This has not worked for 25 years.  The capability has been there all
> along.  But it does not convince people and I know of no RDF/SPARQL
> database which makes this efficient.

Let's put things into perspective:

Yes, the RDF 1.0 spec has been around for twenty-five years. Turtle has
been around for about thirteen years, and technically only for eleven if
you're counting from the spec. SPARQL has been around for 17 years, but it
had a MAJOR upgrade in 2013 (SPARQL 1.1), including the introduction of
SPARQL Update, Graphs, and Services, and most of these changes only really
saw implementation within the last five to six years. SHACL arrived in
2017, and is only JUST beginning to become pervasive in the sector.

We are all people who have been involved with RDF for 20+ years, and so we
tend to know its warts intimately, but for a great number of people and
organizations, it's only JUST beginning to become a thing (I sat in on a
webinar for a GIS based knowledge graph system an hour ago).

Adoption has been slow primarily because there really was no NEED for
knowledge graphs until comparatively recently - we were all very much ahead
of the curve in that regard, and it's been painful. Trust me on this - I
have watched XML from its inception, and it has gone through its XML
winter, and actually there seems to be a bit of a revival going on in that
space as the limitations of JSON (and the prevalence of script kiddies who
wanted to see everything as being like Javascript objects) have given way
to Pythonistas for whom XML was just another format (with its own libraries
that predated JSON and that are still going strong today).

RDF is, for all intents and purposes, frozen. We can define new predicates
and classes into that spec, but honestly, we'd be better off putting those
into their own namespace and calling them something else, because we are
not really going to be able to modify RDF itself given a fairly broad
install base. The XML community tried making base changes to the underlying
SGML syntax, and came away with the realization that it was not feasible
because of that legacy problem. I don't see RDF being any different.

Now, what we are talking about is syntactic changes to Turtle. Turtle is
one expression of RDF, and that distinction is important because a Turtle 2
could be rolled out without anywhere near as much impact, so long as you
changed the namespace. This happened with named graphs and TriG. Named
graphs DID require changes to RDF, because they required a fourth term in
the default n-tuple.
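
The "fourth term" can be pictured roughly as follows. This is a minimal
Python sketch, not any library's API: the Quad type, the group_by_graph
helper, and the example names are all hypothetical and chosen only for
illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    """A triple plus a fourth term naming the graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None  # None stands in for the default graph


def group_by_graph(quads: Iterable[Quad]) -> Dict[Optional[str], List[Triple]]:
    """Split a set of quads back into per-graph lists of plain triples."""
    graphs: Dict[Optional[str], List[Triple]] = defaultdict(list)
    for q in quads:
        graphs[q.graph].append((q.subject, q.predicate, q.obj))
    return dict(graphs)


quads = [
    Quad(":Joe", ":married", ":Alice"),        # default graph
    Quad(":Joe", ":married", ":Jane", ":g1"),  # named graph :g1
]
graphs = group_by_graph(quads)
```

The point of the sketch is only that the triple itself is untouched; what
changes is the shape of the tuple that the store has to carry around.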

With reification, on the other hand, we don't need to change anything in
RDF precisely because we have a mechanism for decomposing a triple into its
component parts that goes back 25 years. What we're ultimately talking
about is Turtle syntax, which is an RDF representation but is not the same
thing as RDF. What we're arguing about (and have been for a while) is
whether or not the IRI :r is unique. This is a Turtle problem and the
consensus that seems to be emerging is that Turtle can define a default IRI
that is used in the absence of one being presented.
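
To make the decomposition concrete, here is a minimal Python sketch of the
classic reification expansion with a default reifier. It is an illustration
under stated assumptions, not anything the working group has agreed on: the
reify and default_reifier functions, the urn:example:reifier: base, and the
hash-based minting scheme are all hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical base for minted reifier IRIs; not defined by any spec.
DEFAULT_REIFIER_BASE = "urn:example:reifier:"


def default_reifier(s: str, p: str, o: str) -> str:
    """Mint a deterministic reifier IRI from the triple's terms (assumed scheme)."""
    digest = hashlib.sha256("\x1f".join((s, p, o)).encode("utf-8")).hexdigest()
    return DEFAULT_REIFIER_BASE + digest[:16]


def reify(s: str, p: str, o: str, reifier: Optional[str] = None) -> List[Triple]:
    """Decompose one triple into the four classic reification statements.

    If no reifier is supplied, a default one is minted from the triple.
    """
    r = reifier if reifier is not None else default_reifier(s, p, o)
    return [
        (r, RDF + "type", RDF + "Statement"),
        (r, RDF + "subject", s),
        (r, RDF + "predicate", p),
        (r, RDF + "object", o),
    ]


explicit = reify(":Joe", ":married", ":Alice", reifier=":r1")
implicit = reify(":Joe", ":married", ":Alice")
```

A content-derived default like this gives every occurrence of the same
triple the same reifier, which is exactly the uniqueness question; minting
a fresh blank node per occurrence would be the other obvious choice.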

Personally, I think there is a lot of room to extend Turtle. Turtle is
nothing but syntactical shortcuts and always has been. If you create a
Turtle-star, then you can create those syntactical shortcuts for
reification, and can even represent hypergraphs as intermediate data
structures. That's the point I was trying to get at. You roll it out under
a different iteration, possibly with a ttl2 extension, incorporate TriG
changes, and pass it to the developers for the next iteration of these
tools. If history is any guide, we're talking about seeing implementations
even for what we are ostensibly calling RDF-Star likely making their way
into platforms like Jena within a year, and to commercial products within
three to four.
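
As an illustration of what "syntactical shortcut" means in practice, the
existing (:a :b :c) collection shorthand expands into an rdf:first /
rdf:rest chain ending in rdf:nil. The Python sketch below mimics that
expansion; the expand_collection function and the _:l0, _:l1 ... blank
node labels are hypothetical and only stand in for what a Turtle parser
does internally.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(items: List[str]) -> Tuple[str, List[Triple]]:
    """Expand a Turtle-style collection into rdf:first / rdf:rest triples.

    Returns the head node of the list and the triples that spell it out.
    An empty collection is simply rdf:nil with no triples.
    """
    if not items:
        return RDF + "nil", []
    # Illustrative blank node labels, one per list cell.
    nodes = [f"_:l{i}" for i in range(len(items))]
    triples: List[Triple] = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


head, triples = expand_collection([":a", ":b", ":c"])
```

Three items on the page become six triples in the graph. A reification
shorthand in a Turtle-star would be the same kind of move: compact on the
page, ordinary triples underneath.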

My argument is that if you're going to push for changes in the Turtle
interpreters, then it's best to identify ALL of the changes that make sense
at this point in time, even if it involves rechartering the working group.

Now, I'm the new kid on the block as far as this group goes and I'm a
non-voting member, so I don't know if this will make any difference
whatsoever, but I've fought these battles in the XML world for a long time
(about 25 years, come to think of it). I don't see them being that much
different in the RDF world. While we're at it, we can look at the LPG world
for guidance, and make suggestions for SPARQL 2 operators, consistent
extension mechanisms, and so forth.

One final point, while I'm on my soapbox. I spend a lot of time in the LLM
space right now. What I see is that by the time we actually come to a
consensus on Turtle reification syntax, we are MUCH more likely to see the
vast bulk of SPARQL queries being written by LLMs, not human beings. I'm
not a huge fan of much of the AI hype, but I generate SPARQL with LLMs
daily NOW, and so far it's been pretty good (even given the comparative
infancy of codegen) once you feed it SHACL schemas. The same holds true
for OpenCypher, FWIW.

Okay, stepping off my soapbox.





*Kurt Cagle*
Editor in Chief
The Cagle Report
kurt.cagle@gmail.com
443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>


On Tue, Apr 9, 2024 at 2:02 PM Thompson, Bryan <bryant@amazon.com> wrote:

> This has not worked for 25 years.  The capability has been there all
> along.  But it does not convince people and I know of no RDF/SPARQL
> database which makes this efficient.
>
>
> > You can have interoperability with LPGs right now, regardless, it's
> > just very verbose when expressed in RDF 1.1. My argument is that we're
> > talking about changes to Turtle. Just as we use (:a :b :c) as a shortcut
> > for an RDF linked list expansion, the changes that we are making here are
> > Turtle syntax related.
>
>
> Bryan
>
>
> ------------------------------
> *From:* Kurt Cagle <kurt.cagle@gmail.com>
> *Sent:* Tuesday, April 9, 2024 1:51:32 PM
> *To:* Thompson, Bryan
> *Cc:* James Anderson; RDF-star Working Group; Lassila, Ora; Bebee, Brad;
> Schmidt, Michael; Hartig, Olaf; Williams, Gregory
> *Subject:* RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and
> Labeled Property Graphs
>
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> You can have interoperability with LPGs right now, regardless, it's just
> very verbose when expressed in RDF 1.1. My argument is that we're talking
> about changes to Turtle. Just as we use (:a :b :c) as a shortcut for an RDF
> linked list expansion, the changes that we are making here are Turtle
> syntax related.
>
> Note: There have been any number of graph databases that failed, including
> HypergraphDB, both because of difficulty in use and in the fact that there
> was simply not enough need to justify them when they were released.  The
> fact that we are bumping against more and more use cases where the need
> does exist suggests rather that the market is finally recognizing the need.
>
>
> *Kurt Cagle*
> Editor in Chief
> The Cagle Report
> kurt.cagle@gmail.com
> 443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>
>
>
> On Tue, Apr 9, 2024 at 1:35 PM Thompson, Bryan <bryant@amazon.com> wrote:
>
>> The issue with the example below is that it requires an explicit
>> statement model.  This violates the "effective and efficient" criterion.
>> The main and original outcome of the discussions with Olaf was that RDF
>> reification could be seen as an interchange syntax and the database could
>> have a license to model the information differently.
>>
>>
>> However, people avoid explicit RDF reification like the plague precisely
>> because it leads them into the territory of wondering about "what does it
>> mean" and "do I really need to use 5 statements to make one statement".  5x
>> the data.  5x slower.  etc.  This is a very real perception among potential
>> consumers of RDF.
>>
>>
>> Hypergraphs are an attractive nuisance in this regard.  Talk about scope
>> creep for the charter :-).
>>
>>
>> I have nothing against hypergraphs.  My proposal would be to model it all
>> quite explicitly as illustrated in (F).  (F) is clearly a hyperedge.  If
>> you do this, you can easily model hypergraphs as well as typed role-based
>> connections and we can perhaps create another community group to explore
>> that.  If we can get some traction around use cases for hypergraphs in a
>> community group, that could certainly drive standardization.  I will note
>> in passing that HyperGraphDB is dead.  It had some traction for a while,
>> but clearly failed to secure sufficient interest to drive other platforms
>> to modeling hypergraphs.
>>
>>
>> In contrast to "hypergraphs now", we have a very clear utility to RDF/LPG
>> interoperability now.  If we make the right decision now, we can extend
>> that to bring the benefits of RDF to LPG users and perhaps the rising tide
>> of LPG uses to RDF.
>>
>>
>> > (F) [from the original letter]
>>
>> > :s1 :p :o {| :b1 | :ep 1 |}
>> > :s2 :p :o {| :b2 | :ep 2 |}
>> > :b1 :partOf :b
>> > :b2 :partOf :b
>>
>>
>> Bryan
>>
>>
>> ------------------------------
>> *From:* Kurt Cagle <kurt.cagle@gmail.com>
>> *Sent:* Tuesday, April 9, 2024 1:21:32 PM
>> *To:* Thompson, Bryan
>> *Cc:* James Anderson; RDF-star Working Group; Lassila, Ora; Bebee, Brad;
>> Schmidt, Michael; Hartig, Olaf; Williams, Gregory
>> *Subject:* RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and
>> Labeled Property Graphs
>>
>>
>> Making an observation here:
>>
>> If you have an :S :p :O model where :S and :O are (hypothetical) sets (of
>> anything, including graphs), this can be expressed as :s rdfs:member :S,
>> :o rdfs:member :O , which is the definition of a hypergraph.  (I discuss
>> this at length in
>> https://www.linkedin.com/pulse/rethinking-hypergraphs-kurt-cagle-n2oec/).
>>
>> RDF by itself does not support hypergraphs, but RDF *with* the
>> interpretation that :S and :O are pointers to sets ( in various ways)
>> certainly does support hypergraphs.
>>
>> This extends beyond reification; it's just that the reification operator
>> has introduced hypergraphs.
>>
>> I'd also argue that we're actually not talking about RDF here (star or
>> otherwise) but Turtle. RDF doesn't care - it can represent both kinds of
>> structures just fine, so long as you allow for the notion that you're going
>> to be dealing with pointers to sets.
>>
>> I'd also argue that the RDF vs LPG argument is a bit of a red herring.
>> You can represent an LPG (and let's be honest, call it Neo4j) readily in
>> RDF without the syntactic sugar:
>>
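>> (Basic Turtle here)
>> # The default namespace below is a placeholder (assumed).
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>> @prefix :    <http://example.org/> .
>>
>> # First marriage: the asserted triple, then its reification :r1
>> :Joe :married :Alice .
>> :r1 rdf:subject :Joe .
>> :r1 rdf:predicate :married .
>> :r1 rdf:object :Alice .
>> :r1 :dateStart "2014-06-04"^^xsd:date .
>> :r1 :dateEnd "2020-12-01"^^xsd:date .
>> :r1 a :MaritalRelationship .
>>
>> # Second marriage: same pattern, reified as :r2
>> :Joe :married :Jane .
>> :r2 rdf:subject :Joe .
>> :r2 rdf:predicate :married .
>> :r2 rdf:object :Jane .
>> :r2 :dateStart "2021-03-02"^^xsd:date .
>> :r2 :dateEnd "2024-04-01"^^xsd:date .
>> :r2 a :MaritalRelationship .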
>>
>>
>> That doesn't change regardless of the syntactical sugar. If I want to
>> define a hypergraph, this could be done as:
>> :r1 rdfs:member :R .
>> :r2 rdfs:member :R .
>> :R owl:sameAs :JoesMarriages .
>>
>> SPARQL then sees this as:
>> ?r rdfs:member :R .
>> ?r rdf:subject ?rs .
>>
>> I think the biggest problem is that we have no native set operator in
>> Turtle. Let's say we have an operator like [[ ]] which represents a set,
>> then:
>>
>> [[<<:Joe :married :Alice>> | :dateStart "2014-06-04"; :dateEnd
>> "2020-12-01" |
>> <<:Joe :married :Jane>> | :dateStart "2021-03-02"; :dateEnd "2024-04-01" |
>> ]]  :listOf :JoesMarriages .
>>
>> becomes feasible.
>>
>> I agree with Peter that LPGs should be backwards compatible with Turtle
>> (you can express an LPG in Turtle), but that Turtle should not be limited
>> by where LPGs are.
>>
>> *Kurt Cagle*
>> Editor in Chief
>> The Cagle Report
>> kurt.cagle@gmail.com
>> 443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>
>>
>>
>> On Tue, Apr 9, 2024 at 11:03 AM Thompson, Bryan <bryant@amazon.com>
>> wrote:
>>
>>>
>>> One would do this because the future relevance of RDF is at stake.  It
>>> would be an extreme disservice to RDF to introduce a more conceptually
>>> complicated model of RDF Reification, and implicit grouping via the same
>>> identifier is a more conceptually complicated model.  I have been
>>> involved in this via RDF-star since 2012 when I got Olaf interested in this
>>> problem and via "Reification Done Right" since 2008 and via other
>>> activities back to 1999 with a critique of the semantic web as being unable
>>> to handle uncertain and messy data, which is what we have in the real world.
>>> To my thinking, the conceptual difficulties of RDF reification have been a
>>> major reason why LPG had an opportunity in the graph standards market when
>>> we had solid detailed existing standards.  LPG makes edge properties
>>> simple.  And edge properties are a critical -- the number one critical --
>>> use case for RDF Reification.  There are, to be certain, other valuable use
>>> cases, but this is frankly table stakes for graph standards.  RDF has a
>>> *lot* of other benefits, but it falls down on the handling of edge
>>> properties.
>>>
>>>
>>> To me, this is a question of basic relevance of RDF to the future.
>>> Getting this wrong will slam the door closed on RDF.  Getting it right will
>>> make it possible to breathe continued and new life into RDF.
>>>
>>>
>>> The issue for uptake and use by the broad graph community is not about
>>> having the "more capable model".  What RDF needs is a model which provides
>>> a clear, effective and efficient semantics for edge properties and ...
>>> extending that ... for statements about statements.
>>>
>>>
>>> Bryan
>>> ------------------------------
>>> *From:* James Anderson <anderson.james.1955@gmail.com>
>>> *Sent:* Monday, April 8, 2024 4:49:05 PM
>>> *To:* RDF-star Working Group
>>> *Cc:* Lassila, Ora; Thompson, Bryan; Bebee, Brad; Schmidt, Michael;
>>> Hartig, Olaf; Williams, Gregory
>>> *Subject:* RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and
>>> Labeled Property Graphs
>>>
>>>
>>> good morning;
>>>
>>> > On 8. Apr 2024, at 23:42, Lassila, Ora <ora@amazon.com> wrote:
>>> >
>>> > The Amazon Neptune team is committed to lowering the barriers to the
>>> adoption of graph databases and graph-based computing. Our customers
>>> benefit when we reduce the conceptual and technological gap between RDF
>>> graphs and Labeled Property Graphs (LPGs). Over the last several years we
>>> have seen LPGs increase their popularity thanks to easy-to-understand and
>>> easy-to-use features, even when RDF offers more sophisticated features such
>>> as (for example) easy graph merging, federated queries, and expressive
>>> schema languages. The importance and relevance of interoperability between
>>> RDF and LPG was established several years ago at the W3C workshop on Web
>>> Standardization for Graph Data (Creating Bridges: RDF, Property Graph and
>>> SQL) [1]. While its origins are much older, the RDF-star Community Group
>>> was established in the wake of this event. We believe that improving the
>>> ability for RDF and LPG graphs to interoperate will benefit the entire
>>> graph community.
>>> >
>>> > As we see it, the most critical outcomes of the work of the RDF-star
>>> working group should include:
>>> >     • Efficient RDF support for “edge properties”, including the
>>> ability to have different property sets for otherwise identical edges (LPGs
>>> do not have the restriction RDF has where triples are unique in a graph).
>>> >     • Simple and clear RDF support for statements about statements
>>> (supporting provenance mechanisms and other identified use cases).
>>> >     • Laying the groundwork for interoperability “in the data” between
>>> RDF and LPG languages (e.g., a single database that can expose both LPG and
>>> RDF based query languages over the same data).
>>>
>>> this third outcome, while valuable, is not one of the chartered tasks.
>>> is it the intent of this note to suggest that the charter should be
>>> extended?
>>>
>>> >  The alignment of features and capabilities between RDF and LPGs is
>>> possible if there are no fundamental incompatibilities between the two
>>> graph models. The RDF-star Working Group’s original goal, an easy mechanism
>>> for making “statements about statements”, would make the gap between the
>>> two models significantly smaller; statements about statements are a feature
>>> similar to “edge properties” in LPGs, the lack of which in RDF we often
>>> hear cited as the reason users choose LPGs. On the other hand, the current
>>> proposal the WG is entertaining, the “single reifier multiple triples”
>>> model, has no clear counterpart in LPGs, renders the two graph models even
>>> more different than they are today, adds significant complexity (there are
>>> more expressive alternatives with simpler semantics), and makes it even
>>> more difficult to understand RDF reification rather than offering a
>>> conceptually simple framework.
>>> >
>>> > Limiting reifiers to single statements – and classifying scenarios
>>> with a single reifier for multiple statements as “non-well formed” – will
>>> bring the greatest benefit to the graph community at large. On the other
>>> hand, allowing a single reifier for multiple statements will make it very
>>> difficult to align the LPG and RDF models. Please see the examples below.
>>>
>>> this suggests to restrict the more capable model to conform with the
>>> limitations of the less capable model, not as a matter of usage or a
>>> conventional profile, but as a required characteristic.
>>> why would one do this?
>>>
>>> this discussion conflates two aspects of the model:
>>> - the cardinality of the identified statements
>>> - the cardinality of the annotations on the identified entity
>>>
>>> it should be possible to consider them independently.
>>>
>>> there is nothing in the examples or commentary below which substantiates
>>> any argument beyond that a profile would be expeditious.
>>>
>>> >
>>> > We strongly believe that the continued relevance of RDF depends on
>>> establishing interoperability with LPGs. As stated above, RDF brings some
>>> tremendous advantages, and we are committed to bringing these advantages to
>>> the community of LPG users as well. We believe that this reflects the
>>> spirit of the W3C workshop on Web Standardization for Graph Data and
>>> resonates with inputs from some other members of the working group.
>>> >
>>> > [1] https://www.w3.org/Data/events/data-ws-2019/
>>> >  Examples:
>>> >
>>> > # (A) An LPG edge with a single edge property.
>>> > (s) - [p {ep: 1}] → (o)
>>> >
>>> > # (B) An interpretation of that in an SPOI model (where I is a
>>> statement identifier).
>>> > # The OneGraph model is based on such SPOI tuples.
>>> > s p o :sid1
>>> > :sid1 ep 1 :sid2
>>> >
>>> > # (C) An RDF-star expression consisting of an asserted triple and a
>>> statement about
>>> > # that.
>>> > :s :p :o {| :ep 1 |} # with an anonymous identifier for the (s p o)
>>> statement.
>>> >
>>> > # (D) The RDF interpretation of that RDF-star expression.
>>> > :s :p :o . # The asserted triple.
>>> > _:b rdf:reifies <<( :s :p :o )>> . # A reifier for that triple.
>>> > _:b :ep 1 . # Using that reifier to make an assertion about a triple
>>> occurrence.
>>> >
>>> > # Note that the LPG example (A), the SPOI interpretation (B), and the
>>> RDF model (D)
>>> > # can be handled as exactly the same data within possible database
>>> implementations
>>> > # such as proposed by Souri or by a OneGraph implementation.  The case
>>> where _:b is
>>> > # replaced by an IRI can also be handled under LPG, 1G, etc.
>>> >
>>> > # Now, let us look at the case where different statements are assigned
>>> the same
>>> > # reifier:
>>> >
>>> > # (E) Same reifier used in two expressions about different triples.
>>> > :s1 :p :o {| :b | :ep 1 |} # a statement about a statement with
>>> reifier ":b".
>>> > :s2 :p :o {| :b | :ep 2 |} # a statement about a different statement,
>>> same reifier.
>>> >
>>> > # This last case (E) has no sensible interpretation under LPG.
>>> >
>>> > # If we accept a constraint that using the same reifiers for different
>>> TripleTerms
>>> > # is not well-formed, then we can maintain a consistent interpretation
>>> with LPG edge
>>> > # properties.  Further, we can use explicit modelling to group
>>> statements and retain
>>> > # transparency about the functional or semantic roles in such
>>> groupings.
>>> >
>>> > # (F) Two statements are being grouped by an explicit semantic
>>> relationship (:partOf).
>>> > :s1 :p :o {| :b1 | :ep 1 |}
>>> > :s2 :p :o {| :b2 | :ep 2 |}
>>> > :b1 :partOf :b
>>> > :b2 :partOf :b
>>> >
>>> > # We submit that this explicit modelling is more useful and preserves
>>> the alignment
>>> > # with LPG and RDF which has such great value to the world wide graph
>>> community.
>>> >   --
>>> > Dr. Ora Lassila
>>> > Principal Technologist, Amazon Neptune
>>>
>>>
>>> ---
>>> james anderson | james@dydra.com | https://dydra.com
>>>
>>>
>>>

Received on Tuesday, 9 April 2024 21:59:24 UTC