Re: A few thoughts on RDF-star, Reification, and Labeled Property Graphs

Hi Bryan,


Thank you for the long response. We don’t seem to be getting much closer, however. I am still missing the technical details that make you say that multi-edge reification is not efficient or practical, and I disagree with you about the merits of such a construct in practice.

> On 16. Apr 2024, at 18:07, Thompson, Bryan <bryant@amazon.com> wrote:
> 
> Well, it's most definitely not about the Neptune implementation :-).  I have implemented a number of SPARQL engines over the years (six, including on GPUs, based on object graphs, based on columnar stores, based on nested triples for RDR/RDF-star, based on quads, etc, etc.), I am aware of the literature on dozens of other SPARQL databases, and I know of no (practical, efficient) design in which interoperability in the data may be achieved with single reifier multiple triples. 

Actually, when Brad Bebee presented the Amazon Neptune approach at the Berlin Graph WS in 2019 he showed that the fourth column carries an identifier that can be reused in multiple rows to address multiple statements at once. What else is there that I’m not aware of?
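
To make concrete what I mean, here is a minimal Python sketch of a four-column statement table in which the identifier in the fourth column may appear in several rows. It is my own illustration: the names (`rows`, `annotations`, `statements_for`) and the data are hypothetical, and it does not describe how Neptune or any other engine is actually implemented.

```python
from collections import defaultdict

# Hypothetical four-column table: (subject, predicate, object, statement_id).
# The identifier ":r1" is deliberately reused in two rows.
rows = [
    (":alice", ":worksFor", ":acme", ":r1"),
    (":alice", ":locatedIn", ":berlin", ":r1"),
    (":bob", ":worksFor", ":acme", ":r2"),
]

# Annotations are ordinary triples whose subject is a statement identifier.
annotations = [
    (":r1", ":source", ":hr_database"),
    (":r2", ":source", ":linkedin"),
]


def index_by_id(table):
    """Build a secondary index from identifier to the rows that carry it."""
    index = defaultdict(list)
    for s, p, o, sid in table:
        index[sid].append((s, p, o))
    return index


def statements_for(annotation_pred, annotation_obj):
    """Return all statements whose identifier carries the given annotation."""
    index = index_by_id(rows)
    result = []
    for sid, p, o in annotations:
        if p == annotation_pred and o == annotation_obj:
            result.extend(index[sid])
    return result
```

In this toy layout, looking up everything sourced from `:hr_database` is the same index lookup whether the identifier covers one row or several.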

> As Ora pointed out, the construction lacks any correspondence to the concepts in LPG and in particular it breaks the ability to clearly model edge properties in an interoperable fashion. It is not an issue with any specific implementation. 

Well, it doesn’t break anything for Dydra [0], an RDF datastore service based on SPARQL datasets (whose creator datagraph I represent in the WG), and we would love to hear in more detail why you think it does, and where specifically the interoperability issue lies - technically/implementation-wise, query-wise, intuition/interpretation-wise, etc.

> It seems to me that there is a misplaced concern that same reifier multiple triples is somehow a good thing.  

That is certainly my position.

> It is certainly a place you can go with the original notion of reified statement models, but I suggest it is not a good destination.  I would have to retract my earlier agreement that this is a hyperedge.  It is in fact simply a bundle of triples.  A hyperedge does not allow multiple link types, right, just multiple sources and multiple targets?  But this same reifier, multiple triples design would allow multiple link types.  It is in fact closer to named graphs, which are also just bundles of triples.  If you want hyperedges, you need to have a set of resources for the subject and object positions for some link type. It seems that this is more directly modeled by creating those sets and then connecting them with a single edge.  That is simple, succinct, and you can already do this with RDF.

I didn’t follow the hyperedge part of this thread, sorry, but what’s wrong with "simply a bundle of triples"? Directly modelling those sets - well, you know how attractive that is. You know how well liked RDF standard reification is, given its triple bloat. You probably also know that edge annotations can be modelled as n-ary relations. In fact, everything can! And RDF has them! Let’s not change anything!(TM)

> I think we can readily motivate a use case for edge properties.  That is a basic capability that you obtain with statements about statements.  Another core use case would be statement level provenance.  I think it would be useful if someone could please expand on use cases (ideally motivated by business problems that would be solved, but this is of course a bias) for single identifier multiple triples?  Per above, I can't see "hyperedges" as a motivating use case for the proposed feature for two reasons.  First, it's a bundle of triples, not a hyperedge.  Second, "hyperedge" is not a use case.  It is just a description.  To my mind, use cases need to describe instances of some common problem.  By supporting the use case, we enable a wide range of similar customer stories.

A use case for grouping… you’re not the only one who has asked for one, but honestly: every time I hear that I just shake my head. "I don’t need a weatherman to know which way the wind blows".

I see a major "business problem" for multi-edge reification in RDF/LPG interop. In my understanding LPGs are different from RDF in predominantly two ways:
A) they allow annotating edges (we are all very aware of that)
B) they compartmentalize the graph into "attributed objects"
Entry B is your business case. Each such attributed object is, in RDF, a bunch of triples, indistinguishable from "edges" between "attributed objects", and with very blurry boundaries to other "attributed objects". How do you represent that in RDF? How do you ensure that a user accustomed to the cosy primitives of LPG, and those nice visualizations built on top of them, doesn’t immediately feel alienated by the messiness of a totally unstructured graph like the one RDF provides? You’ll want to replicate those boundaries in RDF, and there’s little hope of doing that reliably other than by actually describing each object, i.e. annotating all triples (multiple triples of course) with the "attributed object" to which they belong. And why would you want to mint a new reifier for each of those membership relations?
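
A rough Python sketch of what I have in mind, with one reifier per "attributed object" rather than one per triple. The node, the reifier name `:aliceObject` and both helper functions are hypothetical and only illustrate the idea; `rdf:reifies` is used as a plain string label here.

```python
# Hypothetical LPG-style node: an identifier plus a bag of attributes.
node = {"id": ":alice", "props": {":name": "Alice", ":age": 42, ":city": "Berlin"}}


def node_to_rdf(node, reifier):
    """Emit one triple per attribute, all linked to the same reifier."""
    triples = [(node["id"], p, v) for p, v in sorted(node["props"].items())]
    reifications = [(reifier, "rdf:reifies", t) for t in triples]
    return triples, reifications


def object_boundary(reifications, reifier):
    """Recover the set of triples that make up one attributed object."""
    return {t for r, _, t in reifications if r == reifier}


triples, reifications = node_to_rdf(node, ":aliceObject")
boundary = object_boundary(reifications, ":aliceObject")
```

The boundary of the object is recovered with a single lookup on one identifier; with one reifier per triple, an additional grouping relation would have to be modelled and queried.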

Apart from that specific RDF/LPG interop grouping use case, grouping is such a basic KR activity that it really doesn’t need a use case. It is probably _the one_ thing that doesn’t need a use case. Why did RDF provide containers and lists from the get-go, among them rdf:Bag with its explicitly unordered semantics? Why did quad stores and named graphs come into being, without any support by the RDF specification, and why are they so popular compared to pure triple stores? There is your business problem.

And NO, we can’t use named graphs for grouping, because A) the WG in its wisdom has ruled them out for the purpose of statement annotation (and for any other sound usage as well), and B) it makes for an exceedingly bad user experience if one has to query for an attribute in two different ways because one doesn’t know upfront whether it’s encoded per statement or per group of statements - and very often one just can’t know that in advance in any source of noteworthy size and complexity.
Provenance is the poster child use case for named graphs AND for RDF-star. How would you query for provenance in an unfamiliar data source? You can’t know how it is encoded, so you have to use GRAPH patterns (with FROM NAMED where needed) to query for graph attributes AND you have to use the SPARQL-star syntax. I mean, honestly: how broken is that as a user experience?! IMO it’s catastrophic, and it sure won’t help RDF adoption.
That’s why I was so adamant about basing statement annotation on named graphs: we already have them, we would just have to extend them a bit. But no… So the second best option IMHO is to leave named graphs to application purposes and any out-of-band arrangements, and let the soundly standardized annotation mechanism annotate single and multiple triples alike. It will take many years of pain to work this into the installed base of data, but it’s still better than waiting forever (or until RDF 2.0 in 2038, provided RDF is still a thing by then) for a solution to this basic problem.

> The "well-formedness" condition as I understand it would not limit the ability of a process to accept a graph that was not well formed, but it would not standardize such behavior as normative.  I view some such mechanism as a necessary compromise that can be used to build "interesting" systems with efficient processing of statements about statements.  In addition, it helps to enable interoperability with LPG in the data (same database, multiple query languages).

"Not normative" in the standards-oriented world of RDF translates to "irrelevant". You can do whatever you want in the comfort of your private data pool, but you can’t share it, you can’t expect and rely on others to interpret it correctly, and that makes it … irrelevant, a pet project or a research prototype at best. So if that constraint on reifier cardinality gets anywhere near the core of the spec, the opportunity is gone. A best practice on the other hand, and an annotation syntax that promotes single-edge reification (which it certainly does), will go a long way in nudging users (LPG users at least) in a direction where they are not "irritated" or "worried" by multiplicity.
But you still don’t provide any details on that general efficiency problem and those query issues (with Gremlin and Cypher I suppose), and as I said I’m very interested!

> I suggest that it would be a good idea to actively solicit input from the builders of scalable SPARQL engines.  Vendors of those products definitely have a vested interest in the support of interoperability with LPG.  The proposed direction of RDF-star will not enable that interoperability.  To me, this makes the entire effort somewhat beside the point.

In this WG I represent one of those builders - Dydra [0], which provides a scalable and performant SPARQL dataset engine as a service to industry customers - and support for multi-edge annotations causes it no trouble. I invite you to lead that activity and solicit such input. I’m honestly all ears to learn what problems you see or at least firmly expect. But I have to warn you already: any increase in complexity on the back end has to be weighed against A) the increase in complexity for users having to cope with different annotation mechanisms for single and multiple statements or B) the cost of having to model grouping relations explicitly.

> We should be building a bridge here, not creating more isolation for RDF. 

A bridge doesn’t expect the one bank to move closer to the other, artificially constraining its expressivity, hurting its semantics, breaking its basic modelling principles. Multi-edge reification is natural, semantically even imperative in RDF, and it fully subsumes the LPG-style single-edge reification. It is not hard to translate to LPG single-edge annotations. What’s not to like?
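
To illustrate that translation, here is a minimal Python sketch that expands each reifier into one LPG edge per reified triple and copies the reifier's annotations onto every such edge. The function name `to_lpg_edges`, the dictionary layout of an edge and the sample data are my own assumptions, not any product's API.

```python
def to_lpg_edges(reifications, annotations):
    """Expand each reifier into one LPG edge per reified triple."""
    # Collect the annotations of each reifier as a property map.
    props = {}
    for reifier, key, value in annotations:
        props.setdefault(reifier, {})[key] = value

    edges = []
    for reifier, (s, p, o) in reifications:
        edges.append({
            "from": s,
            "label": p,
            "to": o,
            # Each edge gets its own copy of the shared annotations.
            "props": dict(props.get(reifier, {})),
        })
    return edges


# Hypothetical data: ":r1" reifies two triples, ":r2" reifies one.
reifications = [
    (":r1", (":alice", ":knows", ":bob")),
    (":r1", (":bob", ":knows", ":alice")),
    (":r2", (":alice", ":worksFor", ":acme")),
]
annotations = [(":r1", ":since", 2019), (":r2", ":role", "engineer")]

edges = to_lpg_edges(reifications, annotations)
```

A reifier with a single triple yields exactly the LPG-style single-edge annotation, so the single-edge case needs no separate code path in this sketch.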

> In my opinion, the current proposal fails to illuminate and clarify why and how RDF can support statements about statements.  It introduces an odd mechanism that is similar to named graphs with implicit groupings created by a shared identifier.

It may seem odd because it is unexpected, and it was for me a few weeks ago, but think about it a bit - it is actually very natural: one statement has a meaning, which can be reified and annotated. Multiple statements have a meaning, which can be … etc.

The only problem that I acknowledge is disambiguating whether an annotation refers to each statement in a set or to the set as a whole. But that is not a huge problem in practice, judging by 20 years of experience with lists, and it can be fixed e.g. with another annotation (on the annotation itself).
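
As a sketch of such a fix in Python: an extra marker on the reifier says whether its annotations distribute over each statement or describe the group as a whole. The marker values `"each"` and `"whole"`, the default, and the function `interpret` are purely hypothetical; nothing like this is standardized.

```python
def interpret(reifier, reifications, annotations, scope):
    """Resolve a reifier's annotations according to a hypothetical scope marker."""
    triples = reifications[reifier]
    props = annotations.get(reifier, {})
    # Assumed default: the annotation applies to each statement individually.
    if scope.get(reifier, "each") == "each":
        return [(t, props) for t in triples]
    # Otherwise the annotation describes the set of statements as one unit.
    return [(tuple(triples), props)]


reifications = {
    ":r1": [(":alice", ":marriedTo", ":bob"), (":bob", ":marriedTo", ":alice")],
}
annotations = {":r1": {":since": 2019}}

per_statement = interpret(":r1", reifications, annotations, {":r1": "each"})
as_group = interpret(":r1", reifications, annotations, {":r1": "whole"})
```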

> The basis for this mechanism is the syntactic ability to express statement models with the same identifier.

Getting hung up on syntax is always dangerous in RDF-land ;-)

> The last 25 years have been a journey of people avoiding such statement models because they are not efficient in time or space with a variety of different mechanisms proposed over the years to have a simple mechanism for statements about statements. 

A lot has happened in the last 25 years, and the problem has been approached from many different directions. It is a hard problem because it has many different aspects, and there has yet to appear a principled approach that takes the whole complexity into account. Most, just like RDF-star, hoped to find the magic bullet, the one magic trick that does it all, and either ignored the aspects they can’t satisfy or dismissed them with a "sorry, but you can’t have _everything_, right?". RDF-star is no exception, but artificially constraining it sure won’t help that. A one trick pony to bridge the gap between LPG and RDF might help LPG (Godspeed to them!), but it sure won’t help RDF. 

What I would actually find useful is an "application" profile for RDF where the world is closed, names are unique and referentially opaque, lists are finite, which focuses on syntax instead of meaning, etc - where all the integration-focused complexity is gone, except IRIs and bnodes, because those are essential. That could really foster interoperability with LPGs, and I guess it would in general help RDF adoption massively. But IMO it would still require a mechanism that allows annotating single and multiple statements with the same primitive, queryable with the same syntax. Of course this is out of charter, but it is the way to go to solve the usability problems you and Ora see. RDF is certainly hard to understand and may seem unnecessarily complex for people with a database mindset and no intent to support more than one view on the world. But they won’t be pacified by the introduction of just one syntactic primitive, they will still not feel comfortable with the rest, and they will - rightfully - ask why there are two different paradigms at work. This will only make things worse.

> This goal of supporting statements about statements is at the core of the charter.  

But saving RDF from itself is not.

Thomas



[0] https://docs.dydra.com/

> Bryan
> 
> 
> From: Thomas Lörtsch <tl@rat.io>
> Sent: Tuesday, April 16, 2024 1:57:06 AM
> To: Thompson, Bryan; public-rdf-star-wg@w3.org; Gregg Kellogg; Lassila, Ora
> Cc: Niklas Lindström; Kurt Cagle; James Anderson; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory; Andy Seaborne
> Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
>  CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Am 16. April 2024 00:56:15 MESZ schrieb "Thompson, Bryan" <bryant@amazon.com>:
> >We tried to bring this out in the original email message at the start of this thread.
> 
> ... which has been criticized by me and others for not providing much detail. Okay for an initial email to start a discussion, but not enough for an informed discussion...
> 
> >I am talking about efficiency,
> 
> ... which has led people to wonder if there are implementation issues in Neptune, but Ora assured us that that is not what worries him. So what is that efficiency thing? How is verbose single-edge annotation more efficient than grouped multi-edge annotation?
> 
> > sensible interpretations which minimize confusion
> 
> And this is where I really can't follow. Annotating multiple edges at once can either be the result of an optimization - saving triples, simplifying authoring and reading - or a necessity stemming from the semantics - as otherwise we would introduce referential opacity, which definitely would introduce confusion and nonsensical interpretations.
> Either way, the annotation refers to all statements reified by a multi-edge reifier. What is there not to understand, to misinterpret and confuse? As discussed elsewhere already, pedantic interpretations may question whether a multi-edge annotation refers to each edge or only to the group as a whole, but that is really a pretty niche concern. It can be disambiguated by an extra statement if the need arises, and RDF has dodged the same issue for decades w.r.t. lists.
> Apart from that a multi-edge annotation in RDF is mapped to multiple edges, all with the same annotations, in LPG. End of story.
> 
> > and make it possible to have implementations which provide interoperability in the data from LPG++ query languages and SPARQL++ query languages such as (hopefully) an RDF-star aware extension of SPARQL.
> 
> I think we have a SPARQL-star extension to SPARQL already and neither can I see nor did anyone mention any problems with multi-edge reifications and SPARQL-star. But I'm no expert in that field. Maybe you are more concerned about Gremlin and Cypher? Would there be issues?
> 
> Souri provided examples of how query uniformity is affected (and Peter showed how this is just a standard RDF problem). That is the kind of detailed explanation I'd like to see and discuss. Constraining the expressivity of a construct is serious business, and there are more dimensions and use cases to consider and of interest than just 1:1 RDF/LPG mapping. RDF-star was never, really never, introduced and  designed to _only_ bridge the gap between LPG and RDF. Of course it must do that, and it will, but demanding that it is constrained to do only that because otherwise it might irritate some newcomers, without giving any specific detail and justification is not convincing. The annotation syntax even puts the single-edge annotation use case front and center. How is that not enough?
> 
> Thomas
> 
> >
> >Bryan
> >
> >________________________________
> >From: Thomas Lörtsch <tl@rat.io>
> >Sent: Monday, April 15, 2024 9:44:42 AM
> >To: public-rdf-star-wg@w3.org; Thompson, Bryan; Gregg Kellogg; Lassila, Ora
> >Cc: Niklas Lindström; Kurt Cagle; James Anderson; RDF-star Working Group; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory; Andy Seaborne
> >Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >
> >
> >
> >
> >Am 15. April 2024 17:53:46 MESZ schrieb "Thompson, Bryan" <bryant@amazon.com>:
> >>The devil is in the details.
> >
> >Can you be more specific? I can't even say if you refer to implementation, or user confusion, or sensible interpretation, or something else entirely.
> >
> >Thomas
> >
> >> In fact, everything else can be dealt with.  Yes, RDF can express things that are outside of the core of LPG, but edge property handling is the cornerstone for interoperability.
> >>
> >>
> >>Note that some LPG models *do* support recursive properties about properties.  For example, some Gremlin implementations have "meta-properties" which do this.
> >>
> >>
> >>Bryan
> >>
> >>________________________________
> >>From: Gregg Kellogg <gregg@greggkellogg.net>
> >>Sent: Thursday, April 11, 2024 12:29:21 PM
> >>To: Lassila, Ora
> >>Cc: Niklas Lindström; Thompson, Bryan; Kurt Cagle; James Anderson; RDF-star Working Group; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory; Andy Seaborne
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>
> >>
> >>
> >>On Apr 11, 2024, at 4:38 AM, Ora Lassila <ora@amazon.com> wrote:
> >>
> >>Niklas,
> >>
> >>What we are saying is that the “one reifier multiple triples” model will diverge RDF even more from LPGs, effectively preventing us from improving the alignment between the two graph models. This would be bad for the graph community at large. We are not saying what RDF-star is; this is a community process, and the WG will choose its own direction. What we are saying is the direction the current proposal suggests will harm the broader graph community at the expense of adding a feature the use cases of which could be satisfied using the simpler model as well.
> >>
> >>RDF-star already is a superset of LPG in the sense that annotations can have their own annotations, and annotations can be entity relationships in addition to scalar attributes. Allowing a many-to-many reifier doesn’t seem like it would be the straw that breaks the camel’s back. For me, the key is to have a way to represent LPG in RDF-star, which I believe we can do, not necessarily to allow any RDF-star graph (not to mention dataset) to be represented as an LPG. RDF already has greater expressivity than LPGs, and this would be just one more way in which RDF can represent more relationships.
> >>
> >>It seems to me that the draw to LPGs as being “simpler” is not dissimilar to the draw of using JSON for representing information vs RDF. JSON carries no inherent meaning and does not facilitate interoperability between multiple models, which is similar to LPG vs RDF-star. The JSON-LD approach has not been universally appreciated, but its goal to “bring meaning to JSON” is effective, and using RDF-star at the model level to bring meaning to LPGs might be a similar paradigm.
> >>
> >>Gregg
> >>
> >>My original message was to show what we (the Amazon Neptune team) are thinking, and where we want to go. Better alignment between RDF and LPGs is something that is needed, based on our discussions over several years with literally hundreds of Neptune customers about how they want to use graphs (and why they might choose LPG over RDF).
> >>
> >>Soon, I will have worked on RDF for 27 years. I am deeply committed to making sure that RDF continues to succeed. My view is that this success comes, partly, from making sure the largest possible user community can (and will) find it easy and useful to use graphs.
> >>
> >>Ora
> >>
> >>P.S. As for “adding a restriction” to RDF (in the form of “well-formedness” as we suggest): this is nothing new, as such restrictions already exist in RDF.
> >>
> >>--
> >>Dr. Ora Lassila
> >>Principal Technologist, Amazon Neptune
> >>
> >>
> >>From: Niklas Lindström <lindstream@gmail.com>
> >>Date: Thursday, April 11, 2024 at 6:47 AM
> >>To: "Thompson, Bryan" <bryant@amazon.com>
> >>Cc: Kurt Cagle <kurt.cagle@gmail.com>, James Anderson <anderson.james.1955@gmail.com>, RDF-star Working Group <public-rdf-star-wg@w3.org>, "Lassila, Ora" <ora@amazon.com>, "Bebee, Brad" <beebs@amazon.com>, "Schmidt, Michael" <schmdtm@amazon.com>, "Hartig, Olaf" <ohartig@amazon.com>, "Williams, Gregory" <ngregwil@amazon.com>
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>
> >>
> >>Ora, Bryan,
> >>
> >>I interpret you as claiming that all reasonable uses of `rdf:reifies` reification should be restricted to *one statement*?
> >>
> >>As we know, multiple *triples* may denote the *same relationship* (following from entailment), and thus any such restriction must be on a semantic level *above* RDF. OWL already has such features; but it's unclear if you're suggesting we somehow add functional properties to the core of RDF?
> >>
> >>Are you arguing also that RDF-star is only for making statements about statements in the strictest sense, i.e. that a "reifier" (the subject of `rdf:reifies`) is an rdf:Statement (or a more precise notion of a statement occurrence or token); and nothing else?
> >>
> >>We already see other kinds of "reifiers" in many of our collected use cases (such as reified observations, relationships or situations), along with an assortment of examples of both RDF-star in blog posts and LPG example practices utilizing those same notions. Are many of these misguided or outright wrong, according to you?
> >>
> >>If so, I wonder if you still favor option 3 (unstarring to option 2), or something more aligned with option 1 [1]? Old-style reification in option 1 caters for very strict provenance information about statement tokens, but arguably not directly for things with properties such as flight distances, pipe sizes and schedules, or even durations. These cases only make sense if such properties are shorthand chains from the token to the object, bypassing the concretized situation or event from the domain of discourse.
> >>
> >>So if you claim that we should *only* cater for named statement tokens, I have a hard time seeing how the simplest design is to add triple terms and a new `rdf:reifies` property *and* then a new kind of builtin functional restriction (yet to be formalized) upon that property, to manage what would otherwise be deemed too complex (and/or hard to understand).
> >>
> >>On the other hand, if you *do* support other notions of reifiers than statement tokens; can you show how these more concrete resources are *all* functionally reifying only one statement? That seems to go against certain intuitions which have been considered important to cater for in RDF (e.g. see Enrico's examples about a symmetric marriage relation, and in general e.g. [2], [3], [4]). We've tried not only to avoid misuse, but to actually *make sense* of such practices. This sense does not exclude an OWL based restriction if you actually do model "edges themselves", such as:
> >>
> >>    ex:Edge a owl:Class ;
> >>        rdfs:subClassOf [ a owl:Restriction ;
> >>                        owl:onProperty rdf:reifies ;
> >>                        owl:cardinality 1 ] .
> >>
> >>But that does not prohibit many-to-many in general of course, which is what you're arguing against.
> >>
> >>Best regards,
> >>Niklas
> >>
> >>[1]: <https://htmlpreview.github.io/?https://github.com/w3c/rdf-star-wg/blob/main/docs/seeking-consensus-2024-01.html>
> >>[2]: <https://en.wikipedia.org/wiki/Resource_Description_Framework#Statement_reification_and_context>
> >>[3]: <https://www.researchgate.net/publication/325995356_Reification_and_Truthmaking_Patterns>
> >>[4]: <https://github.com/w3c/rdf-ucr/wiki/RDF-Star-for-Talking-About-Multiple-Triples-at-Once>
> >>
> >>
> >>
> >>
> >>On Wed, Apr 10, 2024 at 1:14 AM Thompson, Bryan <bryant@amazon.com> wrote:
> >>So, I think there is a difference.  While the reified statement model expressions are, I think, best seen as an interchange syntax for reification rather than as a means to store and process statements about statements, I am *not* just talking about the syntax.
> >>
> >>I am suggesting that we SHOULD NOT carry forward the possibility that a graph can be well-formed if it has the same identifier for two statement models for different S, P, and Os.
> >>
> >>If we label that as "not-wellformed" then we are unblocking databases and SPARQL query processing systems from an efficient mechanism for representing and processing statements about statements.
> >>
> >>I see this as an essential step.  Clarity around this was the initial driver for RDR and RDF*.  We are now poised to either deliver on this or...well...do hyperedges with implicit syntax?  Which we can do with explicit modeling just fine.
> >>
> >>RDF has in many ways stood the test of time, but it has consistently fallen down on this specific feature -- the ability to have statements about statements in a clear, effective, and efficient solution.  In my mind, this is the central unaddressed issue of the journey from 1999 to today -- how the semantic web can deal with statements about statements, opening the door to handling a variety of use cases, including edge properties, provenance, modeling of uncertain information, and even use cases such as hyperedges built on explicit labeled models grouping the sources and targets of the edge.
> >>
> >>Bryan
> >>________________________________
> >>From: Kurt Cagle <kurt.cagle@gmail.com>
> >>Sent: Tuesday, April 9, 2024 2:58:51 PM
> >>To: Thompson, Bryan
> >>Cc: James Anderson; RDF-star Working Group; Lassila, Ora; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>
> >>
> >>Bryan,
> >>
> >>> This has not worked for 25 years.  The capability has been there all along.  But it does not convince people and I know of no RDF/SPARQL database which makes this efficient.
> >>
> >>Let's put things into perspective:
> >>
> >>Yes, the RDF 1.0 spec has been around for twenty-five years. Turtle has been around for about thirteen years, and technically only for eleven if you're counting from the spec. SPARQL has been around for 17 years, but it had a MAJOR upgrade in 2013, including the introduction of SPARQL UPDATE, Graphs, and Services, and most of these changes only really saw implementation within the last five to six years. SHACL was 2017, and is only JUST beginning to become pervasive in the sector.
> >>
> >>We are all people who have been involved with RDF for 20+ years, and so we tend to know its warts intimately, but for a great number of people and organizations, it's only JUST beginning to become a thing (I sat in on a webinar for a GIS based knowledge graph system an hour ago).
> >>
> >>Adoption has been slow primarily because there really was no NEED for knowledge graphs until comparatively recently - we were all very much ahead of the curve in that regard, and it's been painful. Trust me on this - I have watched XML from its inception, and it has gone through its XML winter, and actually there seems to be a bit of a revival going on in that space as the limitations of JSON (and the prevalence of script kiddies who wanted to see everything as being like Javascript objects) have given way to Pythonistas for whom XML was just another format (with its own libraries that predated JSON and that are still going strong today).
> >>
> >>RDF is, for all intents and purposes, frozen. We can define new predicates and classes into that spec, but honestly, we'd be better off putting those into their own namespace and calling them something else because we are not really going to be able to modify rdf by itself given a fairly broad install base. The XML community tried making base changes to the underlying SGML syntax, and came away with the realization that it was not feasible because of that legacy problem, I don't see RDF being any different.
> >>
> >>Now, what we are talking about is syntactical changes to Turtle. Turtle is an RDF expression, but it is an RDF expression that is important because a Turtle2 could've rolled out without anywhere near as much impact, so long as you changed the namespace. This happened with named graphs and TRIG. Named graphs DID require changes to RDF, because it required a fourth term in the default ntuple.
> >>
> >>With reification, on the other hand, we don't need to change anything in RDF precisely because we have a mechanism for decomposing a triple into its component parts that goes back 25 years. What we're ultimately talking about is Turtle syntax, which is an RDF representation but is not the same thing as RDF. What we're arguing about (and have been for a while) is whether or not the IRI :r is unique. This is a Turtle problem and the consensus that seems to be emerging is that Turtle can define a default IRI that is used in the absence of one being presented.
> >>
> >>Personally, I think there is a lot of room to extend Turtle. Turtle is nothing but syntactical shortcuts and always has been. If you create a Turtle-star, then you can create those syntactical shortcuts for reification, and can even represent hypergraphs as intermediate data structures. That's the point I was trying to get at. You roll it out under a different iteration, possibly with a ttl2 extension, incorporate trig changes, and pass it to the developers for the next iteration of these tools. If history is any guide, we're talking about seeing implementations even for what we are ostensibly calling RDF-Star likely making their way into platforms like Jena within a year, and to commercial products within three to four.
> >>
> >>My argument is that if you're going to push for changes in the Turtle interpreters, then it's best to identify ALL of the changes that make sense at this point in time, even if it involves rechartering the working group.
> >>
> >>Now, I'm the new kid on the block as far as this group goes and I'm a non-voting member, so I don't know if this will make any difference whatsoever, but I've fought these battles in the XML world for a long time (about 25 years, come to think of it). I don't see them being that much different in the RDF world. While we're at it, we can look at the LPG world for guidance, and make suggestions for SPARQL 2 operators, consistent extension mechanisms, and so forth.
> >>
> >>One final point, while I'm on my soapbox. I spend a lot of time in the LLM space right now. What I see is that by the time we actually come to a consensus on Turtle reification syntax, we are MUCH more likely to see the vast bulk of SPARQL queries being written by LLMs, not human beings. I'm not a huge fan of much of the AI hype, but I generate SPARQL from LLMs daily NOW, and so far it's been pretty good (even given the comparative infancy of codegen) once you feed it SHACL schemas. The same holds true for OpenCypher, FWIW.
> >>
> >>Okay, stepping off my soapbox.
> >>
> >>
> >>
> >>
> >>
> >>Kurt Cagle
> >>Editor in Chief
> >>The Cagle Report
> >>kurt.cagle@gmail.com<mailto:kurt.cagle@gmail.com>
> >>443-837-8725
> >>
> >>
> >>On Tue, Apr 9, 2024 at 2:02 PM Thompson, Bryan <bryant@amazon.com<mailto:bryant@amazon.com>> wrote:
> >>This has not worked for 25 years.  The capability has been there all along.  But it does not convince people and I know of no RDF/SPARQL database which makes this efficient.
> >>
> >>>  You can have interoperability with LPGs right now, regardless, it's just very verbose when expressed in RDF 1.1. My argument is that we're talking about changes to Turtle. Just as we use (:a :b :c) as a shortcut for an RDF linked list expansion, the changes that we are making here are Turtle syntax related.
> >>
> >>Bryan
> >>
> >>
> >>________________________________
> >>From: Kurt Cagle <kurt.cagle@gmail.com<mailto:kurt.cagle@gmail.com>>
> >>Sent: Tuesday, April 9, 2024 1:51:32 PM
> >>To: Thompson, Bryan
> >>Cc: James Anderson; RDF-star Working Group; Lassila, Ora; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >>
> >>
> >>You can have interoperability with LPGs right now, regardless, it's just very verbose when expressed in RDF 1.1. My argument is that we're talking about changes to Turtle. Just as we use (:a :b :c) as a shortcut for an RDF linked list expansion, the changes that we are making here are Turtle syntax related.
> >>
> >>Note: There have been any number of graph databases that failed, including HypergraphDB, both because of difficulty in use and because there was simply not enough need to justify them when they were released.  The fact that we are bumping against more and more use cases where the need does exist suggests rather that the market is finally recognizing the need.
> >>
> >>
> >>Kurt Cagle
> >>Editor in Chief
> >>The Cagle Report
> >>kurt.cagle@gmail.com<mailto:kurt.cagle@gmail.com>
> >>443-837-8725
> >>
> >>
> >>On Tue, Apr 9, 2024 at 1:35 PM Thompson, Bryan <bryant@amazon.com<mailto:bryant@amazon.com>> wrote:
> >>The issue with the example below is that it requires an explicit statement model.  This violates the "effective and efficient" criterion.  The main and original outcome of the discussions with Olaf was that RDF reification could be seen as an interchange syntax and the database could have a license to model the information differently.
> >>
> >>However, people avoid explicit RDF reification like the plague precisely because it leads them into the territory of wondering "what does it mean" and "do I really need to use 5 statements to make one statement".  5x the data.  5x slower.  etc.  This is a very real perception among potential consumers of RDF.
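
[Editorial sketch, not part of the quoted message.] The "5 statements" figure can be reproduced if one assumes the standard RDF 1.1 reification vocabulary: the asserted triple plus `rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object`. The helper name `reify` and the tuple representation are hypothetical, chosen only for illustration; the example data is borrowed from elsewhere in the thread.

```python
# Hypothetical helper: expand one asserted triple into RDF 1.1 standard reification.
# Triples are plain (subject, predicate, object) tuples of strings.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(s, p, o, reifier):
    """Return the asserted triple followed by its four reification triples."""
    return [
        (s, p, o),                                    # the statement itself
        (reifier, RDF + "type", RDF + "Statement"),   # reifier is a statement node
        (reifier, RDF + "subject", s),
        (reifier, RDF + "predicate", p),
        (reifier, RDF + "object", o),
    ]


triples = reify(":Joe", ":married", ":Alice", ":r1")
print(len(triples))  # 5
```

Any edge property (such as a start date) adds one further triple on the reifier, so the count above is the fixed overhead per annotated statement under this encoding, before a store applies any internal optimisation.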
> >>
> >>Hypergraphs are an attractive nuisance in this regard.  Talk about scope creep for the charter :-).
> >>
> >>I have nothing against hypergraphs.  My proposal would be to model it all quite explicitly as illustrated in (F).  (F) is clearly a hyperedge.  If you do this, you can easily model hypergraphs as well as typed role-based connections and we can perhaps create another community group to explore that.  If we can get some traction around use cases for hypergraphs in a community group, that could certainly drive standardization.  I will note in passing that HyperGraphDB is dead.  It had some traction for a while, but clearly failed to secure sufficient interest to drive other platforms to modeling hypergraphs.
> >>
> >>In contrast to "hypergraphs now", we have a very clear utility to RDF/LPG interoperability now.  If we make the right decision now, we can extend that to bring the benefits of RDF to LPG users and perhaps the rising tide of LPG uses to RDF.
> >>
> >>> (F) [from the original letter]
> >>> :s1 :p :o {| :b1 | :ep 1 |}
> >>> :s2 :p :o {| :b2 | :ep 2 |}
> >>> :b1 :partOf :b
> >>> :b2 :partOf :b
> >>
> >>Bryan
> >>
> >>________________________________
> >>From: Kurt Cagle <kurt.cagle@gmail.com<mailto:kurt.cagle@gmail.com>>
> >>Sent: Tuesday, April 9, 2024 1:21:32 PM
> >>To: Thompson, Bryan
> >>Cc: James Anderson; RDF-star Working Group; Lassila, Ora; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>
> >>Making an observation here:
> >>
> >>If you have an :S :p :O model where :S and :O are (hypothetical) sets (of anything, including graphs), this can be expressed as :s rdf:member :S, :o rdf:member :O , which is the definition of a hypergraph.  (I discuss this at length in https://www.linkedin.com/pulse/rethinking-hypergraphs-kurt-cagle-n2oec/).
> >>
> >>RDF by itself does not support hypergraphs, but RDF with the interpretation that :S and :O are pointers to sets (in various ways) certainly does support hypergraphs.
> >>
> >>This extends beyond reification; it's just that the reification operator has introduced hypergraphs.
> >>
> >>I'd also argue that we're actually not talking about RDF here (star or otherwise) but Turtle. RDF doesn't care - it can represent both kinds of structures just fine, so long as you allow for the notion that you're going to be dealing with pointers to sets.
> >>
> >>I'd also argue that the RDF vs LPG argument is a bit of a red herring. You can represent an LPG (and let's be honest, call it Neo4J) readily in RDF without the syntactic sugar:
> >>
> >>(Basic Turtle here)
> >>:Joe :married :Alice .
> >>:r1 rdf:subject :Joe .
> >>:r1 rdf:predicate :married .
> >>:r1 rdf:object :Alice .
> >>:r1 :dateStart "2014-06-04"^^xsd:date .
> >>:r1 :dateEnd "2020-12-01"^^xsd:date .
> >>:r1 a :MaritalRelationship .
> >>
> >>:Joe :married :Jane .
> >>:r2 rdf:subject :Joe .
> >>:r2 rdf:predicate :married .
> >>:r2 rdf:object :Jane .
> >>:r2 :dateStart "2021-03-02"^^xsd:date .
> >>:r2 :dateEnd "2024-04-01"^^xsd:date .
> >>:r2 a :MaritalRelationship .
> >>
> >>
> >>That doesn't change regardless of the syntactical sugar. If I want to define a hypergraph, this could be done as:
> >>:r1 rdf:member :R .
> >>:r2 rdf:member :R .
> >>:R owl:sameAs :JoeMarriages .
> >>
> >>SPARQL then sees this as:
> >>?r rdf:member :R.
> >>?r rdf:subject ?rs.
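
[Editorial sketch, not part of the quoted message.] To make the join in that pattern concrete, here is a minimal pure-Python evaluation over an in-memory set of tuples. It assumes the grouping triples exactly as written in the message (`:r1 rdf:member :R`), treats prefixed names as opaque strings, and extends the pattern to fetch predicate and object as well; the function name `grouped_statements` is hypothetical.

```python
# Toy in-memory graph of (subject, predicate, object) tuples; the prefixed
# names are plain strings here, not resolved IRIs.
GRAPH = {
    (":r1", "rdf:subject", ":Joe"),
    (":r1", "rdf:predicate", ":married"),
    (":r1", "rdf:object", ":Alice"),
    (":r2", "rdf:subject", ":Joe"),
    (":r2", "rdf:predicate", ":married"),
    (":r2", "rdf:object", ":Jane"),
    (":r1", "rdf:member", ":R"),
    (":r2", "rdf:member", ":R"),
}


def grouped_statements(graph, group):
    """Emulate: ?r rdf:member <group> . ?r rdf:subject ?s ; rdf:predicate ?p ; rdf:object ?o ."""
    def value(reifier, predicate):
        # First object found for (reifier, predicate); assumes exactly one exists.
        return next(o for (s, p, o) in graph if s == reifier and p == predicate)

    # Bind ?r to every reifier grouped under `group`, in a stable order.
    members = sorted(s for (s, p, o) in graph if p == "rdf:member" and o == group)
    return [
        (r, value(r, "rdf:subject"), value(r, "rdf:predicate"), value(r, "rdf:object"))
        for r in members
    ]


rows = grouped_statements(GRAPH, ":R")
for row in rows:
    print(row)
```

Each result row needs one lookup for the grouping triple and three more to reassemble the statement, which is the kind of per-statement join cost the explicit statement model implies when a store does not special-case it.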
> >>
> >>I think the biggest problem is that we have no native set operator in Turtle. Let's say we have an operator like [[ ]] which represents a set, then:
> >>
> >>[[<<:Joe :married :Alice>> | :dateStart "2014-06-04"; :dateEnd "2020-12-01" |
> >><<:Joe :married :Jane>> | :dateStart "2021-03-02"; :dateEnd "2024-04-01" | ]]  :listOf :JoesMarriages .
> >>
> >>becomes feasible.
> >>
> >>I agree with Peter that LPGs should be backwards compatible with Turtle (you can express an LPG in Turtle), but that Turtle should not be limited by where LPGs are.
> >>
> >>Kurt Cagle
> >>Editor in Chief
> >>The Cagle Report
> >>kurt.cagle@gmail.com<mailto:kurt.cagle@gmail.com>
> >>443-837-8725
> >>
> >>
> >>On Tue, Apr 9, 2024 at 11:03 AM Thompson, Bryan <bryant@amazon.com<mailto:bryant@amazon.com>> wrote:
> >>
> >>One would do this because the future relevance of RDF is at stake.  It would be an extreme disservice to RDF to introduce a more conceptually complicated model of RDF Reification, and implicit grouping via the same identifier is a more conceptually complicated model.  I have been involved in this via RDF-star since 2012 when I got Olaf interested in this problem and via "Reification Done Right" since 2008 and via other activities back to 1999 with a critique of the semantic web as being unable to handle uncertain and messy data, which is what we have in the real world.  To my thinking, the conceptual difficulties of RDF reification have been a major reason why LPG had an opportunity in the graph standards market when we had solid detailed existing standards.  LPG makes edge properties simple.  And edge properties are a critical -- the number one critical -- use case for RDF Reification.  There are to be certain other valuable use cases, but this is frankly table stakes for graph standards.  RDF has a *lot* of other benefits, but it falls down on the handling of edge properties.
> >>
> >>To me, this is a question of basic relevance of RDF to the future.  Getting this wrong will slam the door closed on RDF.  Getting it right will make it possible to breathe continued and new life into RDF.
> >>
> >>The issue for uptake and use by the broad graph community is not about having the "more capable model".  What RDF needs is a model which provides a clear, effective and efficient semantics for edge properties and ... extending that ... for statements about statements.
> >>
> >>Bryan
> >>________________________________
> >>From: James Anderson <anderson.james.1955@gmail.com<mailto:anderson.james.1955@gmail.com>>
> >>Sent: Monday, April 8, 2024 4:49:05 PM
> >>To: RDF-star Working Group
> >>Cc: Lassila, Ora; Thompson, Bryan; Bebee, Brad; Schmidt, Michael; Hartig, Olaf; Williams, Gregory
> >>Subject: RE: [EXTERNAL] A few thoughts on RDF-star, Reification, and Labeled Property Graphs
> >>
> >>
> >>
> >>good morning;
> >>
> >>> On 8. Apr 2024, at 23:42, Lassila, Ora <ora@amazon.com<mailto:ora@amazon.com>> wrote:
> >>>
> >>> The Amazon Neptune team is committed to lowering the barriers to the adoption of graph databases and graph-based computing. Our customers benefit when we reduce the conceptual and technological gap between RDF graphs and Labeled Property Graphs (LPGs). Over the last several years we have seen LPGs increase their popularity thanks to easy-to-understand and easy-to-use features, even when RDF offers more sophisticated features such as (for example) easy graph merging, federated queries, and expressive schema languages. The importance and relevance of interoperability between RDF and LPG was established several years ago at the W3C workshop on Web Standardization for Graph Data (Creating Bridges: RDF, Property Graph and SQL) [1]. While its origins are much older, the RDF-star Community Group was established in the wake of this event. We believe that improving the ability for RDF and LPG graphs to interoperate will benefit the entire graph community.
> >>>
> >>> As we see it, the most critical outcomes of the work of the RDF-star working group should include:
> >>>     • Efficient RDF support for “edge properties”, including the ability to have different property sets for otherwise identical edges (LPGs do not have the restriction RDF has where triples are unique in a graph).
> >>>     • Simple and clear RDF support for statements about statements (supporting provenance mechanisms and other identified use cases).
> >>>     • Laying the groundwork for interoperability “in the data” between RDF and LPG languages (e.g., a single database that can expose both LPG and RDF based query languages over the same data).
> >>
> >>this third outcome, while valuable, is not one of the chartered tasks.
> >>is it the intent of this note to suggest that the charter should be extended?
> >>
> >>>  The alignment of features and capabilities between RDF and LPGs is possible if there are no fundamental incompatibilities between the two graph models. The RDF-star Working Group’s original goal, an easy mechanism for making “statements about statements”, would make the gap between the two models significantly smaller; statements about statements are a feature similar to “edge properties” in LPGs, the lack of which in RDF we often hear cited as the reason users choose LPGs. On the other hand, the current proposal the WG is entertaining, the “single reifier multiple triples” model, has no clear counterpart in LPGs, renders the two graph models even more different than they are today, adds significant complexity (there are more expressive alternatives with simpler semantics), and makes it even more difficult to understand RDF reification rather than offering a conceptually simple framework.
> >>>
> >>> Limiting reifiers to single statements – and classifying scenarios with a single reifier for multiple statements as “non-well formed” – will bring the greatest benefit to the graph community at large. On the other hand, allowing a single reifier for multiple statements will make it very difficult to align the LPG and RDF models. Please see the examples below.
> >>
> >>this suggests restricting the more capable model to conform with the limitations of the less capable model, not as a matter of usage or a conventional profile, but as a required characteristic.
> >>why would one do this?
> >>
> >>this discussion conflates two aspects of the model:
> >>- the cardinality of the identified statements
> >>- the cardinality of the annotations on the identified entity
> >>
> >>it should be possible to consider them independently.
> >>
> >>there is nothing in the examples or commentary below which substantiates any argument beyond that a profile would be expeditious.
> >>
> >>>
> >>> We strongly believe that the continued relevance of RDF depends on establishing interoperability with LPGs. As stated above, RDF brings some tremendous advantages, and we are committed to bringing these advantages to the community of LPG users as well. We believe that this reflects the spirit of the W3C workshop on Web Standardization for Graph Data and resonates with inputs from some other members of the working group.
> >>>
> >>> [1] https://www.w3.org/Data/events/data-ws-2019/
> >>>  Examples:
> >>>
> >>> # (A) An LPG edge with a single edge property.
> >>> (s) - [p {ep: 1}] → (o)
> >>>
> >>> # (B) An interpretation of that in an SPOI model (where I is a statement identifier).
> >>> # The OneGraph model is based on such SPOI tuples.
> >>> s p o :sid1
> >>> :sid1 ep 1 :sid2
> >>>
> >>> # (C) An RDF-star expression consisting of an asserted triple and a statement about
> >>> # that.
> >>> :s :p :o {| :ep 1 |} # with an anonymous identifier for the (s p o) statement.
> >>>
> >>> # (D) The RDF interpretation of that RDF-star expression.
> >>> :s :p :o . # The asserted triple.
> >>> _:b rdf:reifies <<( :s :p :o )>> . # A reifier for that triple.
> >>> _:b :ep 1 . # Using that reifier to make an assertion about a triple occurrence.
> >>>
> >>> # Note that the LPG example (A), the SPOI interpretation (B), and the RDF model (D)
> >>> # can be handled as exactly the same data within possible database implementations
> >>> # such as proposed by Souri or by a OneGraph implementation.  The case where _:b is
> >>> # replaced by an IRI can also be handled under LPG, 1G, etc.
> >>>
> >>> # Now, let us look at the case where different statements are assigned the same
> >>> # reifier:
> >>>
> >>> # (E) Same reifier used in two expressions about different triples.
> >>> :s1 :p :o {| :b | :ep 1 |} # a statement about a statement with reifier ":b".
> >>> :s2 :p :o {| :b | :ep 2 |} # a statement about a different statement, same reifier.
> >>>
> >>> # This last case (E) has no sensible interpretation under LPG.
> >>>
> >>> # If we accept a constraint that using the same reifiers for different TripleTerms
> >>> # is not well-formed, then we can maintain a consistent interpretation with LPG edge
> >>> # properties.  Further, we can use explicit modelling to group statements and retain
> >>> # transparency about the functional or semantic roles in such groupings.
> >>>
> >>> # (F) Two statements are being grouped by an explicit semantic relationship (:partOf).
> >>> :s1 :p :o {| :b1 | :ep 1 |}
> >>> :s2 :p :o {| :b2 | :ep 2 |}
> >>> :b1 :partOf :b
> >>> :b2 :partOf :b
> >>>
> >>> # We submit that this explicit modelling is more useful and preserves the alignment
> >>> # with LPG and RDF which has such great value to the world wide graph community.
> >>>   --
> >>> Dr. Ora Lassila
> >>> Principal Technologist, Amazon Neptune
> >>
> >>
> >>---
> >>james anderson | james@dydra.com<mailto:james@dydra.com> | https://dydra.com<https://dydra.com/>
> >>

Received on Friday, 19 April 2024 11:55:41 UTC