
Towards RDF* semantics focused on triple instances

From: thomas lörtsch <tl@rat.io>
Date: Fri, 4 Dec 2020 00:11:50 +0100
Message-Id: <F1153CBF-E264-458D-ADA2-9807D9488144@rat.io>
To: public-rdf-star@w3.org
I advocate designing an RDF* semantics that, first and foremost, captures the following use case:

	a provenance annotation on a triple,
	with the triple and the annotation both in the same graph.

This is the use case that everybody hopes and expects RDF* to solve. Everything else is icing on the cake. If there’s one good thing about RDF*, it’s that it looks simple and straightforward, and the above example is the most faithful account of that look. In the terminology that I propose below, the annotated triple is a triple instance.

We probably have more possible interpretations of what a triple is than we would like to know - and we should have been warned [0]. I’ll first try to pin down a terminology to make discussing them easier, as current naming conventions are a disaster and I’ve probably been one of the worst offenders in recent discussions.

1: triples as mathematical abstractions, e.g. the type of triples with subject :s, predicate :p and object :o. In accordance with the semantics proposal on the wiki, I’ll call those
	'triple intension'
or 'intension' for short.
Other popular names are "triple", "triple type", "type" and "abstract triple" - but some of those names are also used for other categories.

2: _all_ occurrences of a certain triple. Analogous to the above, I’ll call those together the
	'triple extension'
or 'extension' for short.
Other popular names are "triple", "triple type", "abstract triple" and "triple occurrences" - but some of those names are also used for other categories.

3: _one_ occurrence of a certain triple. I’ll call it the
	'triple instance'
or 'instance' for short.
Other popular names are "triple", "triple occurrence" and "triple token" - but some of those names are also used for other categories.
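To make the three categories concrete, here is a minimal Python sketch that stays purely at the syntactic level. It assumes, as a modeling choice rather than anything taken from a spec, that a triple instance can be identified by the pair (graph name, triple): since an RDF graph is a set, a given triple occurs at most once per graph. The names Triple, TripleInstance, dataset, instances and extension, and the graphs :g1 and :g2, are hypothetical.

```python
from typing import Dict, NamedTuple, Set

class Triple(NamedTuple):
    """Triple intension: the abstract (s, p, o) combination, with no location."""
    s: str
    p: str
    o: str

class TripleInstance(NamedTuple):
    """Triple instance: one occurrence of a triple, identified here by its graph."""
    graph: str
    triple: Triple

# Hypothetical dataset: graph name -> set of triples asserted in that graph.
dataset: Dict[str, Set[Triple]] = {
    ":g1": {Triple(":s", ":p", ":o"), Triple(":x", ":y", ":z")},
    ":g2": {Triple(":s", ":p", ":o")},
}

def instances(ds: Dict[str, Set[Triple]]) -> Set[TripleInstance]:
    """All triple instances in the dataset."""
    return {TripleInstance(g, t) for g, triples in ds.items() for t in triples}

def extension(ds: Dict[str, Set[Triple]], triple: Triple) -> Set[TripleInstance]:
    """Triple extension: every occurrence of the given intension."""
    return {i for i in instances(ds) if i.triple == triple}

t = Triple(":s", ":p", ":o")
print(len(extension(dataset, t)))  # 2: one instance in :g1, one in :g2
```

The intension t carries no graph at all, the extension is a set of located occurrences, and each member of that set is one instance.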

Orthogonal to the disambiguation between triple intension, extension and instance is the question whether we mean
A: the thing itself as a syntactic construct,
B: its interpretation, possibly subsuming co-references and entailed assertions, or
C: something in between, along the lines of Pierre-Antoine’s and Dörthe’s proposal for a referentially opaque semantics of incomplete reifications in SA mode.
There are overlaps, especially as it is technically appealing to use the syntactic layer of model theory to encode anything not in the interpretation domain, but my hunch is that every combination is reasonably possible, e.g. talking about the triple intension and all its possible co-references in the interpretation domain. But we don’t need to go that far, I hope.

RDF standard reification has a clear position on what it wants to achieve: reification of the interpretation of triple instances, variant 3B. It "just" lacks one crucial instrument: it can’t point to the precise instance it is describing. This makes it practically quite useless. Consequently, and unsurprisingly, its syntax is either used with other, non-standard semantics like "unasserted assertions" (variant 3A) or misunderstood as referring to an instance in the same graph as the one in which the reification quadlet occurs.
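A minimal Python sketch of that missing instrument, assuming a toy in-memory dataset of plain (s, p, o) tuples. The graph names :g1, :g2 and :meta, the node :r1 and the :source property are hypothetical; only rdf:Statement, rdf:subject, rdf:predicate and rdf:object come from the RDF vocabulary. The quadlet pins down subject, predicate and object, but nothing in it says which graph's occurrence it describes, so every graph containing the triple remains a candidate.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical dataset: graph name -> set of (s, p, o) tuples.
dataset = {
    ":g1": {(":a", ":b", ":c")},
    ":g2": {(":a", ":b", ":c")},
    ":meta": {
        # The standard reification quadlet, plus one provenance annotation.
        (":r1", RDF + "type", RDF + "Statement"),
        (":r1", RDF + "subject", ":a"),
        (":r1", RDF + "predicate", ":b"),
        (":r1", RDF + "object", ":c"),
        (":r1", ":source", ":someDocument"),
    },
}

def reified_triple(graph, node):
    """Read back the (s, p, o) described by the quadlet rooted at `node`."""
    props = {p: o for s, p, o in graph if s == node}
    return (props[RDF + "subject"], props[RDF + "predicate"], props[RDF + "object"])

def candidate_graphs(ds, triple):
    """Graphs holding an occurrence the quadlet might be describing."""
    return sorted(g for g, triples in ds.items() if triple in triples)

described = reified_triple(dataset[":meta"], ":r1")
print(candidate_graphs(dataset, described))  # [':g1', ':g2']: ambiguous
```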

RDF* could do something quite similar: it could refer to a triple instance, but without the syntactic verbosity of RDF standard reification. It could, especially in PG mode, naturally default to annotating the instance in the same graph, which would solve the core problem of RDF standard reification. In that minimal form it wouldn’t be able to annotate triple instances in other graphs, but that’s a feature that is not often requested, so it is not a problem for a streamlined and pragmatic approach like RDF*. 
This is in fact how RDF* is advertised and widely understood, and what its examples convey. However, I have to say "could" because the current proposal for a semantics in the wiki goes another route: it defines the embedded triple as representing the triple extension, variant 2B. This is very strange, and it will only cause confusion and dissatisfaction among users. It also runs contrary to the interests of this community group’s intended constituency, the implementors, because this is not what they and their users expect - and I hope they will realize that in the not too distant future and speak up a bit. What’s worse, this is not immediately obvious. In this respect the semantics in the wiki is a ghost driver and a disaster waiting to happen. It will, someday but inevitably, collide with an approach that actually does what it says. It will become the IE6 of the semantic web that everybody hates and circumvents with dubious quirks. This is not what anybody wants.
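The practical difference between the two readings can be sketched in a few lines of Python. This is a deliberately simplified, syntactic model that ignores the interpretation layer on which the wiki proposal is actually defined; the dataset, the function covered and the reading names "instance" and "extension" are hypothetical. An annotation such as << :a :b :c >> :source :doc1 written in :g1 covers only the occurrence in :g1 under an instance reading, but under an extension reading it also reaches the occurrence in :g2.

```python
# Hypothetical dataset: graph name -> set of asserted (s, p, o) tuples.
dataset = {
    ":g1": {(":a", ":b", ":c")},
    ":g2": {(":a", ":b", ":c")},
}

def covered(ds, annotating_graph, triple, reading):
    """Occurrences that an annotation on `triple`, written in `annotating_graph`, is about."""
    if reading == "instance":
        # The embedded triple denotes its occurrence in the annotating graph only.
        # If it is not asserted there (the unasserted SA-mode case), nothing is covered.
        return {(annotating_graph, triple)} if triple in ds.get(annotating_graph, set()) else set()
    if reading == "extension":
        # The embedded triple denotes every occurrence of (s, p, o), wherever it is.
        return {(g, triple) for g, triples in ds.items() if triple in triples}
    raise ValueError(f"unknown reading: {reading}")

t = (":a", ":b", ":c")
print(sorted(covered(dataset, ":g1", t, "instance")))
print(sorted(covered(dataset, ":g1", t, "extension")))
```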

Defining a semantics that targets triple instances is probably not easy in itself, as it requires differentiating triple instances in different graphs, something that the RDF semantics of standard reification left open. I still don’t properly understand why this was left open, but I was told (IIUC) that it has to do with tricky set-theoretic issues. The RDF model-theoretic semantics mentions (different, distinct) graphs a lot, but IIUC it formalizes only THE set of all triples, neither individual graphs nor ALL sets of triples. But I trust that this CG has the people who can solve this problem, if it’s solvable. 

Everything else is surplus value and not essential, but some surplus features are more pressing than others and some are easier to realize than others. 

The most pressing issue is probably defining the meaning of an embedded triple that is not actually asserted in the same graph. This issue is specific to SA mode, but since SA mode is more triple-centric than PG mode, it should be solved.
There are use cases for encoding "un-asserted" assertions, "un-endorsed" assertions, N3-formula-like referentially opaque assertions, and other variations. This is a subject that is tricky not only to formalize but also to formulate, and unfortunately the two problems overlap considerably: some formalizations seem elegant but don’t capture some intended semantics, and vice versa, and the possible semantics differ in subtle ways. I’d expect more discussions here, and not only on referential opacity/transparency. 
Maybe this is best understood and abstracted as a differentiation between *active* and *passive* statements: statements that one wants to be part of the interpretation (active; this is the normal triple as we know it) and statements that one wants, for whatever reason, to be kept outside, but "at arm’s length". The current proposal for a semantics does a nice job in that respect: "passive" triples share blank nodes with active triples, they only show up in query results if expressly included, and they only participate in entailments if expressly included. That’s what I mean by "keeping them at arm’s length". If one just wanted to archive them and access them only by their properties, a simple (typed) literal would be enough - but this approach achieves more: it keeps those triples around in a parsed state, ready to use but constrained, and I think that arrangement has some real value.
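A toy Python model of the behaviour just described may help; it is an illustration, not the wiki proposal's formal semantics. Assumptions: terms are plain strings, passive triples are those that only occur embedded (here in a hypothetical << _:b1 :marriedTo :bob >> :source :gossip), and a single RDFS rule (rdfs7, sub-property propagation) stands in for entailment. The names active, passive, match and entail_subproperties are hypothetical. The blank node _:b1 is shared between the two sets, yet the passive triple shows up in matches and feeds inference only when include_passive is set.

```python
from typing import NamedTuple, Optional, Set, Tuple

class Triple(NamedTuple):
    s: str
    p: str
    o: str

SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Active triples: asserted, part of the interpretation.
active: Set[Triple] = {
    Triple("_:b1", ":name", '"Alice"'),
    Triple(":marriedTo", SUB_PROPERTY_OF, ":knows"),
}
# Passive triples: only occur embedded, e.g. << _:b1 :marriedTo :bob >> :source :gossip .
# They share the blank node _:b1 with the active triples.
passive: Set[Triple] = {
    Triple("_:b1", ":marriedTo", ":bob"),
}

def pool(include_passive: bool) -> Set[Triple]:
    """Triples visible to a query or rule: passive ones only on explicit request."""
    return (active | passive) if include_passive else set(active)

def match(pattern: Tuple[Optional[str], Optional[str], Optional[str]],
          include_passive: bool = False) -> Set[Triple]:
    """Triple-pattern matching; None acts as a wildcard."""
    return {t for t in pool(include_passive)
            if all(want is None or want == got for want, got in zip(pattern, t))}

def entail_subproperties(include_passive: bool = False) -> Set[Triple]:
    """New triples from one pass of rdfs7: p subPropertyOf q, s p o => s q o."""
    triples = pool(include_passive)
    subprops = {(t.s, t.o) for t in triples if t.p == SUB_PROPERTY_OF}
    return {Triple(t.s, q, t.o) for t in triples for p, q in subprops if t.p == p} - triples

print(len(match(("_:b1", None, None))))                        # 1
print(len(match(("_:b1", None, None), include_passive=True)))  # 2
print(entail_subproperties())                                  # set()
print(entail_subproperties(include_passive=True))
```

A plain literal could only be retrieved, whereas here the passive triple stays parsed and can be pulled into queries or inference on demand.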
So the abstract categories would be 'active' and 'passive', and the formalization would probably use the distinction between syntax and interpretation that model theory builds on. This seems a reasonable approach. I like Peter’s idea of coining specific terms like :subject* and :object*. I wouldn’t mind if there were also a class :Statement* or something to that effect.
This issue can also be ignored if no consensus is possible, leaving the meaning of "incomplete" annotations in SA mode undefined. I would be sorry, however, as I think N3 formulas are a very handy tool to manage statements that I don’t fully endorse, that I want to derive entailments from in a controlled way, and that I want to keep around in a parsed and actionable form, but that only show up in my results at explicit request. "Virtual triples" could be a fitting name as well.

Then there are two other features that strike me as low-hanging fruit: annotating multiple statements at once and annotating a statement in a graph other than the local one. Once the above work is done, I expect that these are merely syntactic issues. Of course history is rich with such expectations being bitterly disappointed ("what could possibly go wrong!"), but I would advocate always keeping an eye on these two features and checking whether proposed solutions support and facilitate them or rather rule them out. 

Then there is what the current semantics proposal on the wiki encodes: the triple extension, i.e. all occurrences of a given triple type, variant 2B. I do see a use for it, but not much demand. I’ve been dabbling with syntactic variations like this:
	<< :a :b :c >>		// a triple instance in the local graph
	<< :a :b :c <> >>	// a triple instance in the local graph (explicitly)
	<< :a :b :c :g >>	// a triple instance in some other graph :g
	<< :a :b :c {} >>	// a triple extension
	<< :a :b :c () >>	// a triple intension
which would allow expressing such semantics. This might be easy to implement, but the added expressivity is probably not worth the risk of muddying the waters of a streamlined version of RDF* with such subtle distinctions.
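Purely to illustrate how the five forms above would differ in what they denote, here is a small Python resolver. It is a sketch under stated assumptions: the fourth position arrives already parsed as a string (or None when absent), triple instances are identified by (graph name, triple) pairs, and the function resolve, the dataset and the graph names :local, :g and :h are hypothetical. None of these forms is part of any RDF* specification.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical dataset: graph name -> set of asserted triples.
dataset: Dict[str, Set[Triple]] = {
    ":local": {(":a", ":b", ":c")},
    ":g": {(":a", ":b", ":c")},
    ":h": set(),
}

def resolve(triple: Triple, marker: Optional[str], local_graph: str,
            ds: Dict[str, Set[Triple]]):
    """Map the optional fourth position of << s p o X >> to (kind, referent)."""
    if marker is None or marker == "<>":
        # << :a :b :c >> and << :a :b :c <> >>: the instance in the local graph.
        return "instance", {(local_graph, triple)}
    if marker == "{}":
        # << :a :b :c {} >>: the extension, i.e. all occurrences in the dataset.
        return "extension", {(g, triple) for g, ts in ds.items() if triple in ts}
    if marker == "()":
        # << :a :b :c () >>: the intension, the abstract triple itself.
        return "intension", triple
    # << :a :b :c :g >>: any other term is read as the name of another graph.
    return "instance", {(marker, triple)}

t = (":a", ":b", ":c")
for m in (None, "<>", ":g", "{}", "()"):
    print(m, resolve(t, m, ":local", dataset))
```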

I would try to keep a safe distance from all questions that arise in the interpretation domain - co-references, entailments and the like. I see that obvious, simple, low-hanging entailments and co-references are one core value proposition of the semantic web. But time and again they seem obvious, simple and low-hanging, and then they aren’t, and they blow up in your face with subtle semantic variations that nobody can comprehend. In this respect I’d argue that we should treat embedded triples just like we treat IRIs: we don’t look much inside them, we honor owl:sameAs relations if we have the time, we don’t get religious about slight variations in meaning, etc. Everything else lies in the hands of applications, and the "passive" variant of SA mode gives them a nice new tool to control semantic variations. 

I think I made it clear already, but I’ll say it again nonetheless: IMO the last thing the RDF ecosystem needs is an RDF* semantics that seems to be about class instances but actually is about class extensions. Everybody is much better off with no semantics at all than with such a ghost driver. And, because Pierre-Antoine seems to love weasel arguments, I’ll add: clarifying that RDF* is indeed about triple extensions wouldn’t help much either. Nobody wants that.


[0] https://www.w3.org/TR/rdf11-datasets/
Received on Thursday, 3 December 2020 23:12:11 UTC
