Introduction & motivation from Thomas Lörtsch on 2022-11-17 (public-rdf-star-wg@w3.org from November 2022)

From: Thomas Lörtsch <tl@rat.io>
Date: Thu, 17 Nov 2022 20:38:11 +0100
To: public-rdf-star-wg@w3.org
Message-Id: <54305C15-93EC-45CA-9CCA-5C9BC9816451@rat.io>

Hi all,

the WG had its kickoff meeting, and its first meeting today, and I read the meeting notes with great interest. As my application as an invited expert has not been accepted, I can only follow from the sidelines, but I will. So I figured it would be good to introduce myself to those who don’t already know me from the Community Group.

Myself: I’ve been engaged in Semantic Web stuff for decades - first Topic Maps, then RDF - and for a long time I argued for quintuples as the solution to all things meta-modelling. I’m just now finishing my Master’s thesis in Applied CS, titled "Between Facts and Knowledge - Issues of Representation on the Semantic Web". I hold a diploma in Architecture from one of my former lives.

My reason to participate: while I see a lot of problems with RDF-star - model, syntax and semantics - my most urgent worry by far is the proposed semantics. Referential opacity of quoted triples is inadequate for the vast majority of use cases, is widely ignored already and stands no chance of succeeding "in the wild". It most definitely should not be standardized as proposed by the CG. But there are other ways to make everyone happy.

A bit more in the postscriptum. I already wrote too much for an introduction. I could write even more... unfortunately right now I have a deadline approaching, so until December this will mostly be it.

Regarding the discussion you had today: one has to differentiate between the statement itself - which of course is always the same - and its occurrences as
a) the act of stating it - an instance
b) a qualification of it - a subtype
Whatever you do syntax-wise, you’ll always arrive at this problem. And it’s even more intricate: whether an occurrence is understood as an instance or as a subtype depends on the application context, and you can’t know it beforehand. To one application provenance is just administrative dust, to another it’s essential detail: that changes how one looks at it and whether one interprets it as instance or subtype.
I think it’s useful to look at Pat’s reification vocabulary: it always creates an identifier and it can never talk about the statement itself, only about occurrences. That avoids all the problems, but never satisfied anybody. I find Souri’s approach quite clever because it also always creates an identifier, and more of them if needed, so you’ll never run into problems with disambiguating multi-part annotations from multiple parties. But unlike RDF Standard Reification it has the ability to annotate statements more directly: it can say "this statement here has been made by X" or "this statement here is unreliable". Depending on how one understands it, this opens the door to paradoxes, but if you understand the annotated statement as being part of the extension of the statement type, or a subtype, then I think it can work. And I think this way it can be made to work with RDF semantics.
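
To make the distinction between a statement and its occurrences concrete, here is a minimal Python sketch of a toy in-memory store. The class `OccurrenceStore`, its method names and the `occ1`, `occ2` identifier scheme are all hypothetical, invented for illustration; this is not Pat's vocabulary, Souri's proposal, or the API of any RDF library. It only shows the one property discussed above: every annotation targets a freshly created occurrence identifier, never the bare triple, so annotations from different parties stay apart.

```python
from itertools import count

Triple = tuple[str, str, str]  # (subject, predicate, object), terms as plain strings


class OccurrenceStore:
    """Toy store (hypothetical): annotations attach to occurrence ids only."""

    def __init__(self) -> None:
        self._next = count(1)
        self.occurrences: dict[str, Triple] = {}           # occurrence id -> triple
        self.annotations: dict[str, dict[str, str]] = {}   # occurrence id -> annotations

    def state(self, triple: Triple) -> str:
        """Record one more occurrence of the triple and return its new identifier."""
        occ_id = f"occ{next(self._next)}"
        self.occurrences[occ_id] = triple
        self.annotations[occ_id] = {}
        return occ_id

    def annotate(self, occ_id: str, key: str, value: str) -> None:
        """Attach an annotation to one occurrence, not to the statement as such."""
        self.annotations[occ_id][key] = value

    def occurrences_of(self, triple: Triple) -> list[str]:
        """All occurrence ids of the same statement, in creation order."""
        return [occ for occ, t in self.occurrences.items() if t == triple]


store = OccurrenceStore()
stmt = (":alice", ":knows", ":bob")
first = store.state(stmt)    # party X states it
second = store.state(stmt)   # party Y states the same thing
store.annotate(first, "statedBy", ":X")
store.annotate(second, "reliability", "low")
```

Whether `first` and `second` are read as instances (two acts of stating) or as subtypes (two qualifications) of the one statement is not decided by this structure; as argued above, that depends on the application.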

Also, a request: could you please have discussions as much as possible on this list, the public one? I don’t know if that is customary anyway, I’d just like to reinforce the point. Otherwise it’s quite hard to participate meaningfully from the outside.

Best,
Thomas

P.S.: Don’t take this as a rejection of referential opacity in principle. IMO referentially opaque RDF constructs could be very useful if applied at the right place and in the right way. They have been proposed several times already, including for N3 formulas and for Named Graphs (Carroll et al 2005). However, those proposals failed to garner widespread adoption because the specific need that this semantics covers is just too special to survive the onslaught of practice if it is the only semantics provided for a construct central to much more pressing needs: that was true for N3 nested graphs (didn’t make it into Turtle, N3 stayed niche despite its obvious elegance) as well as for Named Graphs (the semantics was just ignored in SPARQL, in implementations, and also later in the RDF 1.1 WG no matter how hard Pat fought for it). Referentially opaque quoted triples will suffer the same fate, in fact they do already.

Quoted triples will primarily be used as a hinge between referentially transparent statements and their referentially transparent annotations. That is true for qualifying annotations - which are a contentious topic but will be a fact in practice - but also for provenance annotations and the like. Even for most provenance annotations it makes no sense to restrict them to the exact syntactic form: the Semantic Web is referentially transparent, it is about the meaning of referents and statements, and provenance annotations, too, predominantly record the provenance of what those statements mean, not of the exact wording. In all those use cases the quoted triples can’t be referentially opaque themselves: it just makes no sense. That is true for almost all use cases collected by the CG.

The CG proposal suggests that in such cases referential transparency can be achieved through Transparency Enabling Properties (TEP). But a requirement that ordinary semwebbers make an additional effort to disambiguate referentially transparent annotations from a referentially opaque standard will be ignored in practice: the difference is just too subtle to convince anybody to make the extra effort (that holds for *any* such mechanism, no matter how simple, but it's especially true for the proposed TEP mechanism which is just ridiculously involved). A sound design would tackle the problem the other way round: the special need has to require the extra step.

This is a classic case of separation of concerns: don’t mix stuff that isn’t related just because you think that you can’t afford one more primitive. Any attempt to overload a construct that is desperately needed - in the past a grouping device like graphs in a formalism that lacks any other structuring means, now an annotation device when annotation is poised to become a mainstream modelling approach - with an orthogonal demand like explainable AI is bound to fail. It hasn’t worked in the past and it won’t work this time either. Quite to the contrary, standardizing the semantics proposed by the CG, a semantics that not only is largely dysfunctional but has already proven to be ignored in practice, has the potential to cost us dearly - both in reputation and in application reliability.

Alternatives

The CG has already been reminded of an old idea, an alternative approach to referentially opaque snippets of RDF data: RDF literals (queryable, of course). Easy to implement, intuitive to understand, not prone to misuse.

RDF literals would also solve another problem of the proposed semantics: so far I’ve only seen two applications of quoted triples that obey the proposed semantics (one of them by Pierre-Antoine Champin in an example that motivated the proposed semantics in his Lotico talk, the other some ESWC 22 poster implementing signed URIs) and both applications needed to hackishly implement quoted graphs as lists of quoted triples to make their case. RDF literals could offer support for graphs without effort.

By the way: that feature of the proposed semantics, that blank nodes in quoted triples are still referentially transparent … it’s not that useful. Leaving aside the popular distaste towards blank nodes, which I don’t share, a blank node is defined by all statements in which it occurs. Quoting one of those statements with syntactic fidelity and leaving unmentioned all the others that contribute to the meaning of the blank node is not a very useful design. If the whole Concise Bounded Description around the bnode was documented in an RDF literal and a referentially transparent quoted statement referred to that literal for syntactic fidelity, much more could be achieved, with much less fuss.
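
As an illustration of the idea of capturing the description around a bnode in a literal, here is a rough Python sketch under strong simplifying assumptions. Terms are plain strings, blank nodes are recognised by a `_:` prefix, and the function follows only the core rule of a Concise Bounded Description (all triples with the node as subject, recursing into blank-node objects); the reification part of the CBD definition is left out. The function names and the line-per-triple serialisation are hypothetical and not part of any proposal or library.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object), terms as plain strings


def is_bnode(term: str) -> bool:
    """Assumption of this sketch: blank nodes are written with a '_:' prefix."""
    return term.startswith("_:")


def concise_bounded_description(graph: set[Triple], start: str) -> set[Triple]:
    """Simplified CBD: triples with `start` as subject, recursing into bnode objects."""
    result: set[Triple] = set()
    seen: set[str] = set()
    todo = [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue  # guards against cycles between blank nodes
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    todo.append(o)
    return result


def as_literal(triples: set[Triple]) -> str:
    """Serialise triples one per line in sorted order, for storing as one string value."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


graph = {
    (":alice", ":address", "_:a"),
    ("_:a", ":city", '"Berlin"'),
    ("_:a", ":geo", "_:g"),
    ("_:g", ":lat", '"52.5"'),
    (":bob", ":knows", ":alice"),
}
snapshot = as_literal(concise_bounded_description(graph, "_:a"))
```

Here `snapshot` holds the three triples that describe `_:a` and the nested `_:g`, and nothing about `:bob`. A transparent statement could then point at such a string when the exact syntactic form matters.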

The minimally acceptable solution that I can imagine is that quoted triples remain referentially opaque, but a simple property allows defining an identifier for a referentially transparent version of them (semantically close to the identifier of an RDF standard reification quadlet). The shortcut syntax and a to-be-defined triple identifier syntax (as a fourth element in Turtle, and equivalent to the venerable RDF/XML 'id' attribute) would then expand to that referentially transparent version.
Other combinations are possible: RDF literals (queryable, for single statements and graphs) plus statement identifiers (RDFn has valuable thoughts on them, many stores implement them already, even use them under the hood to implement RDF-star) and triples encoded as URNs (as Holger Knublauch pondered them in one of his posts) instead of quoted triples (that way getting rid of the need to introduce a new primitive into the RDF model). Having proper identifiers would also enable use cases that suffer from the quoted triple's verbosity, e.g. deeply nested chains of provenance.

Received on Thursday, 17 November 2022 19:38:54 UTC