Revisiting referential opacity - a proposal from Thomas Lörtsch on 2023-02-01 (public-rdf-star-wg@w3.org from February 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Wed, 1 Feb 2023 15:12:32 +0100
To: RDF-star WG <public-rdf-star-wg@w3.org>
Message-Id: <56A14A7C-0D6B-4F5C-A53A-9B11D56561EA@rat.io>

Now, before you discuss the handling of blank nodes in referentially opaque quoted triples _again_, why not first reflect on the usefulness and dangers of referential opacity of quoted triples _again_? And why not consider an alternative!

tl;dr

:Alice :buys :Car .
<< :Alice :buys :Car >>                       # ref. transparent
    :source :Bob ;
    :quoted ":Alice :buys :Car ."^^rdf:ttl .  # RDF literal

I wrote that long mail "drop referentially opaque semantics in embedded triples" [0] to the CG in May 2021 and I stand by my assessment from then: referentially opaque quoted triples will only bring misunderstanding, hurt feelings and large amounts of unsound triples. Their peculiar semantics won’t survive the onslaught of practice. Not even all vendors support them - see Stardog’s comment to the CG that users would like to keep entailments. In practice the Transparency Enabling Property mechanism (TEP) is nowhere mentioned, let alone applied outside the CG report - not even in the CG’s example note. Many users have trouble understanding the concept, or rather the difference between syntax and interpretation. The CG report, the CG example, the CG presentation in the Lotico talk all rather hide the concept instead of actively educating people on it - certainly not the way to make it be respected in practice: a design by logicians for logicians, it seems.

There have been attempts before, in Notation3 and in the original Named Graphs [1] proposal, to define referentially opaque constructs. They failed to garner widespread adoption, but I certainly see the usefulness of the construct, just not in the way that Notation3, Named Graphs (Carroll et al.) and now the RDF-star CG propose to implement it: right in the middle of integration-focused, interpretation-based RDF. That way it is sure to be overrun by practitioners who are just looking for a less verbose alternative to RDF standard reification. In the end everybody loses, because it will be impossible to trust that the authors of any given quoted triple really meant to insist on the syntactically faithful representation.

It makes a great difference in spoken language whether the speaker paints quote signs into the air when citing something, or whether the reference is made more loosely, in indirect speech, by interpretation. It makes an even greater difference in a formalised language such as RDF, where, as in natural language, quoting should be used with care and only when really necessary, because it narrows down the set of possible interpretations a lot. Such narrowing is not what one usually desires. The foremost use and application of RDF is data integration on the semantic web. Everything else, everything more involved or specialized, should be implemented in more specific constructs and/or semantic extensions - not the other way round. If integrating referentially transparent embedded triples - speaking naturally, so to say - requires a semantic extension, then the basic operations of RDF are interrupted. Such an architecture is fundamentally flawed: it has it all backwards, and even a nicer alternative to the TEP will not suffice to heal the breach.

But the problem is perfectly solvable, and I agree with the proponents of the proposed semantics that it would be beneficial for everybody to solve it, as it would allow us to deal with an important set of niche applications and corner cases. Antoine has already mentioned to the CG the Graph Literals proposed by Ivan Herman a decade ago [2]. They provide an easy road to the expressivity desired by Notation3 etc, but don’t get in the way of mainstream use cases. Their biggest usability feature is that they need no explanation: a "quote" will always be understood as a syntactically faithful representation, but not an actual assertion. No further education of users needed.

So, how would it work? Embedded triples are defined to be referentially transparent just like everything else in RDF, and a further statement records the original in syntactic fidelity:

:Alice :buys :Car .
<< :Alice :buys :Car >>    # referentially transparent
    :source :Bob ;
    :quoted ":Alice :buys :Car ."^^rdf:ttl .

Imagine that some mappings for the purpose of data integration are introduced:

:Alice owl:sameAs :alice .
:buys owl:sameAs :purchases .
:Car owl:sameAs :Automobile .

And a possible entailment would be the following:

<< :alice :purchases :Automobile >>
    :source :Bob ;
    :quoted ":Alice :buys :Car ."^^rdf:ttl .

Done. The original syntactic representation is preserved (although, in this example, for no obvious reason).
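
The entailment above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: terms are plain strings, an embedded triple is a 3-tuple, the owl:sameAs mappings are applied as a simple substitution (a real reasoner would add the entailed triples rather than replace the originals), and `rdf:ttl` is the datatype proposed in this mail, not an existing one. The names `SAME_AS` and `rewrite` are made up for the illustration.

```python
# Hypothetical mapping table derived from the owl:sameAs statements above.
SAME_AS = {":Alice": ":alice", ":buys": ":purchases", ":Car": ":Automobile"}


def rewrite(term, mapping):
    """Substitute mapped terms; recurse into embedded triples, skip literals."""
    if isinstance(term, tuple):
        # Embedded triple: referentially transparent, so its terms are rewritten.
        return tuple(rewrite(t, mapping) for t in term)
    if term.startswith('"'):
        # Literal (including a graph literal): kept exactly as written.
        return term
    return mapping.get(term, term)


embedded = (":Alice", ":buys", ":Car")
graph = [
    (embedded, ":source", ":Bob"),
    (embedded, ":quoted", '":Alice :buys :Car ."^^rdf:ttl'),
]

entailed = [tuple(rewrite(t, SAME_AS) for t in triple) for triple in graph]
for triple in entailed:
    print(triple)
```

The embedded triple follows the mappings, while the `:quoted` literal still carries the original spelling.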

Nobody who doesn’t care about the exact syntactic representation - which is the situation in most use cases - is bothered. Those that do indeed care will only have to add one more triple - a reasonable demand (compare that to what the TEP mechanism requires from ordinary use cases). What goes on is completely transparent (compare that to the confusion and frustration that the proposed semantics induces, even among this WG).

Of course, query engines need to be able to query those graph literals. I haven’t spent much thought on querying so far, but I can’t see any potential for big problems. Hehe, I know, "what could possibly go wrong" etc, but still.
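
To give a rough idea of what querying a graph literal might involve, here is a minimal Python sketch. It assumes a tiny Turtle subset in which every term, `;` and `.` is separated by whitespace; it is not a Turtle parser and says nothing about how a SPARQL engine would actually expose such literals. The function names `parse_graph_literal` and `match` are hypothetical, and the sample data is the n-ary example used in this mail.

```python
def parse_graph_literal(text):
    """Parse whitespace-separated statements with ';' and '.' as separate tokens."""
    triples, pending = [], []
    subject = None
    for token in text.split():
        if token not in (";", "."):
            pending.append(token)
            continue
        if subject is None and len(pending) == 3:
            subject, predicate, obj = pending
        elif subject is not None and len(pending) == 2:
            predicate, obj = pending      # continuation after ';'
        else:
            raise ValueError(f"cannot parse statement: {pending!r}")
        triples.append((subject, predicate, obj))
        pending = []
        if token == ".":
            subject = None                # next statement needs a new subject
    if pending:
        raise ValueError("unterminated statement")
    return triples


def match(triples, pattern):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]


literal = ":Alice :buys _:x . _:x rdf:value :Car ; :state :Used ; :shape :Bad ."
triples = parse_graph_literal(literal)
print(match(triples, ("_:x", None, None)))
```

The point of the sketch is only that the content of a graph literal stays inert as long as nobody parses it, and becomes matchable the moment a query engine decides to look inside.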

I see some potential to make this approach even more useful, as graph literals are not restricted to triples. Besides mundane tasks like making assertions about a set of triples, this also opens an interesting avenue to approach the blank node problem more accurately. The meaning of a blank node is determined by all the statements it takes part in. So recording all those statements (the Concise Bounded Description) to faithfully record a statement that contains a blank node makes a lot of sense - especially when the aim is to record a certain state for purposes like versioning or explainable AI. Graph literals can easily do that, whereas RDF-star quoted triples can’t, no matter what semantics is applied to blank nodes.

See for example a popular way to encode an n-ary relation:

:Alice :buys _:x .
_:x rdf:value :Car ;
    :state :Used ;
    :shape :Bad .    # says Bob

It is Bob who points out that the car is in rather bad shape, but Alice likes the car very much and doesn’t agree. This is easy to record, but annotating only the last statement risks losing context:

<< _:x :shape :Bad >>
    :source :Bob .

Even if the blank node is defined to be referentially transparent and the semantics manages to capture a sound set of intuitions, the usefulness of syntactic fidelity of the (as per the CG proposal) referentially opaque quoted triple is somewhat limited. The syntactic precision of the CG semantics runs into the void if it can’t capture the state of the other statements adding to the meaning of the blank node. Graph literals can easily provide that context:

<< _:x :shape :Bad >>
    :source :Bob ;
    :quoted """:Alice :buys _:x .
        _:x rdf:value :Car ;
            :state :Used ;
            :shape :Bad ."""^^rdf:ttl .

If that jump seems too far, a two-stage referral process is also possible:

<< _:x :shape :Bad >>
    :source :Bob ;
    :quoted _:y .
_:y rdf:value "_:x :shape :Bad"^^rdf:ttl ;
    :partOf """:Alice :buys _:x .
        _:x rdf:value :Car ;
            :state :Used ;
            :shape :Bad ."""^^rdf:ttl .
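
The context recorded in the graph literals above could be collected mechanically. The following Python sketch gathers every statement a blank node takes part in, as subject or as object, and follows further blank nodes it meets on the way. Note the assumption: this is broader than the Concise Bounded Description in the strict sense, which starts from the subject position only. Terms are plain strings, the `:Carol` statement is invented as unrelated filler, `rdf:ttl` is the datatype proposed in this mail, and the function names are hypothetical.

```python
def blank_node_context(graph, node):
    """Collect all triples involving `node`, following connected blank nodes."""
    seen, todo, context = set(), [node], set()
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if current in (s, o):
                context.add((s, p, o))
                # Queue any blank nodes in this triple for the same treatment.
                todo.extend(t for t in (s, o) if t.startswith("_:"))
    # Return the collected triples in their original order.
    return [t for t in graph if t in context]


def to_graph_literal(triples):
    """Serialise triples as one statement each inside a graph literal."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
    return f'"{body}"^^rdf:ttl'


graph = [
    (":Alice", ":buys", "_:x"),
    ("_:x", "rdf:value", ":Car"),
    ("_:x", ":state", ":Used"),
    ("_:x", ":shape", ":Bad"),
    (":Carol", ":buys", ":Bike"),   # unrelated filler, not from the example
]

context = blank_node_context(graph, "_:x")
print(to_graph_literal(context))
```

The resulting literal is what would be attached via `:quoted` or `:partOf` in the examples above; the unrelated statement is left out.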

Another obvious application is modal logic: all that's needed from RDF to implement modalities is a way to express statements without actually asserting them, and graph literals provide that. Everything else can be based on properly descriptive ontologies - ":assumes", ":believes", ":rejects" and what have you - that a semantic extension will have to know how to interpret. RDF would be just the messenger of such more elaborate machinery.

What this proposal doesn’t address is the problem of types versus occurrences, which is a not totally but mostly orthogonal issue.

Best,
Thomas

[0] https://lists.w3.org/Archives/Public/public-rdf-star/2021May/0023.html
[1] Carroll, Jeremy J., et al. "Named graphs." Journal of Web Semantics 3.4 (2005): 247-267.
[2] https://www.w3.org/2009/07/NamedGraph.html?this=test#only:me

Received on Wednesday, 1 February 2023 14:12:52 UTC