Re: drop referentially opaque semantics in embedded triples from thomas lörtsch on 2021-05-13 (public-rdf-star@w3.org from May 2021)

From: thomas lörtsch <tl@rat.io>
Date: Thu, 13 May 2021 15:12:01 +0200
To: Andy Seaborne <andy@apache.org>
Cc: public-rdf-star@w3.org
Message-Id: <945507E6-2471-43F8-8E6F-E0A96900C0E0@rat.io>
> On 12. May 2021, at 01:56, Andy Seaborne <andy@apache.org> wrote:
> 
> 
> 
> On 11/05/2021 12:49, thomas lörtsch wrote:
>>> On 10. May 2021, at 19:05, Andy Seaborne <andy@apache.org> wrote:
>>> 
>>> On 07/05/2021 16:20, Peter F. Patel-Schneider wrote:
>>> 
>>>> An excellent description of the problems with referential opacity in embedded triples.   I agree wholeheartedly.  I particularly like the detailed investigation into the use cases.
>>>> I also note that referential opacity works poorly for the annotation syntax (PG mode).  With referential opacity
>>>> :berlin :population 3644826 {| :determination-method :statistical-updating |}
>>>> does not entail
>>>> :berlin :population 03644826 {| :determination-method :statistical-updating |}
>>>> which I take to be completely unexpected.
>>>> (I purposefully picked an example that did not need any external identity relationship.)
>>> 
>>> The transparency/opaque language isn't that helpful for me;
>> Well, you asked me a few months ago not to invent new terms. So I tried hard to use the terms that belong to the semantics vocabulary when talking about the semantics.
>>> it is about binding and scoping.
>> Let’s see if that really gets to the gist of it.
>>> In a programming language situation, a quoted expression is not the expression. It may capture context when written (lexical/static scoping - capture) or when executed (dynamic scoping).
>>> 
>>> Graph :A asserts
>>> 
>>>    :ship1 :name 'Happy Cat' .
>>> 
>>> Graph :B reads :A and says
>>> 
>>>    << :ship1 :name 'Happy Cat' >> :source :A .
>>>    :ship1 :name 'Happy Cat' .
>>>    :ship1 owl:sameAs :ship2 .
>>> 
>>> (yes - external information)
>>> 
>>> Is A the source of ":ship2 :name 'Happy Cat'"?
>>> 
>>> No - that is because later entailment of owl:sameAs was not "in-scope" at :A.
>> Here is already the central issue: the answer could just as well be "Yes", not "No" as you suggest. In fact on the semantic web it should be Yes!
>> In an application it might be No, depending on purpose and inclination. 
> 
> Where does it say ":ship2 :name 'Happy Cat'"?

Ah, okay, sorry! I didn’t properly respond to your question.

> How can I write the proof chain that B can pass on, using transparent references, that says what A asserted and B's additional conclusion?

This "Explainable AI" use case wasn’t part of our discussions until Pierre-Antoine recently introduced it in the Lotico presentation, and I have not thought about or investigated it much. I do however uphold my claim that it is an extreme use case, very close to the metal of RDF inferencing, and like introspection mechanisms in general it needs special care. It can’t be the guiding principle for developing a useful semantics for the much more general use cases that RDF-star is conceived, expected and advertised for: appealing n-ary relations, statement annotations without the syntactic verbosity of standard reification, nicely graded complex information objects with a primary relation and secondary attributes (aka property graphs).

That said, I immediately see two possible solutions: use named graphs to collect statements from certain stages of your inferencing pipeline, or use the proposed RDF literal [0]. Named graphs can do things with blank nodes - that might be a plus. Literals can’t do that, or would at least have to represent the whole set of triples that contain said blank node. OTOH they are extremely harmless and trouble-free: they can’t accidentally show up in careless queries or be affected by entailments. That can be a big plus. Also, thanks to their well-understood quote syntax they are very intuitive to use: it’s very hard to mistake a quote for the real thing. So, to show a little code, just do the following:

 << :ship1 :name 'Happy Cat' >> :hasSource ":ship1 :name 'Happy Cat'"^^rdf:turtle .

And then you go on doing with that information whatever you need to do. As I said I never thought much about this use case and I expect there to be a lot of different ways to tackle it. 
This sure looks verbose, because it actually is - but that is the fault of the embedded triple. Using a proper identifier, as RDF/XML has done from the very beginning with the ID attribute, would make the whole procedure quite succinct. You may recall that I made that proposal before, and I’m quite convinced that no RDF store will implement embedded triples without such an identifier internally. So why not just expose them? Such an identifier could never be mistaken for the real thing, so the whole question about referential opacity or transparency would not have come up. Instead the two realms are cleanly separated and each can be done justice and handled properly, unambiguously.
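
A rough sketch of why the literal is so harmless, in plain Python rather than any RDF library. Everything here is a toy model and an assumption for illustration: IRIs are strings, an embedded triple is a nested tuple, `Literal` and `rewrite` are made-up names, and owl:sameAs is reduced to a simple substitution table. Only the rdf:turtle datatype comes from the proposal in [0].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A quoted value: a lexical form plus a datatype name."""
    lexical: str
    datatype: str = "xsd:string"


def rewrite(term, mapping):
    """Substitute co-referring IRIs, recursing into embedded triples.

    Literals are returned untouched: a quote is not the real thing,
    so identity statements about IRIs never reach inside it.
    """
    if isinstance(term, Literal):
        return term
    if isinstance(term, tuple):
        return tuple(rewrite(part, mapping) for part in term)
    return mapping.get(term, term)


# Hypothetical identity fact known only to :B.
same_as = {":ship1": ":ship2"}

embedded = (":ship1", ":name", Literal("Happy Cat"))
quote = Literal(":ship1 :name 'Happy Cat'", "rdf:turtle")
annotation = (embedded, ":hasSource", quote)

rewritten = rewrite(annotation, same_as)
```

After the substitution the embedded triple talks about :ship2, while the rdf:turtle literal still records the exact wording that mentioned :ship1.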

> Or by
>> (and don’t forget they [semantics -ed] are meant to be used EVERYWHERE, not just for adding provenance annotations).
> are you rejecting provenance?

What makes you think that? I do however see two aspects of provenance at play here and you are addressing only one of them: one could be interested in who said that the ship is named 'Happy Cat'. Or one could be interested in which identifier was used to refer to that ship. The former is the sort of question that the semantic web is concerned with, the latter is an aspect of the machinery behind the scenes. The former can be understood as:

 :A said that the ship is named 'Happy Cat'. 
 (and nobody cares if he pointed at that ship with a stick or his right forefinger)

That variant of provenance is perfectly well handled by referential transparency. I can even sharpen the argument to the point that referential opacity is unable to cover this use case: you can’t talk about the identifier as a syntactic construct and at the same time about what that identifier refers to, what it means. After all ’ship1’ and the ship it refers to are very different things.
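
To make the two readings concrete, here is a minimal sketch in plain Python. It is a toy model under stated assumptions, not a rendering of the proposed semantics: triples are tuples of strings, owl:sameAs is a lookup table, and `denotation` and `has_source` are hypothetical helper names.

```python
def denotation(triple, same_as):
    """Replace each term by the representative of its owl:sameAs class."""
    return tuple(same_as.get(term, term) for term in triple)


def has_source(annotations, triple, source, same_as, transparent):
    """Is `source` recorded as the source of `triple`?

    transparent=True  compares what the terms refer to (after sameAs).
    transparent=False compares the terms exactly as they were written.
    """
    if transparent:
        key = lambda t: denotation(t, same_as)
    else:
        key = lambda t: t
    return any(
        key(embedded) == key(triple) and src == source
        for embedded, src in annotations
    )


# << :ship1 :name 'Happy Cat' >> :source :A .
annotations = [((":ship1", ":name", "Happy Cat"), ":A")]
# :ship1 owl:sameAs :ship2 .   (treat :ship1 as the representative)
same_as = {":ship2": ":ship1"}

query = (":ship2", ":name", "Happy Cat")
yes = has_source(annotations, query, ":A", same_as, transparent=True)
no = has_source(annotations, query, ":A", same_as, transparent=False)
```

Under the transparent reading `yes` is True: :A is a source for the claim about that ship, whichever identifier is used. Under the opaque reading `no` is False, because only the spelling :ship1 was ever annotated.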

So much for provenance. The point that I was making, however, is that even provenance on the level of interpretations is just one aspect of the envisioned use cases of RDF-star, and it is negligent not to consider the whole spectrum when designing the semantics. Let’s not forget that embedded triples are envisioned for use in all kinds of complex statements, many of which have nothing to do with provenance and where explicit interest in the syntactic terms would be quite surprising, if not outright bizarre. Property graph use cases for example are mostly not even about referentially transparent provenance but about n-ary relations. In that context referential opacity is just plain wrong.


> Show me how that is done and I will be more comfortable with it.

I have to question your priorities: in my mail starting this thread and also in my last long mail from January [1] I have pointed out certain problems with the proposed semantics. Specifically I poked holes in the vague affirmation that the proposed semantics can be extended to cover referentially transparent use cases - which are predominant on the semantic web, to say the least. In my mail starting this thread I asked very specific questions about this problem. These questions haven’t been answered - neither has a solution been provided nor have they been dismissed - and therefore I suppose that I’m not the only one who can’t find a satisfying answer to them. That doesn’t make you uncomfortable? Is explainable AI the only use case you worry about?

> Доверяй, но проверяй (trust but verify)
> 
> The question is "what did A assert?"
> A never mentioned :ship2 at all.
> B can not witness that.
> 
> B can decide what it likes in its graph.
> It's a choice - not prescriptive.
> 
> Define :source as "triple asserted" (like the original RDF*).
> 
> Define :determination-method as "D-entailment permitted"
> 
> It is modelling that can state whether entailment is meaningful.

But what else can such modelling do? So far you’ve only presented variations of the explainable AI use case but haven’t bothered about any of my questions (in the mail opening this thread). Those questions would need to be answered before referential opacity could be considered a useful semantics for the whole range of use cases that RDF-star is conceived, expected and advertised for.

I’m not arguing against your use case per se, not at all. But my guess is that an unambiguously specific syntax, likely with quotes, would be the best way to approach it. And I’m very convinced that it shouldn’t be mixed, deeply intertwined even, with any other concern in RDF!


> Annotation syntax is a helper for authoring.
> 
> :paper :author ("X" "Y" "Z") .
> 
> gives the impression
> 
> :paper :author "X" .
> 
> Maybe :author is defined like that or maybe not.

You’ve lost me here.


The real thrust behind RDF-star is not explainable AI, trust management or cryptographically secure signing. The real thrust is the desire to overcome the expressive limitations of the simplistic triple: appealing n-ary relations, well structured primary relations with secondary attributes (aka property graphs), easy annotation without verbose reification syntax. In this context nobody ever talked about the need for referential opacity until the proposed semantics came along. And for good reason, or rather: for lack of any reason.

>    Andy
> 
> (For fun: insert issues of signing linked graphs)

That’s the next WG, and by the way rather a case for graphs than for (embedded) triples. I’ve followed the current thread on semantic-web@w3.org a little and was tempted to ask why they don’t just use literals - canonicalized, hashed, signed, done. But I managed to keep my mouth shut :-)


Thomas



[0] https://lists.w3.org/Archives/Public/public-rdf-star/2021May/0038.html
[1] https://lists.w3.org/Archives/Public/public-rdf-star/2021Jan/0123.html




>> On the semantic web however we are not talking about identifiers but about the things they identify. That, IIUC, is the whole point that the formal model theoretic semantics is based on. In your example we are talking about a ship. That ship's name is 'Happy Cat', according to :A. The fact that the ship is so named doesn’t depend on the identifier that we use to refer to that ship nor is it of importance if we use :A or another equally correct identifier to refer to the source of that statement.
>> This introduces some wiggle room into the system that depending on our view is either useful or harmful.
>> - One can argue that this wiggle room is what makes the semantic web robust and scalable to a decentralized global information system, as we are not clinging to words and wording when talking about things. That is the stance that the semantic web takes.
>> - One can also argue that this introduces ambiguity which makes conversation harder. And yes, sometimes it does, as identifiers may convey more detail than we are aware of (or rather: the sender and the receiver might have different perceptions of what an identifier identifies).
>> But still the basic design decision - and in my understanding that was well before the logicians entered the space - was that this ambiguity, this flexibility is actually more robust and useful than reliance on specific identifiers, on syntactic agreement, which may be very precise but also is stiff and brittle (and, in decentralized global network probably unattainable).
>> That is how the semantic web works, and the proposed semantics breaks that mechanism in non-obvious ways. Sure, as you show below, that break can be healed. But even if we thought that such effort was a good idea in the first place there is still no principled answer to the very basic questions: How? When? For whom exactly? Those are questions that would need to be answered to bring the proposed semantics in line with how the semantic web works everywhere else around those embedded triples (and don’t forget they are meant to be used EVERYWHERE, not just for adding provenance annotations). It would need to be defined in terms of a formal model theoretic semantics, not in SPARQL, as otherwise we are back to RDF 1.0 in 1999. And it still would have to present some very good reason to justify the disruption.
>> Thomas
>>> The definition of the vocabulary in {| |} will define what can be done with the embedded triple.
>>> 
>>> :determination-method can say that D-entailment is valid.  Or it is natural because ^^xsd:integer is used. Who says :A has D-entailment?
>>> 
>>> :source will say "the triple is in the graph".
>>> 
>>> :B can decide what it wants to do.
>>> 
>>> As a building block, we need to keep the item quoted as it was, not with additional implications from outside. This leaves the decision of how to unquote to later.
>>> 
>>> :B may conclude that :A was also saying << :ship2 :name 'Happy Cat' >> but that is a conclusion for :B, not :A.
>>> 
>>> A proof chain for ":ship2 :name 'Happy Cat'" would include
>>> 
>>>  << :ship1 :name 'Happy Cat' >> :saidBy :A .
>>>  << :ship1 owl:sameAs :ship2 >> :saidBy :B .
>>> 
>>> Including
>>>  << :ship2 :name 'Happy Cat' >> :saidBy :A .
>>> would be wrong.
>>> 
>>> -- 
>>> 
>>> The intuition of D-entailment comes from D-entailment being intuitively universal. Just using ^^xsd:integer implies that D-entailment is in effect.
>>> 
>>> Adding an ontology to data from elsewhere should license conclusions in the scope of the ontology.
>>> 
>>> Transparency is allowing retrospective conclusions.
>>> Opaque allows the embedded triple to be used for entailment, or not.
>>> 
>>> :B can decide what it wants to do.
>>> 
>>> -- 
>>> 
>>> We aim to provide a system where the use cases can be addressed - that is not the same as directly solving them with one mechanism.
>>> 
>
Received on Thursday, 13 May 2021 13:12:25 UTC