Re: Some reflections on the semantics of embedded triples from Antoine Zimmermann on 2020-12-01 (public-rdf-star@w3.org from December 2020)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 1 Dec 2020 17:28:46 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <145610eb-9fda-ee59-a4b8-fbc52188d099@emse.fr>
On 01/12/2020 at 12:58, Pierre-Antoine Champin wrote:
> Hi Antoine,
> 
> thanks for this detailed review of the proposed semantics.
> 
> On 30/11/2020 15:25, Antoine Zimmermann wrote:
>> In the current semantics, blank nodes that appear in embedded triples 
>> behave differently from blank nodes outside them. This leads to 
>> semantic peculiarities with possibly unforeseen consequences.
>>
>> TL;DR: In short, blank nodes inside embedded triples can be understood 
>> as "placeholders", as opposed to existentials when they are outside 
>> embedded triples.
> 
> You can say that.
> 
> I would rephrase it like that:
> 
> * blank nodes appearing only in asserted triples (i.e. standard RDF 
> blank nodes) are existential variables ranging over resources from the 
> domain (as they always were)
> 
> * blank nodes in embedded triples are existential variables ranging over 
> ground terms (although if they also appear in asserted triples, the 
> ground term is interpreted in those asserted triples as a resource from 
> the domain)

Actually, I don't think you can rigorously say that bnodes that appear 
inside embedded triples are existential variables. They really have a 
different role.

A ground embedded triple acts as an identifier. << <s> <p> <o> >> is 
really analogous to <http://ex.com/?subject=s&predicate=p&object=o> .
An embedded triple that has a blank node (e.g., << <s> <p> _:b >>) is 
like an IRI template <http://ex.com/?subject=s&predicate=p&object={_:b}>.

While you can informally say that an IRI template contains an 
"existential part", this would hardly be rigorous from the point of view 
of formal logic. It would be like saying that you can quantify portions 
of constant symbols in first-order logic.
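
To make the analogy concrete, here is a minimal Python sketch. It is only 
an illustration, not part of the proposed semantics: the function names 
triple_iri and matches, and the convention of writing terms as plain 
strings with bnodes prefixed by "_:", are assumptions made for this example.

```python
from urllib.parse import urlencode


def triple_iri(s, p, o, base="http://ex.com/"):
    """Build an IRI-like identifier for a ground embedded triple."""
    return base + "?" + urlencode({"subject": s, "predicate": p, "object": o})


def matches(template, ground):
    """Check whether a ground triple fits a template.

    Terms starting with "_:" are treated as placeholders. The same
    placeholder must be filled by the same ground term everywhere.
    """
    binding = {}
    for t, g in zip(template, ground):
        if t.startswith("_:"):
            # First occurrence records the filler; later ones must agree.
            if binding.setdefault(t, g) != g:
                return False
        elif t != g:
            return False
    return True


print(triple_iri("s", "p", "o"))
print(matches(("s", "p", "_:b"), ("s", "p", "o")))
```

Note that the placeholder is filled by a term (a piece of syntax), not by 
a thing in the universe of discourse.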

The existential nature of embedded bnodes only comes from the fact that 
we can use the phrase "there exists" in the meta (natural) language that 
describes the formal language of RDF*. Just like you would say "a 
formula φ is consistent iff *there exists* an interpretation that 
satisfies it".

On the other hand, the bnodes that only appear in asserted triples can 
be formally written as proper existential variables in a direct first 
order representation of RDF* logic. That is, they indicate the existence 
of a thing in the universe of discourse.


This must be pointed out because there may be people who want to use 
RDF* to express the provenance of triples with blank nodes. But with the 
current semantics it cannot be done directly (possibly, one could keep 
track of bnodes using skolemisation, though).


Some more answers below.

> 
>>
>>
>> ==Long comments:==
>>
>> One consequence is that it is not possible to directly refer to 
>> triples with blank nodes. For instance, if I say:
>>
>> << _:b <p> <o> >>  onto:source ex:somewhere .
>>
>> it does not mean that a triple with a bnode in subject position is 
>> found somewhere. It means that there is a ground triple following the 
>> template
>>
>> ?s <p> <o>
>>
>> that relates to ex:somewhere via relation onto:source.
> 
> Indeed.
> 
> Note that this is consistent with how blank nodes behave in standard 
> RDF: I can not refer to a blank node used by someone else.
> 
> If file1.ttl contains:
> 
>      ex:alice ex:likes [ ex:name "Bob" ].
> 
> I have no way to say, in another file, something like
> 
>     "That person that Alice likes (according to file1.ttl) is owl:sameAs 
> <http://example.com/users/bob>."

I don't understand the analogy. I'm not talking about files. I am talking 
about the semantics of an RDF* graph.
In any case, {ex:alice ex:likes [ ex:name "Bob" ]} does not mean that 
there exists a ground term t (IRI or literal) such that:

ex:alice ex:likes t .
t ex:name "Bob" .

Maybe Alice likes something that is not identified in any way. If the 
universe contains real numbers, then there are uncountably many things 
that cannot be identified.

> (at least not in practice -- even though it is theoretically possible in 
> the abstract syntax)
> 
>> Also, while the triple:
>>
>> << <s> <p> "042"^^xsd:integer >> onto:source  ex:somewhere .
>>
>> must be derived from data that has "042" rather than "42", the triple:
>>
>> << <s> <p> _:b42 >>  onto:source ex:somewhere .
>>
>> can be derived from anything that follows the template:
>>
>> << <s> <p> ?x >>  onto:source ex:somewhere .
> Yes. Because, as Dörthe explained, _:b42 is quantified outside the 
> embedded triple.

It is not even quantified at all, at least not in the sense of variable 
quantification in FOL or similar logics.

>>
>> If we use the "embedded-triple-as-quotation" metaphor, then if we say:
>>
>> Mike says "Joe is in the house". (:mike :says << :joe :in :house >>)
>>
>> we cannot conclude:
>>
>> Mike says "someone is in the house". (Joe replaced by an existential)
> 
> I would argue that Mike didn't use the word "someone", so you should not 
> conclude that.

Yes, that's what I'm saying.

> More formally, you should not conclude
> 
>      Mike says "there exists someone who is in the house"
> 
> because he didn't say that.
> 
>>
>> but we can conclude:
>>
>> Mike says "______ is in the house". (Joe replaced by a placeholder)
> 
> This is also an existential, only quantified outside the whole sentence:
> 
>      There exists someone who Mike says "is in the house"

See my remarks at the top. This existential is only apparent in the meta 
language describing the language of RDF*.


>> This may be a problem for a number of use cases where one wants to 
>> faithfully refer to triples with blank nodes.
> It may. But again, this inability to refer to blank nodes inside someone 
> else's graph is already commonplace in RDF (and one of the main reasons 
> why so many people hate them).

It is a very common thing to dump data from the web into quads. Triples 
with bnodes are stored with arbitrary bnode identifiers. It is not a 
problem that in the end, the identifiers are changed. Bnodes are 
interchangeable. The only thing that counts is that different bnodes are 
replaced by different bnodes. It's like referring to a specific euro (or 
dollar, or yen) in your bank account. You can talk about the history of 
a specific euro in your account. When you transfer the first €1 of your 
account to someone else's account, it does not matter that they refer to 
it as the 100th euro in their account. It's still one euro that moved 
from your account to theirs. In most situations, the precise 
identification of a specific bnode is irrelevant.
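
As a rough illustration of this interchangeability, here is a small 
Python sketch. The function name rename_bnodes and the representation of 
a graph as a set of string triples are assumptions made for this example. 
It renames bnodes and refuses any renaming that would merge two distinct 
bnodes.

```python
def rename_bnodes(graph, mapping):
    """Rename bnodes in a set of triples.

    The mapping must send different bnodes to different bnodes, and must
    not reuse a label that stays in the graph under its old name.
    """
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("different bnodes must be replaced by different bnodes")
    existing = {term for triple in graph for term in triple}
    for target in mapping.values():
        # A target already in the graph is fine only if it is itself renamed.
        if target in existing and target not in mapping:
            raise ValueError("target label would collide with an existing term")
    return {tuple(mapping.get(term, term) for term in triple) for triple in graph}


graph = {("_:a", "ex:p", "_:b"), ("_:b", "ex:q", "ex:o")}
print(rename_bnodes(graph, {"_:a": "_:x", "_:b": "_:y"}))
```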

>>
>>
>> There are also bizarre (or seemingly bizarre) things happening when we 
>> try to extend the current semantics to more expressive regimes.
>>
>>
>> Some examples that are not necessarily intuitive:
>>
>> # Example 1
>> << <a> <b> "42"^^xsd:integer >> <x> <y> .
>> <s> <p> "042"^^xsd:integer .
>>
>> entails (recognising xsd:integer):
> << <a> <b> "42"^^xsd:integer >> <x> <y> .
> <s> <p> "42"^^xsd:integer .
> 
> which in turn entails
> 
>>
>> << <a> <b> _:x >> <x> <y> .
>> <s> <p> _:x .
> and that does not strike me as shocking. ;)

Sure, you're an editor of the specification! I'd expect that it's not 
surprising to *you*.

>>
>> # Example 2
>> << _:x <b> <c> >> <p> << _:y <b> <c> >> .
>> _:x owl:sameAs _:y .
>>
>> does not entail (in RDFS-plus):
>>
>> << _:x <b> <c> >> <p> << _:x <b> <c> >> .
> 
> If you come back to my interpretation above, _:x and _:y appearing in 
> embedded triples forces them to range over ground terms. The 2nd triple 
> forces those ground terms to denote the same thing, but /not/ to be 
> identical. That's why the resulting triple is not entailed.
> 
> I agree that this is counter-intuitive, though...

It is not necessarily counter-intuitive, but it requires that users of 
RDF* are aware of what is meant by an embedded triple and by a bnode 
inside an embedded triple.

>>
>> # Example 3
>> << <clark> <can> <fly> >> owl:sameAs << <superman> <power> <flight> >> .
>> <clark> a <journalist> .
>>
>> entails (in RDFS-plus):
>>
>> <superman> a <journalist> .
>>
>> but does not entail:
>>
>> << <clark> <can> <fly> >> owl:sameAs << <superman> <can> <fly> >> .
> 
> Yes, that one is indeed bizarre! 8-/
> 
> It is a consequence of the last part of item 6 in 
> https://w3c.github.io/rdf-star/rdf-star-cg-spec.html#definitions . The 
> rationale is that "a given triple states at most one thing" (but the 
> same thing can be stated by different triples).
> 
> I agree that this is counter-intuitive, but like Peter, I would be tempted 
> to blame it on owl:sameAs, and its famous tendency to be a foot-shooter.
The problem is, as long as you stay within simple semantics, you have 
many ways of defining the semantics that lead to the same entailments. 
For instance, RDF simple semantics could be defined as follows:

<<<
Interpretations are mappings I from the set of IRIs to sets of pairs in 
(IRIs union Literals). An interpretation satisfies an RDF graph G iff 
there exists a way to replace bnodes in G by IRIs or literals (a 
grounding function f) such that for all triples (s, p, o) in G, 
(f(s),f(o)) in I(p).
 >>>

This yields the same entailments as simple semantics. However, if you 
try to extend it to D-, RDF, RDFS, OWL Full entailment, it fails 
completely. This is why it is necessary to think about entailments 
beyond simple semantics.
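
The alternative definition above can be sketched in Python as a 
brute-force check. This is only an illustration under stated assumptions: 
the function name satisfies, the encoding of an interpretation as a dict 
from predicate IRIs to sets of (subject, object) pairs, and the "_:" 
prefix for bnodes are all choices made for this example. Candidate values 
for bnodes are restricted to terms that occur in the interpretation, 
which is enough here because every bnode occurs in some triple, so its 
image must appear in some pair.

```python
from itertools import product


def is_bnode(term):
    return term.startswith("_:")


def satisfies(interp, graph):
    """Check whether some grounding f makes every triple of graph hold.

    interp: dict mapping a predicate to a set of (subject, object) pairs
            of ground terms (IRIs or literals, written as strings).
    graph:  set of (s, p, o) triples; bnodes may occur as s or o.
    """
    bnodes = sorted({t for (s, _, o) in graph for t in (s, o) if is_bnode(t)})
    # Only terms occurring in some pair can be the image of a bnode.
    candidates = sorted({t for pairs in interp.values() for pair in pairs for t in pair})
    for values in product(candidates, repeat=len(bnodes)):
        f = dict(zip(bnodes, values))
        if all((f.get(s, s), f.get(o, o)) in interp.get(p, set())
               for (s, p, o) in graph):
            return True
    return False


interp = {
    "ex:likes": {("ex:alice", "ex:bob")},
    "ex:name": {("ex:bob", '"Bob"')},
}
graph = {("ex:alice", "ex:likes", "_:b"), ("_:b", "ex:name", '"Bob"')}
print(satisfies(interp, graph))
```

In this sketch bnodes range over terms only, which is exactly why such a 
definition breaks down once datatypes or owl:sameAs come into play.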

> 
> Maybe what you really meant here is rdfx:equivalentStatement, stating 
> that triples /state/ the same thing (i.e. their subject, predicate and 
> object respectively denote the same thing) although they remain 
> /different/ triples? In other words, same extension, different intension?
> 

I meant owl:sameAs because it allows us to think about the identity of 
an embedded triple.


-- 
Antoine Zimmermann
Institut Henri Fayol
École des Mines de Saint-Étienne
158 cours Fauriel
CS 62362
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://www.emse.fr/~zimmermann/
Member of team Connected Intelligence, Laboratoire Hubert Curien
Received on Tuesday, 1 December 2020 16:29:03 UTC