Re: Blank nodes semantics - existential variables? from Antoine Zimmermann on 2020-06-29 (semantic-web@w3.org from June 2020)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Mon, 29 Jun 2020 21:57:15 +0200
To: thomas lörtsch <tl@rat.io>, Eric Prud'hommeaux <eric@w3.org>
Cc: semantic-web@w3.org
Message-ID: <6f53a5f3-7d67-16c6-c016-0d2b1db7c52f@emse.fr>
On 29/06/2020 at 20:33, thomas lörtsch wrote:
> 
> 
>> On 23. Jun 2020, at 14:10, Eric Prud'hommeaux <eric@w3.org> wrote:
>>
>> On Tue, Jun 23, 2020 at 01:11:32PM +0200, Antoine Zimmermann wrote:
>>>
>>>
>>> On 21/06/2020 at 15:35, angin scribe wrote:
>>>> Hi everyone,
>>>>
>>>> Is the standard semantics of blank nodes in RDF still the same as
>>>> existentially quantified variables?
>>>
>>> Yes.
>>>
>>>
>>>> Let "_:b1" and "_:b2" be blank nodes, In the current standard semantics
>>>> of RDF, is it still true that the graph below does not necessarily mean
>>>> that Bob has two different things?
>>>>
>>>> Bob has _:b1
>>>> Bob has _:b2
>>>
>>> Indeed. This graph would be equivalent to saying:
>>>
>>> "Bob has something. Bob has something."
>>>
>>> We can't conclude that Bob has 2 things.
> 
> I’m sorry but this is so frustratingly counterintuitive that I’d like to ask for an explanation: what constraints in the semantics of RDF make it impossible to provide a tighter definition?

The semantics of RDF is designed to make it possible to express the mere 
notion of existence as in First Order Logic or other logics, by the use 
of blank nodes. Considering that they behave exactly like the 
hundred-year-old existential variables of said logics, they could be 
deemed rather intuitive. Logicians, mathematicians, philosophers, 
engineers, and computer scientists have relied on FOL as a basis for tons 
of fundamental concepts.

What may seem unintuitive is the peculiar representation that RDF uses. 
While in a formula like:

∃x∃y (has(Bob,x) ∧ has(Bob,y))

it is clear and explicit that x and y are existentially quantified, and 
what the scope of the quantification is, in RDF, on the contrary, 
quantification is implicit because bnodes can only be used existentially 
(not universally) and their scope is always that of the RDF graph under 
consideration.

If b1 and b2 are blank nodes, then the RDF graph:

{ (<Bob>, <has>, b1), (<Bob>, <has>, b2) }

is exactly equivalent to the FOL formula:

∃x∃y (Triple(<Bob>,<has>,x) ∧ Triple(<Bob>,<has>,y))

which is itself exactly FOL-equivalent to:

∃z (Triple(<Bob>,<has>,z))

which is also FOL-equivalent to:

∃a∃b∃c (Triple(<Bob>,<has>,a) ∧ Triple(<Bob>,<has>,b) ∧ 
Triple(<Bob>,<has>,c))
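
To make these equivalences concrete, here is a minimal Python sketch of 
simple entailment between graphs, checked by brute force: a graph G 
entails a graph H if some mapping of H's blank nodes to terms of G turns 
H into a subset of G. The function names, and the convention that blank 
nodes are strings starting with "_:", are my own assumptions for 
illustration, not part of any RDF library.

```python
from itertools import product

def is_bnode(term):
    # Assumed convention: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def entails(g, h):
    """True if g simply entails h, i.e. some mapping of h's blank nodes
    to terms occurring in g turns h into a subset of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of g's terms to h's blank nodes.
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

one = {("<Bob>", "<has>", "_:z")}
two = {("<Bob>", "<has>", "_:b1"), ("<Bob>", "<has>", "_:b2")}
three = {("<Bob>", "<has>", "_:a"), ("<Bob>", "<has>", "_:b"),
         ("<Bob>", "<has>", "_:c")}

# Each graph entails the others, so all three are equivalent.
print(entails(two, one) and entails(one, two))      # True
print(entails(three, one) and entails(one, three))  # True
```

The search is exponential in the number of blank nodes, so this is only 
meant for tiny graphs like the ones above.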

An existential variable (or a blank node) does not identify anything. It 
only mentions the existence of a thing. If I say that there exists a 
person who lived more than 10 years, I'm not referring to anyone in 
particular. I'm just stating the existence of such a person.

Now, as in FOL, it is necessary to have infinitely many variables, 
because I can qualify more precisely the things of which I'm stating the 
existence. I may say:

"Bob has something big. Bob has something small."

which is not the same as saying:

"Bob has something that is big and small."


In RDF, compare:

<Bob> <has> [ <is> <Small> ] .
<Bob> <has> [ <is> <Big> ] .

and:

<Bob> <has> [ <is> <Small>, <Big> ] .

In the first case, I need two blank nodes, because, although the second 
graph entails the first, they are not equivalent. The first graph is 
still compatible with a world where no small thing is big and no big 
thing is small.
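
The asymmetry can be checked mechanically. The sketch below is a 
brute-force search for a mapping from the blank nodes of one graph to 
the terms of another, which is how simple entailment can be decided on 
small graphs. The bnode labels "_:x", "_:y", "_:w" and the function name 
are assumptions of mine, introduced only to write the graphs down.

```python
from itertools import product

def find_instance_mapping(g, h):
    """Return a mapping of h's blank nodes to terms of g that turns h
    into a subset of g, or None if there is none (g does not entail h)."""
    # Assumed convention: blank nodes are strings starting with "_:".
    bnodes = sorted({t for tr in h for t in tr if t.startswith("_:")})
    terms = sorted({t for tr in g for t in tr})
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if all(tuple(m.get(t, t) for t in tr) in g for tr in h):
            return m
    return None

# <Bob> <has> [ <is> <Small> ] .  <Bob> <has> [ <is> <Big> ] .
separate = {("<Bob>", "<has>", "_:x"), ("_:x", "<is>", "<Small>"),
            ("<Bob>", "<has>", "_:y"), ("_:y", "<is>", "<Big>")}

# <Bob> <has> [ <is> <Small>, <Big> ] .
combined = {("<Bob>", "<has>", "_:w"), ("_:w", "<is>", "<Small>"),
            ("_:w", "<is>", "<Big>")}

# The second graph entails the first: both bnodes collapse onto _:w.
print(find_instance_mapping(combined, separate))
# {'_:x': '_:w', '_:y': '_:w'}

# The first graph does not entail the second: no single node is both.
print(find_instance_mapping(separate, combined))  # None
```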

Note that in these latter examples, I do not even need a bnode 
identifier, because I merely state the existence of a thing; I do not 
identify anything. But due to the limitations of digital representations, 
we have to serialise every graph as a string of characters, so that it 
becomes necessary, in some cases, to introduce back references in the 
form of bnode identifiers. Bnode ids are not names for things. They are 
just tools that allow a linear representation of arbitrary graphs.

If you can draw your graphs on surfaces, you can reuse the same symbol 
all the time for every blank node, such as an empty ellipse of constant 
size.
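
As a small illustration that bnode ids carry no meaning of their own, 
the following Python sketch tests whether two graphs differ only by a 
one-to-one renaming of their blank node identifiers (graph isomorphism, 
checked naively over all permutations). The labels and helper names are 
hypothetical and chosen only for this example.

```python
from itertools import permutations

def bnodes_of(graph):
    # Assumed convention: blank nodes are strings starting with "_:".
    return sorted({t for tr in graph for t in tr if t.startswith("_:")})

def isomorphic(g, h):
    """True if a one-to-one renaming of bnode ids turns g into h."""
    bg, bh = bnodes_of(g), bnodes_of(h)
    if len(g) != len(h) or len(bg) != len(bh):
        return False
    for image in permutations(bh):
        rename = dict(zip(bg, image))
        if {tuple(rename.get(t, t) for t in tr) for tr in g} == h:
            return True
    return False

a = {("<Bob>", "<has>", "_:b1"), ("_:b1", "<is>", "<Small>"),
     ("<Bob>", "<has>", "_:b2"), ("_:b2", "<is>", "<Big>")}

# Same graph written with other labels, in another order.
b = {("<Bob>", "<has>", "_:abcd"), ("_:abcd", "<is>", "<Big>"),
     ("<Bob>", "<has>", "_:efgh"), ("_:efgh", "<is>", "<Small>")}

# A genuinely different graph: nothing is stated to be big.
c = {("<Bob>", "<has>", "_:p"), ("_:p", "<is>", "<Small>"),
     ("<Bob>", "<has>", "_:q"), ("_:q", "<is>", "<Small>")}

print(isomorphic(a, b))  # True
print(isomorphic(a, c))  # False
```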

--AZ

> 
> My intuition is that two different identifiers point to two different things. I would rather translate the above to:
>  "Bob has some x-thing. Bob has some y-thing."
> Sure, we don’t know for certain that x and y are distinct until some statement to that effect is made, but by default different existentials should refer to different things. Otherwise what’s the point in having different existentials? Why not just one "something" symbol instead of indefinitely many blank nodes?
> 
> Thomas
> 
> 
>> Reducing this to "Bob has _:b3" is called "graph leaning".  This is a
>> behavior of RDF semantics, upon which is built RDFS and OWL, but
>> interestingly not the SPARQL 1.1 RDFS entailment regime.
>>
>> RDF 1.1 Semantics says "Blank nodes are treated as simply indicating
>> the existence of a thing, without using an IRI to identify any
>> particular thing." It follows from there that two statements:
>>   [] a :Barn .
>>   [] :color :red .
>> might be talking about the same thing. You can't know without some
>> inverse functional properties or other application logic. AFAICT, while
>> architects may keep this in mind when designing data models, no tool
>> uses RDF semantics on its own. There have been tool chains that count on
>> graph leaning, but the ones I saw were stand-alone processing steps, not
>> features intrinsic to generic RDF processors.
>>
>> SPARQL is really a graph query language; it doesn't do any sort of
>> graph leaning. `SELECT * { <s> <p> ?o }` will give you two bindings
>> ┌──────┐    ┌────────┐
>> │ ?o   │ or │ ?o     │
>> │ _:b1 │    │ _:abcd │
>> │ _:b2 │    │ _:efgh │
>> └──────┘    └────────┘
>> (Those bindings may have any distinct labels; there's no assurance
>> that blank node labels are preserved.)
>>
>> You could argue that a carefully constructed SPARQL query could allow
>> you to deduce that the response you got could be leaned, but everyone
>> I know of who wants counting semantics treats them as distinct
>> individuals. I think this accounts for 95+% of the work done with RDF.
>>
>> RDFS only allows you to infer new stuff so it can't do any sort of
>> leaning. OWL would allow you to specifically infer that they were the
>> same individual but it can do that with IRIs as well so there doesn't
>> seem to be much of an observable difference between them other than
>> that some parts of OWL axioms require BNodes instead of IRIs to
>> eliminate the effects of coreferences.
>>
>> I guess you could characterize it this way:
>>
>> 1. Graph semantics treat BNodes as individuals.
>>    test: insert { <s> <p> _:a , _:b } and find two triples.
>>
>> 2. SPARQL (unextended) semantics likewise treat BNodes as individuals.
>>    test: SELECT * { <s> <p> ?o }
>>
>> 3. SPARQL RDF semantics still treat BNodes as individuals.
>>
>> 4. RDF Entailment implies lean-able graphs.
>>
>> 5. OWL can unify BNodes and IRIs.
>>
>>
>>>> I.e., two syntactically different blank nodes do not necessarily mean
>>>> that they are two different entities.
>>>>
>>>> I know that there has been a lot of discussion on blank nodes in the
>>>> past, cf. [1, 2, 3]. I just want to make sure that there are no recent
>>>> changes on the semantics of blank nodes that I missed. Please let me
>>>> know if I miss some recent updates in this area. Many thanks!
>>>
>>> In standardising Web technologies, the W3C is extremely cautious about
>>> backward compatibility. If something was defined in some way in a version of
>>> a W3C standard, it is likely to work the same in later versions. Sometimes,
>>> features get deprecated, but they still work the same, if used. Other times,
>>> features get added, but they do not change the way prior features work.
>>> Obviously, there are exceptions, even in RDF. For instance, the way literals
>>> and datatypes work in RDF 1.1 is different from RDF 1.0, but the practical
>>> consequences are almost insignificant.
>>>
>>>
>>> --AZ
>>>
>>>
>>>>
>>>> Cheers,
>>>> A
>>>>
>>>> [1] M. Arenas, M. Consens. A. Mallea. Revisiting Blank Nodes in RDF to
>>>> Avoid the Semantic Mismatch with SPARQL.
>>>> https://www.w3.org/2009/12/rdf-ws/papers/ws23
>>>>
>>>> [2] A. Hogan, M. Arenas, A. Mallea, A. Polleres. Everything You Always
>>>> Wanted to Know About Blank Nodes. Journal of Web Semantics. 2014.
>>>>
>>>> [3] A. Mallea, M. Arenas, A. Hogan, A. Polleres. On Blank Nodes. ISWC 2011.
>>>>
>>>
>>> -- 
>>> Antoine Zimmermann
>>> Institut Henri Fayol
>>> École des Mines de Saint-Étienne
>>> 158 cours Fauriel
>>> CS 62362
>>> 42023 Saint-Étienne Cedex 2
>>> France
>>> Tél:+33(0)4 77 42 66 03
>>> Fax:+33(0)4 77 42 66 66
>>> http://www.emse.fr/~zimmermann/
>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>
>>
> 


-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Monday, 29 June 2020 19:57:31 UTC