Re: Blank nodes semantics - existential variables? from Aidan Hogan on 2020-06-30 (semantic-web@w3.org from June 2020)

From: Aidan Hogan <aidhog@gmail.com>
Date: Tue, 30 Jun 2020 19:20:47 -0400
To: Jiří Procházka <ojirio@gmail.com>
Cc: semantic-web <semantic-web@w3.org>
Message-ID: <f4f46c1e-5176-2ddd-7b0c-b1b438463836@gmail.com>
Hi Jiří,

On 2020-06-30 18:47, Jiří Procházka wrote:
> Aidan, thank you very much. For a long time I've just accepted blank
> nodes need to be in RDF semantics for reasons I don't understand as I
> (still) prefer using generated IRIs instead for most use cases, but
> you've clearly explained the reasons and the use cases where they are
> necessary.

Great to hear! :)

> Adding to the explanation, I think I'm not wrong in saying that for
> both situations the locality and existentiality of blank nodes give the
> information a sort of immutability, safety against modification of the
> intended meaning when merging graphs (external data). The other graphs
> cannot modify the list, or narrow down the specifics of who killed Bob,
> while erasing the information that at some point (in some graph) the
> specifics were unknown and we just knew, that someone killed Bob.

Interesting point, yes! This feature can indeed be a useful one, and I 
*believe* it was one of the main reasons why in the RDF serialisation of 
OWL (2) DL, they require using blank nodes in definitions that require 
multiple triples: because, like you say, a well-formed DL definition 
using blank nodes in RDF can be verified for a local document and will 
remain well-formed when RDF documents are merged (i.e., when ontologies are 
imported). For example, the list of classes in a union definition cannot 
suddenly grow "branches" when merged with another RDF document/ontology.

(In the other direction, going from the other syntaxes of OWL to RDF, 
blank nodes are of course useful to fill in those implicit nodes that 
are needed for n-ary relations and lists.)
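
A minimal sketch of why a merge cannot make a list "grow branches", 
assuming triples are modelled as tuples of strings and blank nodes are 
the strings starting with "_:". The names merge and 
rests_of_union_list, and the classes :A, :B, :C, :D, are made up for 
illustration; this is not how any particular OWL tool does it.

```python
from itertools import count


def is_bnode(term):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return term.startswith("_:")


def merge(*graphs):
    """RDF merge: union of the graphs after renaming blank nodes apart,
    so that a label shared by two source graphs never denotes the same node."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # one renaming table per source graph (locality)
        for triple in graph:
            renamed = []
            for term in triple:
                if is_bnode(term):
                    if term not in renaming:
                        renaming[term] = f"_:g{next(fresh)}"
                    term = renaming[term]
                renamed.append(term)
            merged.add(tuple(renamed))
    return merged


# Local ontology: :C is the union of the list ( :A :B ).
local = {
    (":C", "owl:unionOf", "_:l1"),
    ("_:l1", "rdf:first", ":A"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", ":B"),
    ("_:l2", "rdf:rest", "rdf:nil"),
}

# External document that happens to reuse the label _:l1.
external = {
    ("_:l1", "rdf:rest", "_:x"),
    ("_:x", "rdf:first", ":D"),
}


def rests_of_union_list(graph):
    """rdf:rest triples leaving the head node of :C's union list."""
    heads = {o for s, p, o in graph if s == ":C" and p == "owl:unionOf"}
    return [(s, o) for s, p, o in graph if p == "rdf:rest" and s in heads]


merged = merge(local, external)   # blank nodes kept local
naive = local | external          # labels treated as global names

print(len(rests_of_union_list(merged)))  # 1: the list is intact
print(len(rests_of_union_list(naive)))   # 2: the list has grown a branch
```

With the proper merge the external _:l1 becomes a different node, so 
the head of the local list still has exactly one rdf:rest.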

In fact, here is something I didn't think about before: were it not for 
locality, the existential semantics would not be possible in the 
same sense, because in the case of:

 :Bob :murderedBy _:b1 , _:b2 .

If _:b1 and _:b2 were not local, this could not be equivalent to:

 :Bob :murderedBy _:b1 .

As we could always later encounter an external document elsewhere with:

 _:b1 :name "Alice" .
 _:b2 :name "Carol" .

And clearly there is a difference between:

 :Bob :murderedBy _:b1 , _:b2 .
 _:b1 :name "Alice" .
 _:b2 :name "Carol" .

And ...

 :Bob :murderedBy _:b1 .
 _:b1 :name "Alice" .
 _:b2 :name "Carol" .

So non-local existential semantics becomes almost ill-defined.
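
The difference can be checked mechanically. What follows is only a 
brute-force sketch, assuming triples are tuples of strings and blank 
nodes are the strings starting with "_:". It relies on the standard 
characterisation of simple entailment: G entails H if and only if some 
mapping of H's blank nodes to terms of G turns H into a subgraph of G. 
The function names entails and equivalent are made up here, and the 
plain set union stands in for the hypothetical "non-local" reading 
where labels are shared across documents.

```python
from itertools import product


def is_bnode(term):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return term.startswith("_:")


def entails(g, h):
    """True if g simply entails h, i.e. some mapping of h's blank nodes
    to terms of g makes every triple of h a triple of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of h's blank nodes to terms of g (tiny graphs only).
    for image in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, image))
        if all(tuple(mapping.get(t, t) for t in triple) in g for triple in h):
            return True
    return False


def equivalent(g, h):
    """Two graphs are equivalent if each entails the other."""
    return entails(g, h) and entails(h, g)


two = {(":Bob", ":murderedBy", "_:b1"), (":Bob", ":murderedBy", "_:b2")}
one = {(":Bob", ":murderedBy", "_:b1")}

# The "external document", unioned WITHOUT renaming blank nodes apart,
# which is what non-local blank nodes would amount to.
names = {("_:b1", ":name", '"Alice"'), ("_:b2", ":name", '"Carol"')}

print(equivalent(two, one))                  # True
print(equivalent(two | names, one | names))  # False
```

On their own the two graphs entail each other; once the names attach to 
the same labels, the one-murderer graph no longer entails the 
two-murderer graph.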

Best,
Aidan

> On 6/30/20 1:33 AM, Aidan Hogan wrote:
>> For what it is worth, we started working on the topic of blank nodes
>> some time ago similarly convinced of the fact that the RDF semantics of
>> blank nodes was unintuitive, and that a better semantics could be found.
>> A couple of papers and several years later, I was/am more or less
>> convinced that the semantics of blank nodes is as it should be in RDF.
>>
>>
>> As a summary:
>>
>> Blank nodes are typically useful in two situations:
>>
>> (1) Implicit nodes: you don't have to name blank nodes but rather blank
>> node labels can be generated automatically. This allows for shortcuts
>> like lists ":abc :has ( a b c ) ." in Turtle.
>>
>> (2) Existential variables: for example ":Bob :murderedBy _:b ."; we know
>> Bob was murdered but we don't know by whom he was murdered.
>>
>>
>> How blank nodes are defined in RDF has two main characteristics:
>>
>> (A) Locality: _:b in document D1 is not the same as _:b in document D2.
>>
>> (B) Existentiality: in the graph ":Bob :murderedBy _:b1 , _:b2 .", this
>> states that "Bob was murdered by someone", "Bob was murdered by
>> someone", which is equivalent to ":Bob :murderedBy _:b1 ."
>>
>>
>> Obviously for (2) we need to define blank nodes existentially, but I
>> will try to argue that this is the best solution even just for (1). So
>> in the context of (1) we'll look at changing (A) locality and (B)
>> existentiality, and see what happens.
>>
>>
>> (A) We could think about removing locality, and make blank nodes global,
>> but now in the context of (1), a parser has to take care of generating a
>> term that is globally unique, which will require something like a "base
>> IRI" and some non-trivial conventions regarding what to do when parsing
>> something from the same base IRI multiple times (also considering that
>> the document might have changed). This would greatly complicate simply
>> parsing an RDF document.
>>
>> (B) We could think about keeping locality and removing existentiality,
>> but let's say we take two RDF graphs G1 and G2 parsed from the same
>> Turtle document:
>>
>>      ":abc :has ( a b c ) ."
>>
>> by two different parsers that generate different blank nodes to
>> represent the list. If blank nodes are not existential, which is to say
>> that if blank nodes denote a resource in a similar manner to IRIs, then
>> we lose the formal relation between G1 and G2, even though they
>> represent the same data; more specifically, we would consider that G1
>> associates :abc with one list, and that G2 associates :abc with a
>> *potentially* different list. To me, this behaviour is undesirable.
>>
>> Under existential semantics, I can say that G1 and G2 are formally
>> equivalent. If I union/merge G1 and G2, the existential semantics tells
>> me that in the resulting graph, there exists one list, not that :abc
>> potentially has two lists, allowing me to "lean" the graph and keep just
>> one list (if I want).
>>
>> Put more simply perhaps, without an existential semantics of blank
>> nodes, every time I parse a document and generate different blank nodes,
>> I would be "creating" resources that are potentially different each time
>> to serve as the referents of the blank nodes.
>>
>> If I'm not okay with blank nodes being existentials, there is still the
>> option of simply (<- no pun intended) not leaning the data, and/or of
>> using skolemisation to generate IRIs to replace blank nodes.
>>
>> Best,
>> Aidan
>>
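
The merge-and-lean scenario in the quoted message above can be sketched 
as follows. This is a brute-force illustration only, assuming triples 
are tuples of strings and blank nodes are the strings starting with 
"_:". The function name lean is made up, and the list is shortened to 
( :a :b ) so that the exhaustive search stays fast. A graph is lean 
when no mapping of its blank nodes sends it onto a proper subgraph of 
itself, so the sketch keeps applying such mappings until none is left.

```python
from itertools import product


def is_bnode(term):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return term.startswith("_:")


def lean(graph):
    """Return a lean graph equivalent to `graph` (exhaustive search,
    suitable for tiny graphs only)."""
    graph = set(graph)
    while True:
        bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
        terms = sorted({t for triple in graph for t in triple})
        for image in product(terms, repeat=len(bnodes)):
            mapping = dict(zip(bnodes, image))
            mapped = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
            if mapped < graph:  # proper subgraph: some triples were redundant
                graph = mapped
                break
        else:
            return graph  # no such mapping exists: the graph is lean


# The same Turtle list, ":abc :has ( :a :b ) .", as two parsers might emit it.
g1 = {
    (":abc", ":has", "_:p1"),
    ("_:p1", "rdf:first", ":a"),
    ("_:p1", "rdf:rest", "_:p2"),
    ("_:p2", "rdf:first", ":b"),
    ("_:p2", "rdf:rest", "rdf:nil"),
}
g2 = {
    (":abc", ":has", "_:q1"),
    ("_:q1", "rdf:first", ":a"),
    ("_:q1", "rdf:rest", "_:q2"),
    ("_:q2", "rdf:first", ":b"),
    ("_:q2", "rdf:rest", "rdf:nil"),
}

merged = g1 | g2  # labels are already disjoint, so the union is the merge
print(len(merged))        # 10 triples: two copies of the list
print(len(lean(merged)))  # 5 triples: one list remains
```

Skipping the call to lean, or replacing blank nodes with fresh IRIs 
before merging, keeps both copies.
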
>> On 2020-06-29 18:33, thomas lörtsch wrote:
>>>
>>>
>>>> On 29. Jun 2020, at 21:57, Antoine Zimmermann
>>>> <antoine.zimmermann@emse.fr> wrote:
>>>>
>>>> Le 29/06/2020 à 20:33, thomas lörtsch a écrit :
>>>>>> On 23. Jun 2020, at 14:10, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>>>>
>>>>>> On Tue, Jun 23, 2020 at 01:11:32PM +0200, Antoine Zimmermann wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 21/06/2020 à 15:35, angin scribe a écrit :
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Is the standard semantics of blank nodes in RDF still the same as
>>>>>>>> existentially quantified variables?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> Let "_:b1" and "_:b2" be blank nodes, In the current standard
>>>>>>>> semantics
>>>>>>>> of RDF, is it still true that the graph below does not
>>>>>>>> necessarily mean
>>>>>>>> that Bob has two different things?
>>>>>>>>
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>>>>>>
>>>>>>> Indeed. This graph would be equivalent to saying:
>>>>>>>
>>>>>>> "Bob has something. Bob has something."
>>>>>>>
>>>>>>> We can't conclude that Bob has 2 things.
>>>>> I’m sorry but this is so frustratingly counterintuitive that I’d
>>>>> like to ask for an explanation: what constraints in the semantics of
>>>>> RDF make it impossible to provide a tighter definition?
>>>>
>>>> The semantics of RDF is designed to make it possible to express the
>>>> mere notion of existence as in First Order Logic or other logics, by
>>>> the use of blank nodes. Considering that they behave exactly like the
>>>> hundred-year-old existential variables of said logics, they could be
>>>> deemed rather intuitive. Logicians, mathematicians, philosophers,
>>>> engineers, computer scientists, have relied on FOL as a basis of tons
>>>> of fundamental concepts.
>>>
>>> I definitely don’t feel like smashing any tables in the temple of FOL.
>>> By any account FOL is very useful and expressive, but it requires
>>> skills and careful crafting - not everybody finds that intuitive. It’s
>>> well known that people struggle with Modus Tollens just as they
>>> struggle with understanding exponential growth rates, no matter how
>>> useful, old etc those concepts are.
>>> However my main point is maybe a bit different. I think it has to do
>>> with the situation of the author, maybe the social dimension of
>>> writing down some triples.
>>>
>>>> What may seem unintuitive is the peculiar representation that RDF
>>>> uses. While in a formula like:
>>>>
>>>> ∃x∃y (has(Bob,x) ∧ has(Bob,y))
>>>>
>>>> it is clear and explicit that x and y are existentially quantified,
>>>> and what the scope of the quantification is,
>>>
>>>> in RDF, on the contrary, quantification is implicit because bnodes
>>>> can only be used existentially (not universally) and their scope is
>>>> always that of the RDF graph under consideration.
>>>
>>>
>>>> If b1 and b2 are blank nodes, then the RDF graph:
>>>>
>>>> { (<Bob>, <has>, b1), (<Bob>, <has>, b2) } is exactly equivalent to a
>>>> FOL formula:
>>>>
>>>> ∃x∃y (Triple(<Bob>,<has>,x) ∧ Triple(<Bob>,<has>,y))
>>>
>>> I had this line in mind and it gives two existentials, x and y. To me
>>> two existentials are intuitively not the same as one existential.
>>>
>>>> which is itself exactly FOL-equivalent to:
>>>>
>>>> ∃z (Triple(<Bob>,<has>,z))
>>>
>>> I didn’t see that coming (and it’s the core of my problem).
>>>
>>>> which is also FOL-equivalent to:
>>>>
>>>> ∃a∃b∃c (Triple(<Bob>,<has>,a) ∧ Triple(<Bob>,<has>,b) ∧
>>>> Triple(<Bob>,<has>,c))
>>>
>>> Or even that. I have only a vague understanding about the reasons for
>>> such simplifications and the consequences they have. I’m sure they are
>>> very useful in logic just like it can be useful in mathematics to get
>>> rid of variables. However...
>>>
>>> ...if I envision a situation where the initial statements "Bob has
>>> something (x)." and "Bob has something (y)." materialize out of the
>>> blue, without further attributions, leaning x and y to z sure makes
>>> sense and is indeed intuitive. It’s like imagining somebody saying
>>> "Bob has something" over and over again, maybe absent-minded or even
>>> disturbed. All "Bob has something" utterances amount to the same one
>>> statement.
>>> However a more realistic (or "social", if I may) scenario is that
>>> somebody authored those two statements and added them both to the
>>> graph to express that there are two things that Bob has and that are
>>> worth mentioning. Maybe other statements are to be added later, like
>>> the small and big attributes in your example below. Maybe the
>>> statements are just a beginning, a stub for more to come. That’s the
>>> scenario I had in mind and in that scenario it is not intuitive at all
>>> that the two blank nodes get leaned. It might make me yell at my
>>> computer "What do you think I’m doing here? Do you think I typed those
>>> TWO statements just for fun? etc etc".
>>>
>>> The way that FOL uses existentials is not necessarily the only way
>>> they can be understood and this is where intuition can very well
>>> break. I don’t want to develop too many theories just yet but I
>>> suspect that one could argue that FOL presupposes some conditions that
>>> are not necessarily a given in normal human communications, or even
>>> run counter to a normal authoring process like e.g. an unfolding text with
>>> place holders and vague but distinct references. And then they become
>>> counter intuitive, no matter how logical and sound they are within the
>>> closed system of FOL. This is not to say that one way is more right
>>> than the other. Logic has certain powers just as composing does, but they
>>> have different rules - and they may clash when RDF is authored.
>>>
>>>> An existential variable (or a blank node) does not identify anything.
>>>> It only mentions the existence of a thing. If I say that there exists
>>>> a person that lived more than 10 years, I'm not referring to anyone
>>>> in particular. I'm just stating the existence of such a thing.
>>>
>>>> Now, as in FOL, it is necessary to have infinitely many variables,
>>>> because I can qualify more precisely the things of which I'm stating
>>>> the existence. I may say:
>>>>
>>>> "Bob has something big. Bob has something small."
>>>>
>>>> which is not the same as saying:
>>>>
>>>> "Bob has something that is big and small."
>>>>
>>>>
>>>> In RDF, compare:
>>>>
>>>> <Bob> <has> [ <is> <Small> ] .
>>>> <Bob> <has> [ <is> <Big> ] .
>>>>
>>>> and:
>>>>
>>>> <Bob> <has> [ <is> <Small>, <Big> ] .
>>>>
>>>> In the first case, I need two blank nodes, because, although the
>>>> second graph entails the first, they are not equivalent. According to
>>>> the first graph, it is still possible that it describes a world where
>>>> a small thing is never big and vice versa.
>>>
>>> Now imagine a scenario where the small/big attributions are not made
>>> _yet_. FOL will lean away what the author might have meant to merely
>>> hint at or explain in more detail later.
>>> I only now realized that in my initial mail I had made the implicit
>>> assumption the example
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>> is just a starting point to which later statements like
>>>      _:b1 is Small
>>>      _:b2 is Big
>>> may be added.
>>> At that point RDF wouldn’t lean those two bnodes into one anymore, right?
>>>
>>>> Note that in these latter examples, I do not even need a bnode
>>>> identifier, because I merely state the existence of a thing, I do not
>>>> identify anything. But due to the limitation of digital
>>>> representations, we have to serialise every graph as a string of
>>>> characters, such that it becomes necessary, in some cases, to
>>>> introduce back references in the form of bnode identifiers. bnode id
>>>> are not names for things. They are just tools that allow a linear
>>>> representation of arbitrary graphs.
>>>
>>> I (think I) did and do understand what you say about bnodes.
>>>
>>>> If you can draw your graphs on surfaces, you can reuse the same
>>>> symbol all the time for every blank node, such as an empty ellipse of
>>>> constant size.
>>>
>>> The different x/y coordinates on that surface disambiguate the
>>> circles. Circles in different positions stand for different
>>> somethings. The position itself carries no meaning.
>>>
>>>
>>> Thank you for the very thorough explanation! I hope my
>>> counter-argument makes some sense now.
>>> Thomas
>>>
>>>
>>>> --AZ
>>>>
>>>>> My intuition is that two different identifiers point to two
>>>>> different things. I would rather translate the above to:
>>>>>      "Bob has some x-thing. Bob has some y-thing."
>>>>> Sure, we don’t know for certain that x and y are distinct until some
>>>>> statement to that effect is made, but per default different
>>>>> existentials should refer to different things. Otherwise what’s the
>>>>> point in having different existentials? Why not just one
>>>>> "something" symbol instead of indefinitely many blank nodes?
>>>>> Thomas
>>>>>> Reducing this to "Bob has _:b3" is called "graph leaning".  This is a
>>>>>> behavior of RDF semantics, upon which is built RDFS and OWL, but
>>>>>> interestingly not the SPARQL 1.1 RDFS entailment regime.
>>>>>>
>>>>>> RDF 1.1 Semantics says "Blank nodes are treated as simply indicating
>>>>>> the existence of a thing, without using an IRI to identify any
>>>>>> particular thing." It follows from there that two statements:
>>>>>>    [] a :Barn .
>>>>>>    [] :color :red .
>>>>>> might be talking about the same thing. You can't know without some
>>>>>> inverse functional properties or other application logic. AFAICT,
>>>>>> while
>>>>>> architects may take this in mind when designing data models, no tool
>>>>>> uses RDF semantics on its own. There have been tool chains that
>>>>>> count on
>>>>>> graph leaning, but the ones I saw were stand-alone processing
>>>>>> steps, not
>>>>>> features intrinsic to generic RDF processors.
>>>>>>
>>>>>> SPARQL is really a graph query language; it doesn't do any sort of
>>>>>> graph leaning. `SELECT * { <s> <p> ?o }` will give you two bindings
>>>>>> ┌──────┐    ┌────────┐
>>>>>> │ ?o   │ or │ ?o     │
>>>>>> │ _:b1 │    │ _:abcd │
>>>>>> │ _:b2 │    │ _:efgh │
>>>>>> └──────┘    └────────┘
>>>>>> (Those bindings may have any distinct labels; there's no assurance
>>>>>> that blank node labels are preserved.)
>>>>>>
>>>>>> You could argue that a carefully constructed SPARQL query could allow
>>>>>> you to deduce that the response you got could be leaned, but everyone
>>>>>> I know of who wants counting semantics treats them as distinct
>>>>>> individuals. I think this accounts for 95+% of the work done with RDF.
>>>>>>
>>>>>> RDFS only allows you to infer new stuff so it can't do any sort of
>>>>>> leaning. OWL would allow you to specifically infer that they were the
>>>>>> same individual but it can do that with IRIs as well so there doesn't
>>>>>> seem to be much of an observable difference between them other than
>>>>>> that some parts of OWL axioms require BNodes instead of IRIs to
>>>>>> eliminate the effects of coreferences.
>>>>>>
>>>>>> I guess you could characterize it this way:
>>>>>>
>>>>>> 1. Graph semantics treat BNodes as individuals.
>>>>>>     test: insert { <s> <p> _:a , _:b } and find two triples.
>>>>>>
>>>>>> 2. SPARQL (unextended) semantics likewise treat BNodes as individuals.
>>>>>>     test: SELECT * { <s> <p> ?o }
>>>>>>
>>>>>> 3. SPARQL RDF semantics still treat BNodes as individuals.
>>>>>>
>>>>>> 4. RDF Entailment implies lean-able graphs.
>>>>>>
>>>>>> 5. OWL can unify BNodes and IRIs.
>>>>>>
>>>>>>
>>>>>>>> I.e., two syntactically different blank nodes do not necessarily
>>>>>>>> mean
>>>>>>>> that they are two different entities.
>>>>>>>>
>>>>>>>> I know that there has been a lot of discussion on blank nodes in the
>>>>>>>> past, cf. [1, 2, 3]. I just want to make sure that there are no
>>>>>>>> recent
>>>>>>>> changes on the semantics of blank nodes that I missed. Please let me
>>>>>>>> know if I miss some recent updates in this area. Many thanks!
>>>>>>>
>>>>>>> In standardising Web technologies, the W3C is extremely cautious
>>>>>>> about
>>>>>>> backward compatibility. If something was defined in some way in a
>>>>>>> version of
>>>>>>> a W3C standard, it is likely to work the same in later versions.
>>>>>>> Sometimes,
>>>>>>> features get deprecated, but they still work the same, if used.
>>>>>>> Other times,
>>>>>>> features get added, but they do not change the way prior features
>>>>>>> work.
>>>>>>> Obviously, there are exceptions, even in RDF. For instance, the
>>>>>>> way literals
>>>>>>> and datatypes work in RDF 1.1 is different from RDF 1.0, but the
>>>>>>> practical
>>>>>>> consequences are almost insignificant.
>>>>>>>
>>>>>>>
>>>>>>> --AZ
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> A
>>>>>>>>
>>>>>>>> [1] M. Arenas, M. Consens, A. Mallea. Revisiting Blank Nodes in
>>>>>>>> RDF to
>>>>>>>> Avoid the Semantic Mismatch with SPARQL.
>>>>>>>> https://www.w3.org/2009/12/rdf-ws/papers/ws23
>>>>>>>>
>>>>>>>> [2] A. Hogan, M. Arenas, A. Mallea, A. Polleres. Everything You
>>>>>>>> Always
>>>>>>>> Wanted to Know About Blank Nodes. Journal of Web Semantics. 2014.
>>>>>>>>
>>>>>>>> [3] A. Mallea, M. Arenas, A. Hogan, A. Polleres. On Blank Nodes.
>>>>>>>> ISWC 2011.
>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Antoine Zimmermann
>>>>>>> Institut Henri Fayol
>>>>>>> École des Mines de Saint-Étienne
>>>>>>> 158 cours Fauriel
>>>>>>> CS 62362
>>>>>>> 42023 Saint-Étienne Cedex 2
>>>>>>> France
>>>>>>> Tél:+33(0)4 77 42 66 03
>>>>>>> Fax:+33(0)4 77 42 66 66
>>>>>>> http://www.emse.fr/~zimmermann/
>>>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>>>>>
>>>>>>
>>>>
>>>>
>>>> -- 
>>>> Antoine Zimmermann
>>>> ISCOD / LSTI - Institut Henri Fayol
>>>> École Nationale Supérieure des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66
>>>> http://zimmer.aprilfoolsreview.com/
>>>
>>>
>>
>
Received on Tuesday, 30 June 2020 23:21:05 UTC