Re: Blank nodes semantics - existential variables? from Patrick J Hayes on 2020-06-30 (semantic-web@w3.org from June 2020)

From: Patrick J Hayes <phayes@ihmc.us>
Date: Tue, 30 Jun 2020 17:24:14 -0500
To: thomas lörtsch <tl@rat.io>
CC: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Eric Prud'hommeaux <eric@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <68D97857-00C6-49A2-8715-3031B7920445@ihmc.us>
I am getting tired of this interminable discussion, but let me correct a few misunderstandings and then I will just stop responding. 

> On Jun 30, 2020, at 7:55 AM, thomas lörtsch <tl@rat.io> wrote:
> 
> 
> 
>> On 30. Jun 2020, at 08:59, Patrick J Hayes <phayes@ihmc.us> wrote:
>> 
>> 
>> 
>>> On Jun 29, 2020, at 5:33 PM, thomas lörtsch <tl@rat.io> wrote, replying to Antoine Zimmermann:
>>> …
>>> .
>> ...
>>> ... By any account FOL is very useful and expressive, but it requires skills and careful crafting - not everybody finds that intuitive. It’s well known that people struggle with Modus Tollens
>> 
>> Not always. Consider someone saying “If he had set off on time, he would be here by now” when in fact he is not there by then. That is a modus tollens that everyone understands without any effort at all. 
>> 
>>> just as they struggle with understanding exponential growth rates, no matter how useful, old etc those concepts are. 
>>> However my main point is maybe a bit different. I think it has to do with the situation of the author, maybe the social dimension of writing down some triples.
>> 
>> Social?? Read on.
>> 
>>> 
>>>> What may seem unintuitive is the peculiar representation that RDF uses. While in a formula like:
>>>> 
>>>> ∃x∃y (has(Bob,x) ∧ has(Bob,y))
>>>> 
>>>> it is clear and explicit that x and y are existentially quantified, and what the scope of the quantification is, 
>>> 
>>>> in RDF, on the contrary, quantification is implicit because bnodes can only be used existentially (not universally) and their scope is always that of the RDF graph under consideration.
>>> 
>>> 
>>>> If b1 and b2 are blank nodes, then the RDF graph:
>>>> 
>>>> { (<Bob>, <has>, b1), (<Bob>, <has>, b2) } is exactly equivalent to a FOL formula:
>>>> 
>>>> ∃x∃y (Triple(<Bob>,<has>,x) ∧ Triple(<Bob>,<has>,y))
>>> 
>>> I had this line in mind and it gives two existentials, x and y. To me two existentials are intuitively not the same as one existential.
>>> 
>>>> which is itself exactly FOL-equivalent to:
>>>> 
>>>> ∃z (Triple(<Bob>,<has>,z))
>>> 
>>> I didn’t see that coming (and it’s the core of my problem).
>>> 
>>>> which is also FOL-equivalent to:
>>>> 
>>>> ∃a∃b∃c (Triple(<Bob>,<has>,a) ∧ Triple(<Bob>,<has>,b) ∧ Triple(<Bob>,<has>,c))
>>> 
>>> Or even that. I have only a vague understanding about the reasons for such simplifications and the consequences they have. I’m sure they are very useful in logic
>> 
>> It's not a question of utility. The point is that these /mean the same thing/. They have exactly the same truth conditions. 
> 
> I’m not a logician but I know the difference between a definition and a self-evident truth. This here is a definition

No, it is not. Let me explain. The way that truth is described in a logical semantics is in terms of interpretations: possible ways the world could be, if you like, though admittedly these are pretty thin ‘worlds’. To simplify, an interpretation for RDF is a set of things that URIs name, and a set of binary relationships over those things that URIs can also name (for when they are in the middle position of a triple). Each interpretation constitutes one way to set up a ‘possible world’ that the RDF might describe. Then each triple <A B C> is true in that world just when the relation that B maps to holds between the things that A and C map to. The way that bnodes are handled is that each is treated just like a name, but we allow its mapping to vary over the universe (one thing at a time), and the triple is true when at least one of these mappings makes it come out true by the previous rule. 

Now, this takes a while to describe, but I will claim that it is self-evident, in that it captures exactly what we mean by being true. The triple <A B C> is true (in a particular interpretation) just when B is indeed a relationship that holds between A and C in that order; and the bnode case just means that there is something in the universe such that if you treat the node as naming that thing, then the triple is true. Bnodes are like ‘anonymous names’. That is exactly how words like ‘something’ and ‘somebody’ work in English. 
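
The truth rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the normative RDF semantics: the function name graph_is_true, the string encoding of terms, and the tiny example world are all invented here, and the bnode assignment is chosen once for the whole graph (the scope the thread attributes to bnodes).

```python
from itertools import product

def is_bnode(term):
    # Assumption: blank nodes are written "_:label", everything else is a URI.
    return term.startswith("_:")

def graph_is_true(graph, universe, names, relations):
    """Return True if `graph` is true in the given interpretation.

    graph     -- set of (subject, predicate, object) string triples
    universe  -- the set of things in this 'possible world'
    names     -- dict: URI -> the thing it names
    relations -- dict: URI -> set of (thing, thing) pairs it names
    """
    bnodes = sorted({t for (s, p, o) in graph for t in (s, o) if is_bnode(t)})
    # Try every way of letting the bnodes refer to things in the universe.
    for values in product(universe, repeat=len(bnodes)):
        assignment = dict(zip(bnodes, values))

        def denote(term):
            return assignment[term] if is_bnode(term) else names[term]

        if all((denote(s), denote(o)) in relations[p] for (s, p, o) in graph):
            return True  # at least one assignment makes every triple true
    return False

# A hypothetical world in which Bob has exactly one thing.
universe = {"bob", "hat"}
names = {"Bob": "bob"}
relations = {"has": {("bob", "hat")}}

one = {("Bob", "has", "_:x")}
two = {("Bob", "has", "_:x"), ("Bob", "has", "_:y")}
```

In this world both `one` and `two` come out true, because _:x and _:y are free to refer to the same hat; in a world where the "has" relation is empty, both come out false.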

So, given this as a basic semantics, the equivalence of 

A B _:x .

with 

A B _:x
A B _:y

follows just by following the rules, without any further definitions being needed. If the first triple is true, then there must be something which makes it true when you take the bnode _:x to refer to that thing. And making the bnode _:y refer to that very same thing will also make the second triple true. So if one of them is true, they must both be. 
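
The same equivalence can be checked mechanically. The sketch below relies on the interpolation lemma from the RDF Semantics documents (G simply entails E exactly when some instance of E is a subgraph of G) and searches for such an instance by brute force. The function name simply_entails and the tuple encoding are assumptions made for this illustration only.

```python
from itertools import product

def is_bnode(term):
    # Assumption: blank nodes are written "_:label".
    return term.startswith("_:")

def simply_entails(g, e):
    """True if graph g simply entails graph e.

    Looks for a mapping from the bnodes of e to terms of g that turns
    every triple of e into a triple of g.
    """
    terms = sorted({t for (s, p, o) in g for t in (s, o)})
    bnodes = sorted({t for (s, p, o) in e for t in (s, o) if is_bnode(t)})
    for values in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, values))
        if all((m.get(s, s), p, m.get(o, o)) in g for (s, p, o) in e):
            return True
    return False

one = {("A", "B", "_:x")}
two = {("A", "B", "_:x"), ("A", "B", "_:y")}

# Each graph entails the other, so they have the same truth conditions.
equivalent = simply_entails(one, two) and simply_entails(two, one)
```

The search is exponential in the number of bnodes, which is fine for a two-triple example but is one reason applications rarely do this routinely.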

It seems to me that one can see this without even mentioning semantics, though. If you replace a bnode with the word ‘something’ and read it intuitively, then the two-triples case is just repetition. You are apparently reading the second one as meaning ‘something ELSE’, but that interpretation is not warranted by any semantic rule, or even a Gricean conversational rule: because if you had meant ‘something else’, why didn't you say that instead of just repeating yourself?

> as part of a logic formalism and so hopefully motivated by utility.

I am quite unconvinced that your intuition here corresponds to any broader utility, I am afraid. You seem to be jumbling up a host of different issues, including default reasoning, conversational implicature, textual pragmatics and truth. But only truth matters for a semantics for recording factual data, and if you stick to that then everything works a lot better. See for example what Antoine points out about how OWL/RDF depends on the truth-functional interpretation of bnodes. 

> 
>>> just like it can be useful in mathematics to get rid of variables. However...
>>> 
>>> ...if I envision a situation where the initial statements "Bob has something (x)." and "Bob has something (y)." materialize out of the blue, without further attributions, leaning x and y to z sure makes sense and is indeed intuitive. It’s like imagining somebody saying "Bob has something" over and over again, maybe absent-minded or even disturbed. All "Bob has something" utterings amount to the same one statement. 
>> 
>> But the RDF setting is not like a 2-person conversation. RDF is supposed to be used for putting information all over the Web, from where it can be browsed and combined to allow engines to draw conclusions from data possibly, in fact normally, pulled from many independent sources. So suppose DBpedia says that Bob has something and YAGO also says that Bob has something, agreeing with DBpedia. Would you conclude that Bob must have two things? After all, those two sources are each saying /exactly the same thing/, so why would you think that taken together, they should imply something more? 
> 
> I’m not talking about integrating data from different sources. The example is about two statements in one graph.

So? Of course I can combine information from different sources into one graph. That is pretty much what the semantic web is supposed to be about. 
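
To make the DBpedia/YAGO case concrete, here is a hedged Python sketch of combining two sources into one graph and then leaning the result. The helper names merge and lean, the bnode renaming scheme, and the one-triple "sources" are all hypothetical; real merges over real datasets are done by RDF libraries, not like this.

```python
from itertools import product

def is_bnode(term):
    # Assumption: blank nodes are written "_:label".
    return term.startswith("_:")

def merge(*graphs):
    """Union of the graphs, with each source's bnodes renamed apart."""
    merged = set()
    for i, g in enumerate(graphs):
        def rename(t, i=i):
            return f"_:g{i}_{t[2:]}" if is_bnode(t) else t
        merged |= {(rename(s), p, rename(o)) for (s, p, o) in g}
    return merged

def lean(graph):
    """Shrink graph to an equivalent lean graph by brute-force search.

    A graph is non-lean when some bnode mapping sends it onto a proper
    subgraph of itself; that subgraph says exactly the same thing.
    """
    graph = set(graph)
    while True:
        terms = sorted({t for (s, p, o) in graph for t in (s, o)})
        bnodes = [t for t in terms if is_bnode(t)]
        for values in product(terms, repeat=len(bnodes)):
            m = dict(zip(bnodes, values))
            image = {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in graph}
            if image < graph:      # proper subgraph: redundancy found
                graph = image
                break
        else:
            return graph           # no shrinking mapping exists: lean

dbpedia = {("Bob", "has", "_:x")}  # hypothetical source data
yago = {("Bob", "has", "_:x")}     # hypothetical source data

merged = merge(dbpedia, yago)      # two triples, two distinct bnodes
leaned = lean(merged)              # one triple: "Bob has something"
```

The merged graph keeps the two bnodes distinct as nodes, yet leaning it loses nothing, since both triples say only that Bob has something.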

> 
>>> However a more realistic (or "social", if I may) scenario
>> 
>> Yes, exactly. RDF is not intended for having social conversations. Gricean rules of conversational norms do not apply.
>> 
>>> is that somebody authored those two statements and added them both to the graph to express that there are two things that Bob has and that are worth mentioning.
>> 
>> But what if two people authored them independently? See above.
>> 
>>> Maybe other statements are to be added later, like the small and big attributes in your example below. Maybe the statements are just a beginning, a stub for more to come. That’s the scenario I had in mind and in that scenario it is not intuitive at all that the two blank nodes get leaned. It might make me yell at my computer "What do you think I’m doing here? Do you think I typed those TWO statements just for fun? etc etc". 
>>> 
>>> The way that FOL uses existentials is not necessarily the only way they can be understood
>> 
>> If you can come up with a different semantics, I would be interested to see it. 
>> 
>>> and this is where intuition can very well break. I don’t want to develop too many theories just yet but I suspect that one could argue that FOL presupposes some conditions that are not necessarily a given in normal human communications, or even run counter to a normal authoring process like e.g. an unfolding text with place holders and vague but distinct references.
>> 
>> These social or narrative uses of natural language certainly involve all kinds of meaning conventions that go beyond FOL, in fact beyond any logic yet formalized by anyone. Check out Gricean implicature for a start:
>> https://plato.stanford.edu/entries/grice/
>> https://plato.stanford.edu/entries/implicature/
>> 
>> But those are the complicated cases with elaborate presuppositions, not the comparative simplicity of FOL. 
>> 
>>> And then they become counter intuitive, no matter how logical and sound they are within the closed system of FOL. This is not to say that one way is more right than the other. Logic has certain powers just as composing, but they have different rules - and they may clash when RDF is authored.
>>> 
>>>> An existential variable (or a blank node) does not identify anything. It only mentions the existence of a thing. If I say that there exists a person that lived more than 10 years, I'm not referring to anyone in particular. I'm just stating the existence of such a thing.
>>> 
>>>> Now, as in FOL, it is necessary to have infinitely many variables, because I can qualify more precisely the things of which I'm stating the existence. I may say:
>>>> 
>>>> "Bob has something big. Bob has something small."
>>>> 
>>>> which is not the same as saying:
>>>> 
>>>> "Bob has something that is big and small."
>>>> 
>>>> 
>>>> In RDF, compare:
>>>> 
>>>> <Bob> <has> [ <is> <Small> ] .
>>>> <Bob> <has> [ <is> <Big> ] .
>>>> 
>>>> and:
>>>> 
>>>> <Bob> <has> [ <is> <Small>, <Big> ] .
>>>> 
>>>> In the first case, I need two blank nodes, because, although the second graph entails the first, they are not equivalent. According to the first graph, it is still possible that it describes a world where a small thing is never big and vice versa.
>>> 
>>> Now imagine a scenario where the small/big attributions are not made _yet_. FOL will lean away what the author might have meant to merely hint at or explain in more detail later. 
>>> I only now realized that in my initial mail I had made the implicit assumption the example
>>>>>>>> Bob has _:b1
>>>>>>>> Bob has _:b2
>>> is just a starting point to which later statements like
>>>  _:b1 is Small
>>>  _:b2 is Big
>>> may be added. 
>> 
>> Then why in God’s name are you making all this fuss? If you have only half-finished writing something then of course it might not mean what you have in mind, yet.
> 
> Half-finished writing is not something so uncommon and the semantic web is all about incomplete data.

Whoa. Yes of course in the sense that the web data is open and can be extended. But not in the sense that it is only half-composed. If I intend to say that Bill has a plan and a canal, using bnodes in each case, but stop and publish my RDF before I get to the bit about plans and canals, then I am just being careless. And I have no right to complain if people simplify my incomplete published repetition.

> You may be well aware how all this fits together, but I’m not. I rather think there’s a gap between the unruly ways in which statements come to life, which is a social activity no matter how many people are involved, and the rules that (FO) logic enforces.

As someone who has spent a fair part of his career in writing and thinking about ontologies written, often by large teams, in logical formalisms, I find it slightly eyebrow-raising that you think that ‘the rules that logic enforces’ are a problem. Without such ‘enforced rules’ (I would prefer to say ‘structure’) the entire project would be either impossible or pointless: we could all just write to one another in conversational English and ignore the machines. And as I explained above, this is not some arbitrary ‘rule’: it follows from the basic interpretation of an existential variable. Can you imagine how that interpretation should be altered so as to make your intuition work? I mean, if it really is intuitive, that should be fairly straightforward. 

> To me that gap is not easily recognizable but rather a can of worms, an uneasy feeling that unpleasant surprises lurk behind every next corner. That’s MY problem, but given the frustration that RDF semantics induce in many people maybe it is also A problem. Eric’s list of applications that treat blank nodes the way I intuitively think they should

Did he say that? Sure, many apps do not routinely ‘leanify’ graphs - they are under no obligation to do so, there are few cases and the effort is nontrivial - that is not to say that they are thereby agreeing with your interpretation. They are just being cost-effective. 

> - and in contrast to what the RDF semantics suggest - does indeed hint at a more general situation than just me being a fussy and very mediocre logician (if at all).

OK, I will take this opportunity to emit a general howl of indignation. You can put on earmuffs if you like. 

<HOWL>

Why do people feel that it is a reasonable criticism of RDF to say that it feels too hard or too unintuitive to them because they know nothing about logic? RDF IS a logic, for God’s sake. If you know nothing about the topic, then open a book and read up a bit about it before complaining about the way it doesn’t match your naive untutored intuitions. Is it a reasonable critique of Java that you can't understand it because you never learned to program? Or because you think that "public static void" means “intuitively" that nothing happens when other people are watching? It's not as though RDF is a complicated or obscure logic: it is about as simple a relational logic as one could hope to invent. It has no quantifiers, no negation, no modalities, its BNF syntax would fit on a postage stamp. And if you think that you shouldn’t need to understand the basics of logic and the idea of truth-functional semantics in order to use RDF, well think again. You can’t read structured data in ANY format as though it was some version of English and expect to understand it well enough to deal with it professionally. 

</HOWL>

> 
>> If you stop typing halfway through a URI, the RDF will not make much sense either. 
>> 
>>> At that point RDF wouldn’t lean those two bnodes into one anymore, right?
>> 
>> Right. 
> 
> Thanks. I recently learned that truth in RDF is encoded in the single statement, not in combinations of statements.

Yes. Slightly more exactly, the only form of combination that RDF recognises is simple conjunction.
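
Written out in the FOL notation used earlier in the thread, that conjunction reading looks as follows. This is a sketch: the labels G_0, G_1, G_2 and the predicate names has and is are introduced here only to mirror the example's URIs. Each graph is the existential closure of the conjunction of its triples.

```latex
\begin{align*}
G_0 &:\ \exists x\,\exists y\,\bigl(\mathit{has}(\mathit{Bob},x) \land \mathit{has}(\mathit{Bob},y)\bigr)
      \;\equiv\; \exists z\,\mathit{has}(\mathit{Bob},z) \\
G_1 &:\ \exists x\,\exists y\,\bigl(\mathit{has}(\mathit{Bob},x) \land \mathit{is}(x,\mathit{Small})
      \land \mathit{has}(\mathit{Bob},y) \land \mathit{is}(y,\mathit{Big})\bigr) \\
G_2 &:\ \exists z\,\bigl(\mathit{has}(\mathit{Bob},z) \land \mathit{is}(z,\mathit{Small})
      \land \mathit{is}(z,\mathit{Big})\bigr) \\
    &\ G_2 \models G_1, \qquad G_1 \not\models G_2
\end{align*}
```

G_0 collapses to a single existential because nothing distinguishes x from y. Once the Small and Big conjuncts are added, G_1 no longer collapses to G_2, since a world where no small thing is big satisfies G_1 but not G_2.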

> Because of that I asked this question. Well, I start to see that leaning one existential away is something else than saying that the statement is false but I still have to fully digest that.  
> 
>>> 
>>>> Note that in these latter examples, I do not even need a bnode identifier, because I merely state the existence of a thing, I do not identify anything. But due to the limitation of digital representations, we have to serialise every graph as a string of characters, such that it becomes necessary, in some cases, to introduce back references in the form of bnode identifiers. bnode ids are not names for things. They are just tools that allow a linear representation of arbitrary graphs.
>>> 
>>> I (think I) did and do understand what you say about bnodes.
>>> 
>>>> If you can draw your graphs on surfaces, you can reuse the same symbol all the time for every blank node, such as an empty ellipse of constant size.
>>> 
>>> The different x/y coordinates on that surface disambiguate the circles. Circles in different positions stand for different somethings.
>> 
>> No. They are (not ’stand for’, but actually ARE) different blank nodes, but those do not necessarily refer to different things.
> 
> If this is your definition of "intuitive" then we probably just have to agree to disagree on our intuitions about intuitiveness. Again, I’m not saying it makes no sense and I’m not offering a new semantics for FOL (I find the perspective that Ross Horne provides in another reply very interesting but I’m really not in the position to assess its ramifications). But it contradicts my expectations and apparently I’m not alone. So if it can’t be changed it might at least need a better explanation or/and a more explicit handling in applications.

In my defense, when I wrote the first RDF semantics document (2004) I tried very hard to provide “intuitive” accounts of everything alongside the more exact mathematical-looking versions. That caused far more heated misunderstandings and trouble than it did enlightenment. I concluded that it is a mistake to try to combine the rigor of a specification with any kind of tutorial, so the 2014 version is strictly by the book, written for people who can read that stuff. 

It is very hard to write tutorials which try to explain everything, if people are not willing to read the basics and learn them first. I have seen more exotic ways to misunderstand logical notation in these email forums than I ever did in years of teaching logic 101 to philosophy students. I will be honest: your particular intuition, that bnodes with different bnode labels should be interpreted as not co-denoting by default, is so very extraordinary that I have never come across it before, and would not have thought to provide a warning against that particular misinterpretation.

> 
>> That is what we are arguing about here. 
> 
> Actually I’m not arguing, I’m asking because I’m not understanding. I’m giving examples of how I understand things. Those examples are not meant to be better alternatives to FOL or anything else. I do however reject the notion that FOL is so pretty and perfect and in all other cases just right and therefore intuitive by default. 

Not by default. Defaults are a terrible way to reason. And while FOL is not perfect (of course) it is notable how many proposed alternatives to it have turned out to just be notational variations on FOL. Codd’s semantics of databases is textbook stuff, for example. And there are good deep theoretical reasons why this should be true, as well, but they are a bit technical. There is a nice theorem called Lindström’s theorem which says, basically, that any relational logic which needs only finitely wide proofs (compactness) and needs only finitely deep proofs (Löwenheim-Skolem) is equivalent to FOL.

OK, I’m done on this thread. 

Pat

> 
> Thomas
> 
> 
>> Pat
Received on Tuesday, 30 June 2020 22:24:35 UTC