Re: multisets everywhere from Patrick J. Hayes on 2021-12-28 (public-rdf-star@w3.org from December 2021)

From: Patrick J. Hayes <phayes@ihmc.org>
Date: Tue, 28 Dec 2021 23:45:41 +0000
To: Fabio Vitali <fabio.vitali@unibo.it>
CC: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <C7F889C6-7D42-44A4-9B9E-7B22C74A3EBE@ihmc.org>


> On Dec 26, 2021, at 9:23 AM, Fabio Vitali <fabio.vitali@unibo.it> wrote:
> 
> Hello. 

Hi.

> 
>>> My understanding has been that the original conception of RDF 
>>> was that it would only be used to record universal and eternal 
>>> facts; in other words, everything encoded in RDF was universal
>>> and eternal truth.
>> 
>> I think I know what you mean, but I (and other logicians and philosophers of language) would prefer to say "simply true", or "true simpliciter" if you want to sound fancy. Which means, just true (not, say, necessarily true or mathematically true or scientifically true, etc.) but true without some qualification or modification (possibly true, true now but maybe not tomorrow, somewhat true, conditionally true, etc.). The kind of 'true' that you swear to tell when you take an oath in a court of law.
> 
> Differently from logicians, I tend to believe that the simply true facts are not frequent outside of axiomatic domains (and I can just hear Gödel muttering that even in axiomatic universes the issue is far from solved...) 

Gödel used the classical notion of truth, but showed it cannot be fully captured by any consistent formal system for arithmetic. But he meant 'true' in the simply-true sense.
> 
> Basically ANY statement you can express is temporally and geographically bound, up to and including "The Sun rises in the East and sets in the West", which only applies to Earth and to the last several billion years. Simply true facts are the exceptions rather than the norm

Wrong. Most data that has been recorded in just about any permanent medium consists of simple facts. Your bank records, for example. Of course the amount in your bank account changes with time, but it is also recorded with times attached, so that the entire fact has three (probably more) components, and THAT fact - that the balance in Fabio Vitali's account (1) at 3:40am GMT on the 25th of November 2020 (2) was such-and-such (3) - is simply true. That was my point: when one adds the required "contextual" information into the fact so as to make it non-indexical, it becomes a simple fact. 

> , and any representation that excludes temporal and geographical constraints from statements is justified by simplification, laziness or irrelevance, not by logic.

Perhaps we agree? That was exactly my point: assertions which are time- or place-dependent should include the time or place information in their statement, so that they are then no longer so dependent (and are simply true). I think we agree on this, and disagree only on how best to do it.

> 
>>> , but it was hard enough for many to grasp 
>>> the simplicity of describing everything with SPO triples that 
>>> it took years for many to realize that few descriptions were 
>>> eternally accurate.)
>> 
>> In what sense? Yes, if you mean that we can discover that we were wrong and that the historical record may need to be corrected, updated (though this is a pretty rare occurrence for most data). No, if you mean that all assertions are somehow time-dependent in the way that tensed language is. 
> 
> I disagree. In fact, you are assuming two things in these sentences, and I disagree with both. 
> 
> 2) "No, if you mean that all assertions are somehow time-dependent"
> 
> This whole discussion is about accepting that some, or many (or, I believe, most) assertions "are somehow time-dependent". Our usual :a :marriedTo :b is not an absolute fact, and can only be used wrongly if used in a time-independent fashion: for the majority of the past history of the universe, i.e., before their wedding, :a was NOT married to :b, and for the majority of the remaining history of the universe, i.e., from their divorce onwards, they will also NOT be married. Ignoring it generates problems (how do you differentiate bigamy from remarriage [*]?).

Of course, when you are talking about time-dependent relationships or properties, you need to include temporal information, as the proposition is incomplete without it. So 'A is married to B' is not really a proposition: it has (as I tried to explain) a hidden indexical. If you make it into a complete proposition by including the missing time reference, then it becomes something worth trying to assert and record. 

BTW, it is a proposition if you understand it to mean "A was married to B at some time" or, indeed, "A is married to B <now>", provided we replace the indexical '<now>' with an actual time-and-date in some recognized coordinate system for referring to times, but neither of these is expressed by a simple triple, by itself.

> This applies to basically everything. 

Well no, not to EVERYTHING. There are lots of facts (the atomic weight of lead, arithmetic, much historical data, most contents of current databases of population and personal data, engineering data about materials and fabricated parts, astronomical data such as planetary orbits, commonsense facts such as that water is wet, etc.) that do not vary with time in this way. But to many facts about the human everyday world, yes.

> 
> 1) "Yes, if you mean that we can discover that we were wrong and that the historical record may need to be corrected"
> 
> You seem to imply that all statements are either right or wrong, and that scholars express everything as true facts until new evidence generates a new truth and the old one is proven wrong. I am afraid that this is not how real scholars work: in most of the fields I deal with, certainties are rare, and it is customary for scholars to express at the same time multiple competing statements as possible, and sometimes advance a personal preference for one of them.

This is muddled. Of course there is extended scholarly debate and, at the edges of our knowledge in any field, disagreements about what is true, expressions of doubt or confidence, debates about evidence and so forth. But all this debate is, ultimately, about what is in fact the case: and that means, which propositions are in fact true. The debates are not (with some exceptions, perhaps, in Continental political theory, theology and quantum physics) about the nature of truth itself, but about what we know to be true. Not only do they not undermine the notion of truth, they depend on it and utilize it. 

> There is a clear and evident need to be able to express not just the facts that are (momentarily) considered true, but also the competing ones that we still recognise (momentarily) as false, yet possible and/or reported in literature. Preventing the representation of (momentarily) rejected statements distorts and limits the correct expression of what we know about any field of human knowledge. 

Sure, but we are here talking about simple factual data, the kind of stuff that appears in almanacs. Not scholarly debate, which would hardly fit into RDF expressivity in any case. 

Are you proposing that scholars use RDF to express all this? That would, I suggest, be a mistake. RDF was not designed to support such nuanced debates. (I might add, not a single commentator suggested that RDF should be so designed during the total of six years of activity by the RDF WGs, when public comments were invited.)

I have helped develop KR systems for use by the intelligence community, who also need to handle uncertain and possibly conflicting reports, and deal with highly incomplete data, and also with fake data used with hostile intent, to mislead. And (unlike most scholars) they really do use sophisticated data handling software to help them. But they do also have a robust notion of truth, because trying to determine what actually happened is often the entire point of the exercise. (And they use far more expressive notations, along the lines of full first-order logic with attached metadata, so n-ary relationships and bnodes do not give them nightmares.)

> 
> [*] ok, ok. Hardcore catholics don't actually differentiate them...
> 
>> So, the second idea is, you introduce a single 'thing' (variously called an event, a situation, a circumstance, a happening, a history, a process, a proposition, a fact, depending on who you think invented it) that all these 'arguments' are related to by binary relations (sometimes called facets or aspects or cases, if you come to this from linguistics). For example, this idea was developed for use by military intelligence applications, where the core 'aspects' are the five W's: Who, What, When, Where and Why. This approach gets you some nice side benefits, apart from its flexibility, because these 'things' can have other properties. In the military intelligence case, for example, they can be classified into various categories of interest or relevance to some strategic goal. If we are interested in legal issues, they can be related to whatever regulations they violate, and so on.
> 
> An n-ary relationship is just ONE way to represent more complex and time- and location-dependent facts. I think that n-ary relationships have their own share of issues and limits: 

I agree. I believe, if you read my previous message carefully, that I listed three ways and that was one of them. The above paragraph describes the second way. 

> 
> 1) You have to invent a pseudo-entity which becomes the hub of many binary relationships, proliferating the number of entities and classes we create exactly because the model is too simple. Just for persons, you must invent birthEvents, deathEvents, weddingEvents, divorceEvents, jobState, schoolingState, etc. 

I do not regard events as PSEUDO entities, myself. In fact, the world is pretty much comprised of events, in a suitably broad sense. But in any case, you have to have all these things as binary relations each with preferred collections of attached metadata expressed by other binary relations. It's a mess for both of us. 

> 
> 2) You lose the direct connection between the original subject and the original object. You switch from
> 
> :RichardBurton :marriedTo :LizTaylor 
> 
> to 
> 
> :m1 a :Marriage;
>    :groom :RichardBurton;
>    :bride :LizTaylor;
>    :start "1963"^^xsd:gYear;
>    :end "1974"^^xsd:gYear.
> 
> and suddenly there is no longer a direct connection between :RichardBurton and :LizTaylor:
> 
> SELECT ?p WHERE {
>  :RichardBurton ?p :LizTaylor . 
> }
> 
> returns empty. Not nice. 

Well, true. But if you are expecting that second kind of data format, then you would write a different query

SELECT ?e WHERE {
:RichardBurton ?p ?e .
?e ?q :LizTaylor .
}

And then you might discover even more things about the ways their lives intersected, by the way. 
Or, if you were interested in who Maria Callas had married and when, for example, you could query

SELECT ?who ?time WHERE {
:MariaCallas ?p ?e .
?e a :Marriage .
?e ?q ?who .
?e :when ?time .
}
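
A minimal Python sketch of the two-hop pattern behind both queries, run over a tiny in-memory set of triples. Everything in the data is made up for illustration: it assumes that people point to event nodes (the direction the queries above presuppose), and the names `:m1`, `:f1`, `:spouse` and `:actor` are hypothetical.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All data is illustrative; ":m1", ":f1", ":spouse", ":actor" are hypothetical.
TRIPLES = {
    (":RichardBurton", ":spouse", ":m1"),
    (":m1", ":spouse", ":LizTaylor"),
    (":m1", "a", ":Marriage"),
    (":RichardBurton", ":actor", ":f1"),
    (":f1", ":actor", ":LizTaylor"),
    (":f1", "a", ":Film"),
}


def connecting_nodes(triples, a, b):
    """Return every ?e such that  a ?p ?e .  ?e ?q b .  holds."""
    reachable_from_a = {o for (s, _p, o) in triples if s == a}
    pointing_to_b = {s for (s, _q, o) in triples if o == b}
    return reachable_from_a & pointing_to_b


def of_type(triples, nodes, cls):
    """Keep only the nodes typed as cls, like adding  ?e a cls .  to a query."""
    return {n for n in nodes if (n, "a", cls) in triples}


links = connecting_nodes(TRIPLES, ":RichardBurton", ":LizTaylor")
marriages = of_type(TRIPLES, links, ":Marriage")
```

The untyped pattern returns every intermediate node linking the two people (here a marriage and a film); adding the type restriction narrows it to marriages only.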


> 
> 3) You have to decide whether to represent the temporally-bound states (a marriageState, an employmentState, a political term, a life) or the boundary events around them (weddingEvent, divorceEvent, hiringEvent, firingEvent, birthEvent, the deathEvent, etc.) and there are no guidelines and in fact we would often use both with no homogeneity or justification (why do we often use terms or reigns, which are States, for politicians and kings, but use births and deaths, which are Events, for human lives, and should we use weddings/divorces or marriages?)

I agree that this (indeed any) more complicated way of expressing truth needs to have a certain accepted discipline about it, in order to be usefully deployed in a Web (indeed, any large) setting. And that this is a major issue for the semantic web (or whatever the currently accepted term is). But this is going to be an issue however we do it. It seems to me that accepting a basic (forgive the word) ontology of time is the best way to do this. It does not need to be complicated. (There are time intervals and time points. Each interval has, and is uniquely determined by, the points at its ends, called respectively the start or beginning, and the finish or end. The time of an event is either an interval or a point. We could invent a datatype for these things. It may already have been done. Material things can be treated as intervals (their 'lifetime') so their start is variously called their creation, manufacture or birth. And so on, fairly obvious stuff.)
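
A minimal Python sketch of such a basic time ontology, under simplifying assumptions that are not part of the text above: time points are plain integer years, and a point is treated as a zero-length interval. The names `Interval`, `contains` and `instant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time interval, uniquely determined by the points at its two ends."""
    start: int  # a time point; plain years here, for simplicity
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("an interval cannot end before it starts")

    def contains(self, point: int) -> bool:
        """True if the time point lies within the interval (ends included)."""
        return self.start <= point <= self.end


def instant(point: int) -> Interval:
    """A time point, modelled here as a zero-length interval."""
    return Interval(point, point)


# The reign mentioned in the Wikidata example in this thread.
reign = Interval(1771, 1792)
```

Because the class is a frozen dataclass, two intervals with the same end points compare equal, which mirrors the idea that an interval is determined by its ends.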

> 
> 4) Sometimes we create classes, sometimes we create n-ary relationships, and it is not clear why. For instance, Wikidata uses classes for geographical constraints, and n-ary relationships for temporal constraints: for instance, in order to say that Gustav III was the king of Sweden between 1771 and 1792, https://www.wikidata.org/wiki/Q52930 defines a class "Monarch of Sweden", which is a subclass of "Monarch" limited to the country "Sweden", and creates a Statement (an n-ary relationship) whereby the position held by Gustav III as Monarch of Sweden is limited between the dates 1771 and 1792. Why the invention of the subclass "Monarch of Sweden" when we already had the temporally limited Statement?

I will entirely agree that Wikidata is a bit of a mess about this issue, and indeed more generally. But as to your last question, I could ask it the other way around: why introduce a temporal modification to the statement when there is an entity already present which would naturally have temporal information attached? No modification needed to RDF, just some more linked data.

BTW, I think I know the answer to both questions: because the data was added by two different people (or by software written by two…) who had different ideas of how to represent temporal qualifications to data. Sigh.

> 
>> But there is a third idea, which is to treat some smallish subset of the possible arguments as actual arguments, forming a kind of core fact, and the others as meta-assertions ABOUT this core fact. The simplest possible version of this is the core being a single RDF triple, and anything else is about this triple. There are many issues and problems with this approach, though.
> 
> 
>> First, it is basically wrong: these extra arguments or modifications are not 'about' the triple (unlike, say, provenance information). (They might be 'about' the fact asserted by the triple, but that 'fact' is not the triple itself, which is a syntactic entity. In fact, the thing it is about is probably one of those event-things.)
> 
> You are assuming that the n-ary relationship exists in some abstract sense, while the triple does not

No, I am not assuming that. Of course the triple exists. 

> : that the temporal boundaries can only be a property of the Marriage, and never a constraint on the truth of the statement.

I am not assuming this, I am suggesting it. And my reason is that most frameworks (logics, databases, RDF graphs, etc.) are built on the assumption that a fully expressed assertion is true, a simple fact, and that any metadata is information ABOUT that record, such as who recorded it, when, where it came from, what evidence there is for it, and so on. But when asserted, it is thereby claimed to be true, and metadata ABOUT it does not change that assertion. (It might alter confidence in it, etc., but that is a whole other layer of reasoning on top of the basic representation of data.)

> I think that the opposite view holds just as well: the triple exists absolutely in an indeterminate state, neither true nor false, and becomes true within a certain temporal or geographical constraint.

So truth itself is time- and location-dependent? But no, it isn't. That is an illusion created by the way that human language is often used, and probably evolved, to make indexical assertions in a 'presentist' sense, to talk about the immediate circumstance of the utterance: the here and now. Yes, natural languages are like this, which is why they use tenses to speak of the past and future. But data 'languages', the stuff in database tables and Wikidata, are not. Or at any rate, should not be.

If you insist on thinking indexically, and require data to be recorded in notations which reflect natural language to this extent, then you really should give this idea some flesh by writing a more precise (I will not say formal) semantics based on this idea, before trying to impose it on the world of linked data. You will not finish up with RDF, though, and it will not be trivial to do. To do this you will, at a minimum, need to get very precise about what times and locations actually are. And you will have to get very tricky indeed about how to treat assertions about times and places. (If a temporally 'located' sentence is asserted 'at' a time, but another assertion is made about that time at a different time, what does their conjunction mean? What if something is true at a time but is queried at a time /inside/ that time? Who does the part-of reasoning?) How will you represent the truth-conditions for such things as "X happened last year in Marienbad"? And why stop at times and locations? Some (many? all?) things that happen have other qualities that may be important. Was it legal? Unusual? Singular? Did it happen in a /manner/ worthy of note? What caused it, and what consequences did it have? Were other agents involved, and if so, how? Etc. Either these are incorporated into this new logic or not. But if not, how can such other aspects of truth be encoded, if there are no events to predicate them of?

You said 'the triple', I note, not 'the fact'. But if we insist that each triple expresses a simple fact, so it can be asserted without qualification or decoration (and as required by the RDF semantics, which is, of course, normative…), then we /must/ introduce these other entities. There is simply no other way to remove all the implicit indexicality. If marriage is time-dependent, then :RichardBurton :marriedTo :LizTaylor . is just a mistake: it is not a complete statement, and cannot be said to be either true or false by itself. But if it gets asserted, it is required to be true.

> It has nothing to do with the entity Marriage (which does not exist outside of our minds

I profoundly disagree with that claim. Being married has legal consequences, for example. 

> ), but with the conditions for the truth of the statement (which are just as arbitrary and abstract as the concept of Marriage).

> That does not shock me. 

It does not shock me, but I think it is a misuse of the notion of 'truth'.
> 
> Remarkably, you do accept that provenance should be about the truth of the triple and not about the marriage.

No, provenance is about the triple as a syntactic object (actually about the assertion, which might span several triples), not about its truth. It might have consequences for our decisions as to its truth, but it is not directly about that. 

> So you are fine with three different modes to express various types of annotations about simple binary relationships: converting it to an n-ary relationship for some annotations, inventing a hierarchy of variously constrained classes for other ones, quoting it for yet other ones. That confuses me. I

I did not say I was fine with them all, just noting that they have all been suggested at various times as ways of dealing with the issue. I do have a preference, which should be fairly clear by now. 
> 
> On the contrary, truth conditions about the triple provide a uniform model for temporal, geographical, provenance, confidence constraints, that all use a similar pattern to provide truth conditions for a very minimal binary relationship: 
> 
> <<  :RichardBurton :marriedTo :LizTaylor  >> 
>    :startDate "1963"^^xsd:gYear;
>    :endDate "1974"^^xsd:gYear;
>    :accordingTo :wikipedia;
>    :confidence 1.0.  
> 
> <<  :GustavIII :positionHeld :monarch  >> 
>    :startDate "1771"^^xsd:gYear;
>    :endDate "1792"^^xsd:gYear;
>    :for :Sweden;
>    :accordingTo :wikipedia;
>    :confidence 1.0.  

What does  :X :for :Sweden .  mean? (Suppose :X is a triple about a marriage, for example?) And whatever it meant, was THAT also according to Wikipedia? This illustrates the muddle between data and metadata that I mentioned in my last email.

But in any case, as others have noted, this does not allow for repeated or interrupted states of affairs, such as the several marriages of Dick and Liz, or a judge's status while in recusal.
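
A minimal Python sketch of the repeated-state problem. It assumes that a quoted triple is one and the same term however many times it is annotated, and it models the graph as a plain set of tuples. The years are illustrative only, and `:m1` and `:m2` are hypothetical event names.

```python
# One quoted triple, annotated for two separate marriages.
# Assumption: the quoted triple is a single term, so all annotations
# attach to the same subject. Years are illustrative only.
quoted = (":RichardBurton", ":marriedTo", ":LizTaylor")

graph = set()
for start, end in [(1963, 1974), (1975, 1976)]:
    graph.add((quoted, ":startDate", start))
    graph.add((quoted, ":endDate", end))

starts = sorted(v for (s, p, v) in graph if s == quoted and p == ":startDate")
ends = sorted(v for (s, p, v) in graph if s == quoted and p == ":endDate")

# Nothing in the graph records which end date belongs to which start date,
# so every start/end combination in the right order is a candidate.
candidate_pairs = [(s, e) for s in starts for e in ends if s <= e]

# With one node per marriage, each occurrence keeps its own dates.
events = [
    {"id": ":m1", "start": 1963, "end": 1974},
    {"id": ":m2", "start": 1975, "end": 1976},
]
event_pairs = [(e["start"], e["end"]) for e in events]
```

The annotated-triple version yields a spurious candidate spanning both marriages, while the event version keeps the two periods apart.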

> 
> No invented pseudo-entities, no need to choose between states (Marriage) and events (Weddings), no difference of approach between different types of annotations, a direct connection still exists between your original subject and your original object and can be queried in a reasonably simple manner: 
> 
> SELECT ?p WHERE {
>  {
>   :RichardBurton ?p :LizTaylor . 
>  } 
>  UNION
>  {
>   << :RichardBurton ?p :LizTaylor>> ?constraintType ?constraintSource  . 
>  }
> }
> 
> which correctly binds ?p to :marriedTo, as wished.

But suppose that the querier is intending to ask, are they married NOW? Will they assume that the simple triple means that they are? (What else could it mean?)

> 
> As a final note, in many cases we have incomplete or partial temporal records. For instance, we do not know exactly when Leonardo actually painted the Mona Lisa: we only know that it was already painted when he moved to Amboise, in 1516. If we used events, then the creationEvent for Mona Lisa is incomplete and lacking an actual date, which somehow misses the actual point for Events.

Not at all. Of course we may have partial information about anything, indeed this is the normal case. That does not make something "incomplete". But…

> If we use temporal constraints for a simple triple, this can be made correct and true even in the presence of incomplete information: 
> 
> << :monaLisa dc:creator :leonardo >>
>  :startDate "1516"^^xsd:gYear.
> 
> The temporal constraint is about the truth of the triple, and not about a creationEvent which I did not use, and is therefore true.  

… is it? What does :startDate mean? Suppose we discover that Leonardo actually painted it in 1513. Surely, in that case, this assertion about start date would be /wrong/. But this discovery should be consistent with the facts we currently have (and which your RDF should express). The right way to say this is, the start date (of the triple's being true, or of the start time of the life of the painting) is some date /earlier/ than 1516, which requires a bnode and some explicit temporal relations. As often with issues of describing time, simple tricks (usually trying to use a non-tensed language as though it was a tensed, presentist language) just don't work. You have to get it right. 
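
A minimal Python sketch of the difference, with an object standing in for the bnode: an unnamed time that is only constrained to lie before a known year. The class `UnknownTime` and its field `before` are hypothetical names, and years are plain integers for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnknownTime:
    """Stands in for a blank node: a time we cannot name, only constrain."""
    before: int  # the time is strictly earlier than this year

    def consistent_with(self, year: int) -> bool:
        """Would learning that the time is `year` fit what was asserted?"""
        return year < self.before


# "Some date earlier than 1516": a constraint, not a value.
creation_time = UnknownTime(before=1516)

# The literal reading of  :startDate "1516"  fixes one particular year.
asserted_start = 1516

# A later (hypothetical) discovery.
discovered = 1513

constraint_survives = creation_time.consistent_with(discovered)
literal_survives = (asserted_start == discovered)
```

Under these assumptions the constrained version stays true after the discovery, while the literal start date has to be retracted.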
> 
>> But ignoring metaphysics, it is awkward because the 'meta' information changes or modifies the truthvalue of the basic assertion (the plain meaning of the triple), which fucks with the logic.
> 
> But why should it fuck with the logic? These triples are NOT TRUE outside their temporal and geographical boundaries. Pretending they are true fucks with the logic. 

Because the semantics of truth they are required to conform to (and which is used by virtually all reasoners ever created, not just for RDF) does not recognize this kind of 'indexed truth' where a sentence is true in some times/places and false in others. And it is not simple to invent a semantic theory and/or a reasoning system based on it which does. 

I have actually tried this, by the way, so I speak from experience. See – but please do not cite – https://www.ihmc.us/users/phayes/Trickledown2004.pdf


Pat

> 
>> I will let y'all draw your own conclusions about using RDF-star in this way. But please, please, do not think that RDF 1.1 has any kind of temporal indexicality built into its semantics. It doesn't. 
> 
> I agree with your comments on indexicality, of course, and will not object to that. 
> 
> Ciao
> 
> Fabio
> 
> 
> --
> 
> Fabio Vitali                            Tiger got to hunt, bird got to fly,
> Dept. of Computer Science        Man got to sit and wonder "Why, why, why?"
> Univ. of Bologna  ITALY               Tiger got to sleep, bird got to land,
> phone:  +39 051 2094872              Man got to tell himself he understand.
> e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
> http://vitali.web.cs.unibo.it/
> 
> 
> 
>
Received on Tuesday, 28 December 2021 23:46:03 UTC