Re: multisets everywhere from Patrick J. Hayes on 2021-12-26 (public-rdf-star@w3.org from December 2021)

From: Patrick J. Hayes <phayes@ihmc.org>
Date: Sun, 26 Dec 2021 07:45:05 +0000
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
CC: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <ED80C971-3C28-4FE1-8453-701BB12907D8@ihmc.org>

I have been trying to not get involved in this discussion, but some things just have to be corrected.

On Dec 23, 2021, at 9:58 AM, Ted Thibodeau Jr <tthibodeau@openlinksw.com> wrote:

On Dec 21, 2021, at 03:23 PM, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:

In RDF semantics (both the current standard and the proposed RDF-star), a triple is either true or false.

Right. The semantics treats each triple S P O . as an atomic statement, with the logical form P(S, O), ie it asserts that the relationship P holds between the two things S and O. No mention of times or contexts or any way that truth is modified or made temporary.

I believe this is the first time I've known anyone to suggest
that an RDF triple could be (semantically known to be) false.

Of course triples CAN be known to be false. In fact the RDF 1.1 semantics explicitly requires some triples to be false, for example

:S :P "notAnInteger"^^xsd:integer .

is false when XSD Integer is a recognized datatype. (https://www.w3.org/TR/rdf11-mt/#D_interpretations)
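A minimal Python sketch of why that triple is false, assuming only the XML Schema rule that the lexical space of xsd:integer is an optional sign followed by one or more decimal digits. A literal whose lexical form falls outside that space is ill-typed and denotes nothing, and the RDF 1.1 Semantics says any triple containing such a literal is false in every D-interpretation that recognizes the datatype. The helper names are hypothetical, not part of any RDF library, and only xsd:integer is handled.

```python
import re
from typing import Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Lexical space of xsd:integer: optional sign, then one or more decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_literal_value(lexical: str, datatype: str) -> Optional[int]:
    """Return the value of an xsd:integer literal, or None if it is ill-typed."""
    if datatype != XSD_INTEGER:
        raise ValueError("this sketch only recognizes xsd:integer")
    if _INTEGER_LEXICAL.fullmatch(lexical) is None:
        return None  # ill-typed: the literal has no value, so it denotes nothing
    return int(lexical)


def triple_can_be_true(subject: str, predicate: str, lexical: str, datatype: str) -> bool:
    """False means the triple is false in every D-interpretation (ill-typed object)."""
    return integer_literal_value(lexical, datatype) is not None


print(triple_can_be_true(":S", ":P", "notAnInteger", XSD_INTEGER))  # False
print(integer_literal_value("-42", XSD_INTEGER))                      # -42
```

Note that a well-typed literal only means the triple *can* be true; whether it is true still depends on the interpretation.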

How do you know whether a given triple is false? Or, true?

You know that when asserted (ie published as part of some data) then it is being CLAIMED to be true. That is what 'asserted' means.

My understanding has been that the original conception of RDF
was that it would only be used to record universal and eternal
facts; in other words, everything encoded in RDF was universal
and eternal truth.

I think I know what you mean, but I (and other logicians and philosophers of language) would prefer to say "simply true", or "true simpliciter" if you want to sound fancy. Which means, just true (not, say, necessarily true or mathematically true or scientifically true, etc.) but true without some qualification or modification (possibly true, true now but maybe not tomorrow, somewhat true, conditionally true, etc.) The kind of 'true' that you swear to tell when you take an oath in a court of law.

(This was an immediate problem, because we all hopefully know
that description accuracy requires that those descriptions be
changeable over time

No, we did (and do) not know that. I do not believe this to be the case.

, but it was hard enough for many to grasp
the simplicity of describing everything with SPO triples that
it took years for many to realize that few descriptions were
eternally accurate.)

In what sense? Yes, if you mean that we can discover that we were wrong and that the historical record may need to be corrected, updated (though this is a pretty rare occurrence for most data). No, if you mean that all assertions are somehow time-dependent in the way that tensed language is.

On this basis, even though RDF officially and explicitly operates
under the "Open World" assumption (where anything that is not
stated is implied and should be inferred to be unknown)

Not INFERRED to be unknown. Just not known, so no inferences should be drawn from such lack of information. But yes, this is an ideal that is often ignored in practice.

, *some*
unasserted values were in practice treated as if they had been
asserted -- i.e., that once inscribed, a triple was now, had
always been, and would always be, accurate.

What triple? You are muddling two issues here: the timeless quality claimed for RDF assertion, and the open world assumption. These are not the same issue.
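A minimal Python sketch of the open world point made above: an absent triple is simply not known, so nothing follows from its absence, whereas a closed world reading treats absence as falsity. The function names are hypothetical, and the sketch uses plain set membership, ignoring entailment entirely; `None` here stands for "no conclusion drawn", not for an inferred value "unknown".

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def ask_closed_world(graph: Set[Triple], triple: Triple) -> bool:
    # Closed world: anything not stated is taken to be false.
    return triple in graph


def ask_open_world(graph: Set[Triple], triple: Triple) -> Optional[bool]:
    # Open world: a stated triple is asserted true; an absent triple
    # licenses no conclusion at all, so return None rather than False.
    return True if triple in graph else None


g = {(":Alice", ":marriedTo", ":Bob")}
print(ask_closed_world(g, (":Alice", ":marriedTo", ":Carol")))  # False
print(ask_open_world(g, (":Alice", ":marriedTo", ":Carol")))    # None
```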

Operating on this universal and eternal truth assumption, all
graphs in the universe could be combined, and there would be no
contradictions, and all queries should deliver results that are
likewise universally and eternally true.

Well, as true as the assertions were when they were made. RDF, like any logic, cannot guarantee the truth of what it is given as input. It can however guarantee validity, ie that it does not itself insert falsity into inferences.

This belief has been problematic since RDF began, and it is
likely to continue to be so for many years if not forever.

In RDF 1.1, it was explicitly stated that any given graph must
be treated as a snapshot of a universe, just a moment in time

NO! I have no idea where you got this idea from, but it is completely and absolutely WRONG. There is no such notion of a 'snapshot' anywhere in RDF.

(though still treated as if entirely true about that moment),
and should only be blended (merged, unionized) with other graphs
that described the same moment in time.

The semantics does not support the idea of a graph describing a "moment in time".

The only way to *know* whether any two Named Graphs were about
the same moment in time is for those two Named Graphs to be
explicitly described as such. Often enough, even with this
improvement, two observers who inscribed descriptions that
were accurate from their perspective included too few details
about what made up their perspective for others to accurately
determine which graphs were from that same perspective, and
which were different. (Just for discussion's sake, consider
two people, one to the north and one to the south of a fire,
describing that fire. The wind was blowing west-to-east, so
smoke could accurately be described as drifting east -- but
the observers described it instead as drifting to the right
in one case and to the left in the other -- and both were
indeed accurate, but neither was *fully* accurate….)

Good example. 'Right' and 'left' (at least used geographically), like 'now' and 'here', 'me' and 'you', are indexicals: their meaning depends on their context of use. Putting indexicals into data that is intended to be transmitted to another place, or stored for later re-use - in fact, into pretty much any data - is a BAD IDEA. This has nothing particularly to do with RDF and triples: it is just a basic rule about how to record information so other people can use it. If you call 911 and they ask you where you are, it is unhelpful to say "here", though it is of course true.
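One way to keep an indexical out of stored data is to resolve it against the speaker's context before recording it. A minimal Python sketch for the fire example, assuming (hypothetically) that each observer was facing the fire, so the one to the north faced south and the one to the south faced north; the function name is illustrative only.

```python
# Compass points in clockwise order; turning right moves one step clockwise.
COMPASS = ["north", "east", "south", "west"]


def absolute_direction(facing: str, relative: str) -> str:
    """Resolve an observer-relative direction into a compass direction.

    `facing` is the direction the observer faced when speaking: context that
    would have to be recorded alongside the observation for this to work.
    """
    i = COMPASS.index(facing)
    if relative == "right":
        return COMPASS[(i + 1) % len(COMPASS)]
    if relative == "left":
        return COMPASS[(i - 1) % len(COMPASS)]
    raise ValueError(f"unsupported relative direction: {relative!r}")


# Observer north of the fire faces south; observer south of it faces north.
print(absolute_direction("south", "left"))   # east
print(absolute_direction("north", "right"))  # east
```

Once resolved, both reports agree on "east", and the stored value no longer depends on where the reader is standing.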

Unfortunately, apparently simple facts about the world can often have an implicit indexical (usually 'now', sometimes some form of 'here') incorporated into them by accident, as it were. As has been much discussed on this thread, any two-place relation (like 'is married to') which can change with time (or location) is in fact not the simple two-place relation it appears to be, so it should be encoded as something more complicated when properly used to store real-world data. All of this is kind of data engineering 101 and has been known and discussed for at least a century.

All of which is to say, "This is far more complex than it
appears when we say 'S P O [G]' is all you need to describe
anything!"

What has also been known since before the Internet (actually since around 1890) is that these more complicated things that must be used to encode data can always be built up from a suitably woven graph of binary relational assertions, ie triples. So RDF is universal, in a sense, but not in a trivial sense.

The ways of doing it are also well-known.

One is to treat 'marriedTo' (etc.) as having more relational arguments, so the time-period it is asserted 'about' is part of the relationship. One problem with this was noted early on: how many arguments do you need? Consider (not my example): it happened in the kitchen, after midnight, on the ides of March, John did it, with a knife, quickly, with passion…where do you stop needing to add extra arguments? (He made a cheese sandwich, by the way.)

So, the second idea is, you introduce a single 'thing' (variously called an event, a situation, a circumstance, a happening, a history, a process, a proposition, a fact, depending on who you think invented it) that all these 'arguments' are related to by binary relations (sometimes called facets or aspects or cases, if you come to this from linguistics). For example, this idea was developed for use by military intelligence applications, where the core 'aspects' are the five W's: Who, What, When, Where and Why. This approach gets you some nice side benefits, apart from its flexibility, because these 'things' can have other properties. In the military intelligence case, for example, they can be classified into various categories of interest or relevance to some strategic goal. If we are interested in legal issues, they can be related to whatever regulations they violate, and so on.

To make the discussion more muddled, the new-thing-plus-binary-links trick is also widely known as a way to reduce n-ary relations to combinations of binary relations, so some people think of the second way as 'really' being the same as the first way, just in a different notation.
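The second approach, and the way it doubles as the usual reduction of n-ary relations to binary ones, can be sketched as a small Python function that hangs every aspect off one fresh node. The class names (`:SandwichMaking`, `:Marriage`), role names (`:agent`, `:spouse`, etc.) and the `_:eN` blank-node labels are hypothetical; the output is just triples printed in a Turtle-like form, not produced by any RDF library.

```python
import itertools
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def fresh_nodes() -> Iterator[str]:
    """Yield fresh blank-node labels: _:e0, _:e1, ..."""
    for i in itertools.count():
        yield f"_:e{i}"


def encode_event(node: str, event_type: str,
                 aspects: List[Tuple[str, str]]) -> List[Triple]:
    """Encode one n-ary fact as binary triples about a single event node.

    Each (role, value) pair becomes one triple, so recording one more aspect
    means adding one more triple, not changing the arity of a relation.
    """
    triples: List[Triple] = [(node, "rdf:type", event_type)]
    triples.extend((node, role, value) for role, value in aspects)
    return triples


nodes = fresh_nodes()

# Fixed-arity alternative for comparison: every position must be decided up front.
sandwich_tuple = ("makeSandwich", ":John", ":kitchen", ":knife")

sandwich = encode_event(next(nodes), ":SandwichMaking", [
    (":agent", ":John"),
    (":location", ":kitchen"),
    (":instrument", ":knife"),
    (":manner", ":quickly"),
])

marriage = encode_event(next(nodes), ":Marriage", [
    (":spouse", ":Alice"),
    (":spouse", ":Bob"),
    (":startDate", '"2001-06-01"^^xsd:date'),
])

for s, p, o in sandwich + marriage:
    print(s, p, o, ".")
```

Because the event node is an ordinary resource, further properties (a classification, a regulation it violates) attach to it the same way.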

But there is a third idea, which is to treat some smallish subset of the possible arguments as actual arguments, forming a kind of core fact, and the others as meta-assertions ABOUT this core fact. The simplest possible version of this is the core being a single RDF triple, and anything else is about this triple. There are many issues and problems with this approach, though. First, it is basically wrong: these extra arguments or modifications are not 'about' the triple (unlike, say, provenance information). (They might be 'about' the fact asserted by the triple, but that 'fact' is not the triple itself, which is a syntactic entity. In fact, the thing it is about is probably one of those event-things.) But ignoring metaphysics, it is awkward because the 'meta' information changes or modifies the truth value of the basic assertion (the plain meaning of the triple), which fucks with the logic. And as a practical matter, it makes it hard to keep data organized, by muddling up different kinds of information. Temporal database theory, for example, makes a sharp distinction between valid time (the time of the facts being true or of the event happening) and transaction time (the time when the data was entered or written), and has invented an entire methodology of not getting these muddled. But treating valid time as a meta assertion about the data is exactly this muddle.
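Keeping valid time and transaction time apart as separate fields of ordinary data can be sketched in Python as a bitemporal record. This is a hypothetical layout assuming half-open intervals and an in-memory list, not a schema from any particular temporal database; the class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class BitemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: date                   # valid time: when the fact began to hold
    valid_to: Optional[date]           # None = still holds, as far as recorded
    recorded_at: datetime              # transaction time: when the row was written
    superseded_at: Optional[datetime]  # None = current row, not yet corrected


def as_of(facts: List[BitemporalFact], day: date,
          known_at: datetime) -> List[BitemporalFact]:
    """Facts that held on `day`, according to what was recorded at `known_at`."""
    return [
        f for f in facts
        if f.recorded_at <= known_at
        and (f.superseded_at is None or known_at < f.superseded_at)
        and f.valid_from <= day
        and (f.valid_to is None or day < f.valid_to)
    ]


original = BitemporalFact(":Alice", ":marriedTo", ":Bob",
                          date(2001, 6, 1), None,
                          datetime(2002, 1, 1), datetime(2010, 3, 1))
# In 2010 a correction records that the marriage ended in May 2009.
correction = BitemporalFact(":Alice", ":marriedTo", ":Bob",
                            date(2001, 6, 1), date(2009, 5, 1),
                            datetime(2010, 3, 1), None)
facts = [original, correction]

print(len(as_of(facts, date(2009, 12, 1), datetime(2005, 1, 1))))  # 1
print(len(as_of(facts, date(2009, 12, 1), datetime(2011, 1, 1))))  # 0
```

The correction changes what the data says about the world without rewriting the record of what was believed in 2005, which is the distinction that gets lost when valid time is attached to a triple as a meta assertion.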

I will let y'all draw your own conclusions about using RDF-star in this way. But please, please, do not think that RDF 1.1 has any kind of temporal indexicality built into its semantics. It doesn't.

Pat Hayes

Be seeing you,

Ted

Received on Sunday, 26 December 2021 07:45:24 UTC