Re: multisets everywhere from Fabio Vitali on 2021-12-25 (public-rdf-star@w3.org from December 2021)

From: Fabio Vitali <fabio.vitali@unibo.it>
Date: Sat, 25 Dec 2021 22:24:49 +0000
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
CC: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <5DBBC28C-0644-4619-920D-A4E652B58974@unibo.it>
Dear all, 

idle thoughts post-Christmas dinner...

> In RDF 1.1, it was explicitly stated that any given graph must 
> be treated as a snapshot of a universe, just a moment in time
> (though still treated as if entirely true about that moment), 
> and should only be blended (merged, unionized) with other graphs 
> that described the same moment in time.

Time is relevant in the snapshot of the universe in at least three different ways: 

1) because facts change over time: the Mona Lisa has been located in Florence, then in Amboise, then in Versailles, then in Paris. In 1911 it was stolen and taken back to Florence for two years, and it has been back in Paris from 1914 until now.

2) because our knowledge about facts changes over time: RDF is not about facts, but about our knowledge of facts, and our knowledge is clearly not binary, true XOR false. There are statements that we know are true (very few, actually), and there are statements we know are false (very many, and mostly not interesting: Napoleon did not paint the Mona Lisa, and neither did Pablo Picasso, Joseph Stalin, my aunt, or more or less 18 billion other individuals who ever lived on Earth). Everything in between is represented by statements we have various degrees of confidence about, and this confidence depends entirely on how much background information we have about them: we are fairly confident about statements that are well corroborated by contextual data (supporting facts, agreeing theories, a confirming trail of documents, etc.) and less so about more hypothetical statements that lack supporting data [*]. New experiments, documents or theories can expand and justify new hypotheses that did not exist or were not believed previously: this is not a change in the state of the fact itself, but in our awareness of the fact.

3) because our representation model may change due to unplanned extensions or improvements of the original requirements, and all data already generated according to the old model needs to be updated to the new one: "a marriage is a social structure between one man and one woman and exists after an event called wedding"... er, no, we also must handle divorce, so "a marriage is between one man and one woman after an event called wedding, and before either the death of one of them or another event, called divorce"... er, no. Now we are also asked to consider Mormons and Muslims, and therefore include polygamy, so it becomes "a marriage is a social construct between one man and one or more women between the wedding and the death or the divorce". OK? No. Now we are also asked to handle same-sex marriages, so "a marriage is between two or more individuals in place between the wedding and the death or the divorce". Then we must also handle re-marriages such as Burton-Taylor, etc. Nothing changes in the facts, nothing changes in our knowledge of the facts, but something changes in the technical choices we are requested to make over time in the design of our data models.
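
The third case is essentially a data migration problem. As a rough sketch only, here is how records written under the first marriage model might be re-expressed under a later one in Python. The class names `MarriageV1` and `MarriageV2`, the `migrate` helper, and the sample names are all hypothetical and chosen just for illustration; they do not come from any RDF vocabulary.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MarriageV1:
    """First model: exactly one man and one woman, starting at a wedding."""
    husband: str
    wife: str
    wedding_year: int


@dataclass(frozen=True)
class MarriageV2:
    """Later model: two or more partners, with an optional end."""
    partners: FrozenSet[str]
    wedding_year: int
    end_year: Optional[int] = None    # None = not known to have ended
    end_reason: Optional[str] = None  # e.g. "death" or "divorce"


def migrate(old: MarriageV1) -> MarriageV2:
    """Re-express a V1 record in the V2 model without adding new facts."""
    return MarriageV2(
        partners=frozenset({old.husband, old.wife}),
        wedding_year=old.wedding_year,
    )


# Placeholder data, not a real couple.
old_record = MarriageV1(husband="Bob", wife="Alice", wedding_year=2000)
new_record = migrate(old_record)
print(sorted(new_record.partners), new_record.wedding_year, new_record.end_year)
```

The migrated record says nothing more about the world than the old one did: only the shape of the data changes. Under this sketch a re-marriage of the same two people would simply be a second `MarriageV2` record with a different `wedding_year`.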

So I guess that: 

1) temporal changes in the facts
2) temporal changes in our knowledge of the facts
3) temporal changes in our models for representing our knowledge of the facts

are completely different situations, and being able to provide some best practices for dealing with any of them would be highly useful. 
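
For the first situation, one possible approach is to attach a validity interval to each statement and ask what held in a given year. The Python sketch below is only an illustration: `TimedStatement` and `holds_in` are hypothetical names, the `locatedIn` predicate is made up, and only the two periods with dates given above are encoded. The 1911-1913 bounds are an assumption based on "two years".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedStatement:
    """A subject-predicate-object statement plus the years in which it holds."""
    subject: str
    predicate: str
    obj: str
    start: int                 # first year the statement holds (inclusive)
    end: Optional[int] = None  # last year it holds (inclusive); None = still holds


def holds_in(statements: List[TimedStatement], subject: str,
             predicate: str, year: int) -> List[str]:
    """Return the objects asserted for (subject, predicate) in the given year."""
    return [
        s.obj
        for s in statements
        if s.subject == subject
        and s.predicate == predicate
        and s.start <= year
        and (s.end is None or year <= s.end)
    ]


# Illustrative data only: the interval bounds are assumptions.
statements = [
    TimedStatement("MonaLisa", "locatedIn", "Florence", 1911, 1913),
    TimedStatement("MonaLisa", "locatedIn", "Paris", 1914),
]

print(holds_in(statements, "MonaLisa", "locatedIn", 1912))  # ['Florence']
print(holds_in(statements, "MonaLisa", "locatedIn", 2021))  # ['Paris']
```

Neither statement is "true" or "false" in the absolute; each is true relative to an interval, and a query has to say which moment it is asking about.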

In particular, expressing statements as non-absolute (neither true nor false, but subject to external constraints such as temporal and/or geographical data) is extremely important, and extremely liberating. I like RDF-star exactly for this. I wish it were possible to do the same for named graphs.
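
To give a feel for the idea, here is a minimal Python sketch that mimics statement-level annotation with plain tuples and dictionaries. It is not RDF-star syntax and uses no RDF library: `annotate`, `asserted`, `annotations`, the property names, and the qualitative confidence labels are all assumptions made up for this example.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

asserted: Set[Triple] = set()                      # triples claimed to be true
annotations: Dict[Triple, Dict[str, object]] = {}  # statements *about* triples


def annotate(triple: Triple, **props: object) -> None:
    """Attach properties to a triple without asserting the triple itself."""
    annotations.setdefault(triple, {}).update(props)


painted_mona_lisa = ("Leonardo", "painted", "MonaLisa")
painted_salvator = ("Leonardo", "painted", "SalvatorMundi")
in_paris = ("MonaLisa", "locatedIn", "Paris")

# Illustrative annotations only; the labels are not measured values.
annotate(painted_mona_lisa, confidence="higher")
annotate(painted_salvator, confidence="lower", gap_in_records=(1649, 1979))
annotate(in_paris, valid_from=1914)

# Annotating a triple does not add it to the set of asserted triples.
print(painted_salvator in asserted)         # False
print(annotations[in_paris]["valid_from"])  # 1914
```

The point of the sketch is the separation between the two containers: one can record constraints or doubts about a statement without committing to the statement being true.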

Ciao

Fabio

--

[*] For instance, we know with some confidence that the Mona Lisa was painted by Leonardo da Vinci because we have an abundant trail of documents asserting a continuous stream of identity and location and possession linking the painting that Leonardo painted in 1507 with the one on display now at the Louvre, while we do not know with similar certainty that the Salvator Mundi painting is by Leonardo because there is a huge gap in the trail of accompanying documents connecting the painting created by Leonardo in 1510 and the current candidate, since between 1649 and 1979 we have no idea of its whereabouts, and no possible way to recover reliable information about it.

> On 23 Dec 2021, at 18:58, Ted Thibodeau Jr <tthibodeau@openlinksw.com> wrote:
> 
> On Dec 21, 2021, at 03:23 PM, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:
>> 
>> In RDF semantics (both the current standard and the proposed RDF-star), a triple is either true or false. 
> 
> 
> I believe this is the first time I've known anyone to suggest
> that an RDF triple could be (semantically known to be) false.
> 
> How do you know whether a given triple is false?  Or, true?
> 
> My understanding has been that the original conception of RDF 
> was that it would only be used to record universal and eternal 
> facts; in other words, everything encoded in RDF was universal
> and eternal truth.
> 
> (This was an immediate problem, because we all hopefully know 
> that description accuracy requires that those descriptions be
> changeable over time, but it was hard enough for many to grasp 
> the simplicity of describing everything with SPO triples that 
> it took years for many to realize that few descriptions were 
> eternally accurate.)
> 
> On this basis, even though RDF officially and explicitly operates 
> under the "Open World" assumption (where anything that is not 
> stated is implied and should be inferred to be unknown), *some* 
> unasserted values were in practice treated as if they had been 
> asserted -- i.e., that once inscribed, a triple was now, had
> always been, and would always be, accurate.
> 
> Operating on this universal and eternal truth assumption, all 
> graphs in the universe could be combined, and there would be no 
> contradictions, and all queries should deliver results that are 
> likewise universally and eternally true.
> 
> This belief has been problematic since RDF began, and it is 
> likely to continue to be so for many years if not forever.
> 
> In RDF 1.1, it was explicitly stated that any given graph must 
> be treated as a snapshot of a universe, just a moment in time
> (though still treated as if entirely true about that moment), 
> and should only be blended (merged, unionized) with other graphs 
> that described the same moment in time.
> 
> The only way to *know* whether any two Named Graphs were about 
> the same moment in time is for those two Named Graphs to be 
> explicitly described as such.  Often enough, even with this 
> improvement, two observers who inscribed descriptions that 
> were accurate from their perspective, included too few details 
> about what made up their perspective for others to accurately
> determine which graphs were from that same perspective, and
> which were different.  (Just for discussion's sake, consider
> two people, one to the north and one to the south of a fire,
> describing that fire.  The wind was blowing west-to-east, so
> smoke could accurately be described as drifting east -- but
> the observers described it instead as drifting to the right
> in one case and to the left in the other -- and both were
> indeed accurate, but neither was *fully* accurate....)
> 
> All of which is to say, "This is far more complex than it
> appears when we say 'S P O [G]' is all you need to describe
> anything!"
> 
> Be seeing you,
> 
> Ted
> 
> 
>
Received on Saturday, 25 December 2021 22:25:54 UTC