Re: multisets everywhere from Fabio Vitali on 2021-12-22 (public-rdf-star@w3.org from December 2021)

From: Fabio Vitali <fabio.vitali@unibo.it>
Date: Wed, 22 Dec 2021 16:52:53 +0000
To: Anthony Moretti <anthony.moretti@gmail.com>
CC: thomas lörtsch <tl@rat.io>, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <5C00F71B-4A71-42ED-AA37-BD273CC0AA1F@unibo.it>
Dear Anthony,

> Earlier Fabio wrote:
> << :dihydrogen-monoxide :form :liquid >>
>         physical:lowTemp  :0Centigrade; 
>         physical:highTemp :100Centigrade;
>         physical:pressure :1atm. 
> 
> Hi Fabio. If I'm way off forgive me, but these seem to be constraints for description rather than actual description, if so is that what SHACL could be used for? Maybe my understanding is way off though.

Hmm... maybe I am misreading the multiple uses of SHACL, but it seems to me to be a way to express validity constraints over graphs rather than truth constraints. They are not the same thing.

For instance, the graph: 

{ :RichardB :marriedTo :ElizabethT . }

is a valid graph whose truth is temporally constrained (there are two temporal intervals where the statement is true), while:  

{ :FabioVitali :marriedTo :MargotRobbie . }

is just as perfectly valid as the first graph, but, sadly, not true in any past (and, I am ready to bet, future) temporal interval. 

I believe. 
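
To make the distinction concrete, here is a minimal sketch in Python. It is not SHACL and not part of any RDF toolkit: `is_valid`, `is_true_at`, `Interval` and the `contexts` table are hypothetical names introduced only for illustration, and the years are illustrative placeholders, not data taken from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval of years during which a statement holds (assumed model)."""
    start: int
    end: int


def is_valid(triple) -> bool:
    """Shape check only: three non-empty string terms. Says nothing about truth."""
    return len(triple) == 3 and all(isinstance(term, str) and term for term in triple)


def is_true_at(triple, year, contexts) -> bool:
    """Truth check: the triple holds if some recorded interval covers the year."""
    return any(i.start <= year <= i.end for i in contexts.get(triple, ()))


burton = (":RichardB", ":marriedTo", ":ElizabethT")
vitali = (":FabioVitali", ":marriedTo", ":MargotRobbie")

# Hypothetical temporal contexts: two intervals for the first triple, none for the second.
contexts = {burton: [Interval(1964, 1974), Interval(1975, 1976)]}

both_valid = is_valid(burton) and is_valid(vitali)
burton_1970 = is_true_at(burton, 1970, contexts)
vitali_2021 = is_true_at(vitali, 2021, contexts)
```

Both triples pass the shape check, but only the first is true in any interval, which is the gap between validity and truth that I mean.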

Ciao

Fabio

> 
> Regards
> Anthony
> 
> 
> On Wed, Dec 22, 2021 at 11:07 AM thomas lörtsch <tl@rat.io> wrote:
> 
> 
> > On 21.12.2021 at 21:23, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:
> > 
> > Hi Thomas,
> > 
> > a few comments:
> > 
> > On 21/12/2021 17:53, thomas lörtsch wrote:
> >> (...)
> >> This is a problem with more than one dimension:
> >> - between asserted and not asserted there are degrees of "asserted under certain conditions", like e.g. the assertion of the marriage between Taylor and Burton only being valid in certain periods
> >> 
> > In RDF semantics (both the current standard and the proposed RDF-star), a triple is either true or false. Any "degree of assertedness" or "conditional assertion" falls outside the standard base semantics. Of course, an application can extend the base semantics (using RDFS/OWL axioms, N3 rules, or hard-coded mechanisms) in order to handle intermediate forms of assertions. But then, in order to be interoperable with other applications, triples that are "conditionally asserted" from the point of view of the app should *not* be asserted from the point of view of the base semantics.
> 
> Tell that to Fabio, not me. If you don’t assert those triples explicitly then in plain RDF they are not asserted at all. Consequently, what Dörthe and Fabio propose may of course work as desired in local applications, but on the semantic web at large it is quite invisible. Which is what I was trying to explain. But the problem runs deeper, and the by-the-book application of logic you propose can’t handle it.
> 
> >> (...)
> >> 
> >> The general assumption under which the semantic web operates is that information is not complete.
> >> 
> > Yes.
> >>  One always has to account for the possibility of additional detail changing the meaning of what one knows already.
> > No, it is actually the opposite! According to RDF's semantics, the meaning of a triple *can not* be altered by another triple.
> 
> I know that but it captures only part of reality. Your fellow logician Dörthe asked me in this thread the other day why I asserted that <:RichardB :marriedTo :ElizabethT> when I already knew that this is not true, or only during a certain timespan. Adding the start and end date does change the meaning of what is asserted already, triples do indeed interact, and even more so when you consider statement annotation or property graph style modelling.
> 
> Or was it sloppy wording on the part of the property chosen? But a quick check shows that very few properties in use on the semantic web reflect on their temporal validity. Past, present, future, time periods, what have you - it’s all the same to them. Likewise with spatial, legal or any other dimension of validity of the statement in question. If you now insist that this is not valid RDF I could rhetorically ask back "then, what is, out there on the actual semantic web?". Very very little, I assume. And yet it works, because assumptions are made _everywhere_. Otherwise everything would be just too tedious to be even remotely useful.
> 
> We can go back again to n-ary relations and model the marriage like this:
> 
>     :RichardB :marriedTo [
>         rdf:value :ElizabethT ;
>         :start 1966
>     ]
> 
> Now we’re "safe" because the innocent blank node just says 'there exists something'. No triple changes its meaning when we add further triples like :end date, :happiness measures or what have you. But all that changes the meaning of what was said. And that is what counts. The formal 'safety' doesn’t carry far. The reasoner may be safe but the human understands something different. The most important trick is to know which modelling patterns are used and adjust our applications (including the reasoners) accordingly. And what can be done with n-ary relations can just as well be done with annotated statements.
> 
> The art of logic-backed knowledge representation is to find a compromise between
> - it is too cumbersome to model
> - it doesn’t reflect what we mean
> - the reasoner chokes
> and compromises have to be made on all levels, including logic. If you want to apply a mechanism to reality you have to cope with the complexities of reality. Saying "RDF was not made for this" whenever a real problem surfaces doesn’t cut it. Logic is not very flexible but to a certain degree it is - that is, applications are and the formalism can take that into account. We do it all the time, also - maybe especially - on the semantic web.
> 
> One formally tight way out of this problem would be blank nodes everywhere - subject, predicate and object - and then annotate those. Logically very sound but a usability nightmare nonetheless. Compromises pay off big time if made at the right places. We can use graphs to separate "same" triples and then annotate them in different ways to get multisets. RDF standard reification vocabulary pulls identifiers for occurrences (or "speech acts") out of thin air. Nobody gets hurt, we just play tricks with the logic. That sort of creativity is what’s needed. If named graphs hadn’t been so over-eagerly defined as referentially opaque but had been provided as a mere grouping mechanism - no more, no less - we would have far fewer problems today. But it’s easy to say in hindsight. Maybe not too late, who knows.
> 
> The trick on which the proposed RDF-star semantics is built is that it talks about mere quotes of triples. And quoted triples are of course triples beaten to death so they don’t move anymore and can only mean exactly themselves. That’s a way to avoid real problems, but in consequence it is also largely irrelevant in practice if you don’t allow any wiggle room left or right because you’re so afraid of little worms. If you prefer to stay within the safe zone of logical rigor - fine with me. But then I’d prefer that you stop advertising the proposed RDF-star semantics as providing a solution to every meta modelling problem out there, as you did for example in the Lotico talk. Because so far it seems to me that you don’t provide much more to solve those problems than one informal property.
> 
> 
> I had hoped that you would respond to the actual topic of this thread, in the first half of the post that started it. If I’m right then the special treatment that occurrences get in the report should be extended to all annotations that do not consciously annotate a quote, or - more practically speaking - that do not represent an artefact in some processing step of the RDF machinery: a version, a deduction etc. Or, alternatively, face complications like the need to remodel and duplicate queries, as I outlined there. I would find that discussion helpful and, in case I’m right, warranting a reworking of the report.
> 
> OTOH, time and again I’ve sworn to myself not to take part in this CG any longer, as for my taste it is just not interested enough in finding a good solution for meta modelling on the semantic web. This CG had to be tediously motivated to tackle real world issues like occurrences, referentially transparent triples and the like. Whatever this CG really is interested in, and why, is still not really clear to me, but it seems to be quite theoretical. So, the temptation to sit this CG out and wait for a real WG with hopefully some more practically oriented participants has always been strong. But I also learned a lot.
> 
> Best,
> Thomas
> 
> 
> > This limits the kind of things that you can express with RDF. Things like default values ("if that property is not specified, then its value is considered to be 42") can *not* be captured by the standard semantics.
> > 
> > That does not prevent an agent (natural or artificial) from using some heuristics on top of the standard semantics ("if that property is not specified, then its value is *probably* 42"), but such "conclusions" should not be treated as equal to those entailed by the standard semantics -- if only to ensure interoperability with other systems (since other agents may not apply the same heuristics).
> > 
> > Note that from that perspective, I don't see any incompatibility between Anthony's or Fabio's proposals and the "Hitler was not all bad" example you gave (quoted below). Their proposal does not change the "raw semantics" of quoted triples, which still fall on the "unasserted" side of the base RDF semantics, hence are not endorsed by default. But they add some vocabulary-specific rule, e.g. "if a quoted statement has start date before now and end date after now (and no other constraint) then assert that statement".
> > 
> > (I see another problem with Anthony and Fabio's proposal, and that's precisely the fact that they seem to need default values for the contextual properties... but that's another side of the discussion)
> > 
> >   pa
> > 
> >>  (...)
> >> 
> >> The modelling that you propose makes it harder to realize another goal, the one that I thought fuels the demand for "unasserted assertions": it makes it harder to state something that we don’t endorse. For example a few decades ago it was still not uncommon in Germany to say that "Hitler wasn’t all bad as he had for example built the Autobahn". How do I model that in RDF? Under no circumstance do I want to have a statement saying "Hitler wasn’t all bad" in my triple store. If quoted triples are strictly unasserted, then this is easy:
> >> 
> >>     << :Hitler :not :AllBad >> :because :Autobahn .
> >> 
> >> I might go even further, like:
> >> 
> >>    << << :Hitler :not :AllBad >> :because :Autobahn >>
> >>        :accordingTo :SomeEternallyYesterday .             [0]
> >> 
> >> The thing is: I don’t want the central statement - "Hitler not all bad" - ever to pop up in some unassuming query! How can I do that if, as in your interpretation, quoted statements are not unasserted but asserted under the condition of their annotation?
> >> 
> >> I think that outside of such extreme examples there is a gradual shift between asserted and unasserted: assertions may be conditionalized explicitly through annotations but also implicitly through context or through additional statements or through exchanging one node in the triple for a blank node with its own annotations. But the general direction of the semantic web surely is that we offer and ingest data that we believe in, that we find useful, that we want to operate on. So the default assumption is: "this is true (hopefully) (check the small print)". I think that this would also be a useful default assumption for annotated statements. The statement
> >> 
> >>     :RichardB :marriedTo :ElizabethT .
> >> 
> >> says that there exists a :marriedTo relation between those two persons - nothing more, nothing less - and that is indeed true. That relation exists. History is real too. It has some properties, among them that it isn’t in effect today. That statement would be false if the relation had never existed, everything else is fair game. Additional detail
> >> 
> >>     :RichardB :marriedTo :ElizabethT .
> >>     <<:RichardB :marriedTo :ElizabethT>> :start 1966 .
> >> 
> >> is of course always welcome :-)
> >> 
> >> Your style of modelling however says: per default no information can be trusted, we need to know more. Where does that stop? Why are the annotations not quoted themselves? Why is
> >> 
> >>     :dihydrogen-monoxide rdfs:label "ice"
> >> 
> >> a non-absolute statement, only conditionally asserted?
> >> 
> >> How do you _not_ say something?
> >> 
> >> And how is this compatible with all the data out there already? Do you expect everybody to transform their statements into quoted statements?
> >> 
> >> 
> >> W.r.t. graphs: Pierre-Antoine modelled graphs as lists of quoted statements. OTOH: what holds you back from defining your own named graph semantics, annotating your named graphs accordingly and being done with it?
> >> 
> >> 
> >> Best,
> >> Thomas
> >> 
> >> 
> >> [0] That’s actually hard to translate. In German we say "ewig Gestrige". If :SomeEternallyYesterday doesn’t make sense then just replace it with :idiots.
> >> 
> >> 
> >> 
> >> 
> >>> On 21.12.2021 at 11:12, Fabio Vitali <fabio.vitali@unibo.it> wrote:
> >>> 
> >>> Hello,
> >>> 
> >>> 
> >>>> - Start time (assumption if blank: unbounded)
> >>>> - End time (assumption if blank: unbounded)
> >>>> - Location (assumption if blank: unbounded)
> >>>> - Certainty (assumption if blank: 1.0)
> >>>> 
> >>>> 
> >>> In my research team we call them contexts, i.e., conditions that make a non-absolute statement true. We have identified at least seven contexts:
> >>> 
> >>> - Temporal context (a temporal interval within which the statement is true);
> >>> - Spatial context (or, better, jurisdictional context, which allows us to distinguish between, say, the Roman Empire, the Church State, the Italian Kingdom and the current Italian Republic, all of which share at least in part the same location);
> >>> - Part-whole context (e.g. when recording facts about individual pages of an ancient manuscript, then creation date, author and ownership apply to the whole book, and not individual pages);
> >>> - Object-subject context: e.g. when recording facts about a depiction, i.e. a painting or a photograph, being able to distinguish facts about the painting vs. about the subject of the painting (pretty tricky when you have a painting of a painting, or even a photograph of a painting of a painting, etc.);
> >>> - Provenance context (when you have competing and reciprocally incompatible statements from different sources);
> >>> - Confidence context (when you yourself are considering different and reciprocally incompatible statements with different degrees of confidence about their truth);
> >>> - Physical context (see below for an example).
> >>> 
> >>> All these contexts are used to create assertions that have the non-absolute statement as subject, and express conditions for their truth. Thus for instance:
> >>> 
> >>> <<:napoleon :role :emperor>>
> >>>     temporal:start "1804-05-18"^^xsd:date;
> >>>     temporal:end "1814-04-06"^^xsd:date;
> >>>     jurisdiction:country :FirstFrenchEmpire;
> >>>     confidence:confidence "1.0".
> >>> 
> >>> << :dihydrogen-monoxide :form :solid >>
> >>>     physical:highTemp :0Centigrade.
> >>> 
> >>> << :dihydrogen-monoxide rdfs:label "ice" >>
> >>>     physical:highTemp :0Centigrade.
> >>> 
> >>> << :dihydrogen-monoxide :form :liquid >>
> >>>     physical:lowTemp  :0Centigrade;
> >>>     physical:highTemp :100Centigrade;
> >>>     physical:pressure :1atm.
> >>> 
> >>> << :dihydrogen-monoxide rdfs:label "water" >>
> >>>     physical:lowTemp  :0Centigrade;
> >>>     physical:highTemp :100Centigrade;
> >>>     physical:pressure :1atm.
> >>> 
> >>> << :dihydrogen-monoxide :form :gas >>
> >>>     physical:lowTemp  :100Centigrade.
> >>> 
> >>> << :dihydrogen-monoxide rdfs:label "steam" >>
> >>>     physical:lowTemp  :100Centigrade.
> >>> 
> >>> I find this approach much cleaner and easier to explain to domain experts than requiring them to create an instance of an n-ary relationship relying on some abstract concept, or to invent a new OWL class which is a subclass of some other class, etc. The list can be further and easily expanded to other contexts, if and when we find out we need them.
> >>> 
> >>> Having rdf-star statements available is extremely important because it allows us to clearly separate non-absolute statements (that are only true within a given context) from absolute statements (that do not need contexts to be true). And for this purpose, rdf-star is simply perfect: rdf-star triples are non-absolute statements, and plain RDF triples are absolute statements.
> >>> 
> >>> My only problem, as you can see, is that sometimes we need to collect multiple individual statements and associate them to the same context.
> >>> 
> >>> For instance, I want to associate both the form :liquid and the label "water" for the compound :dihydrogen-monoxide with the conditions "physical temperature between 0 and 100 degrees Centigrade and pressure 1 atmosphere". Right now I have to duplicate the conditions for each of the two non-absolute statements.
> >>> 
> >>> I wish there was a construct in RDF that acts sort of like a... like a container of individual triples! This container could then become the subject of our contexts. Ideally such a container would allow us to distinguish non-absolute statements (that are only true within a given context) from absolute statements (that do not need contexts to be true).
> >>> 
> >>> Oh wait: but one such structure exists in RDF 1.1, it is called named graph, and it provides everything that I need except the distinction between non-absolute and absolute statements!
> >>> 
> >>> GRAPH :ice {
> >>>     :dihydrogen-monoxide :form :solid .
> >>>     :dihydrogen-monoxide rdfs:label "ice" .
> >>> }
> >>> 
> >>> GRAPH :water {
> >>>     :dihydrogen-monoxide :form :liquid.
> >>>     :dihydrogen-monoxide rdfs:label "water".
> >>> }
> >>> 
> >>> GRAPH :steam {
> >>>     :dihydrogen-monoxide :form :gas.
> >>>     :dihydrogen-monoxide rdfs:label "steam".
> >>> }
> >>> 
> >>> :ice   physical:highTemp :0Centigrade.
> >>> :water physical:lowTemp  :0Centigrade;
> >>>        physical:highTemp :100Centigrade;
> >>>        physical:pressure :1atm.
> >>> :steam physical:lowTemp  :100Centigrade.
> >>> 
> >>> The problem is that named graphs give me no distinction between containers of absolute statements and containers of non-absolute ones. How I wish there was a symmetry between individual triples (rdf-star vs. rdf triples) and named graphs...
> >>> 
> >>> Ciao
> >>> 
> >>> Fabio
> >>> 
> >>> --
> >>> 
> >>> 
> >>>> More generally, basic temporal logic says that the bounds on any event are the bounds for its subevents and the subevents can be explicitly bounded further. If statements represent relationships and relationships are events then the statement is a subevent of the existence event of both the subject and object, therefore any statement can leave those positions blank but still have bounds, temporal and spatial.
> >>>> 
> >>>> 
> >>>> 
> >>>>> The positions can be left blank if
> >>>>> current assumptions are maintained so that would probably mean most
> >>>>> statements can be left untouched, and if the assumptions are different for
> >>>>> the entire graph they could be stated at the graph level.
> >>>>> 
> >>>> I don't understand. If they can be left blank and consequently not asserted, how are they defaults?
> >>>> 
> >>>> Any reasoner would assume "unbounded" if no values are provided.
> >>>> 
> >>>> 
> >>>> 
> >>>>> Better to start from a principled approach and then see how hard it has to
> >>>>> 
> >>>>>> be tweaked to arrive at a practical solution, accommodate corner cases etc.
> >>>>>> 
> >>>>>> 
> >>>>> Feel like that's what I'm doing, haha.
> >>>>> 
> >>>> If you propose to solve a problem that I describe as a very general one by some examples of seemingly common cases you narrow the scope. That narrowing has to be well understood. Maybe my perspective clouds my judgement but my feeling is that your proposal narrows the scope in quite ad hoc ways that might solve the problem for some special cases (and even there I have my doubts as mentioned above) but leaves a lot or most of them (even equally general ones like authorship) unresolved.
> >>>> 
> >>>> If I'm understanding you correctly, I agree that a referentially opaque relation such as "statementOf" is still needed for provenance use cases etc., is that what you mean when you're talking about authorship? But the need for a referentially transparent relation, and the subsequent confusion that ensues, would be greatly reduced if statements could have start and end time positions.
> >>>> 
> >>>> It also addresses the multiset problem because statements with the same subject, object and relation but different start and end times etc. are different statements.
> >>>> 
> >>>> Regards
> >>>> Anthony
> >>>> 
> >>>> 
>
Received on Wednesday, 22 December 2021 16:53:10 UTC