Re: multisets everywhere from thomas lörtsch on 2022-01-08 (public-rdf-star@w3.org from January 2022)

From: thomas lörtsch <tl@rat.io>
Date: Sat, 8 Jan 2022 01:29:50 +0100
To: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-Id: <80299E7D-B65B-4B79-A5D3-6A57EA53C746@rat.io>
> On 07.01.2022 at 17:07, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:
> 
> Dear Thomas,
> 
> Here is my mail again since I just noticed that I forgot to include the group.
> I hope it is still readable.
>> 
>> I did not manage to answer your mail before my Christmas holidays (sorry for that and happy new year!)

No problem, and happy new year to you too!

>> and now the thread got so long that I did not dare to answer (hopefully all smart things are already said ;)). Nevertheless, I will try now (with the disclaimer that I did not read all of the answers, so I might repeat what others have said):
>> 
>> In general, I see RDF as a language in which you can express knowledge just like natural language (but hopefully with a less ambiguous semantics). You can make statements and if you make them, I expect that you think (or want me to think) that they are true. In that sense RDF is very simple but also very powerful. We can state something and that has a meaning.
>> 
>> With your proposal (if I understand it correctly), you change that meaning and this is what I consider problematic. I try to explain below (and also answer your questions).
>> 
>>> On 20.12.2021 at 16:40, thomas lörtsch <tl@rat.io> wrote:
>>> 
>>> 
>>> 
>>>> On 20.12.2021 at 15:19, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:
>>>> 
>>>> Dear Thomas,
>>>> 
>>>>> On 20.12.2021 at 14:32, thomas lörtsch <tl@rat.io> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 20 December 2021 11:47:48 CET, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:
>>>>>> Dear Thomas,
>>>>>> 
>>>>>> Before going into full discussion mode again :), I would like to fully understand your proposal, so please allow me one question: 
>>>>>> 
>>>>>> Why do you go for 
>>>>>> 
>>>>>>> :Alice :plays :Guitar .
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Happy.
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Moody .
>> 
>> According to RDF semantics  „:Alice :plays :Guitar .“  is a statement which can be true or false (we normally assume that it is true if you state it). If you add triples, this one statement should not change its meaning (monotonicity). But here, it does.

I really don’t get why you think that. Alice playing guitar moodily is still Alice playing guitar. OTOH the statement that Alice plays guitar can hardly be understood as "Alice plays guitar - not one more word about it!".

    :Alice :plays :Guitar .
    << :Alice :plays :Guitar >> :haha :FooledYou! .

would be something else, of course. But regular RDF can do silly stuff too:

    :Alice :plays :Guitar .
    :Alice :doesntKnowHowToPlay :Guitar .


>> If we write: 
>> 
>> :Alice :plays :Guitar .
>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>  :stop 2021.
>> 
>> We added some triples (namely [] :occurrenceOf <<:Alice :plays :Guitar>> ; :stop 2021.) which mean that our original triple is not true any more.

I don’t interpret it that way. Maybe we have a problem of the glass of water being half empty or half full. The statement

    :Alice :plays :Guitar

is, without a well-defined vocabulary from which :plays is taken, just an example to make a point. The point that I want to make is that statements can be refined without invalidating them.

Albeit without providing empirical proof, I’m inclined to claim that no term in any vocabulary on the semantic web is defined in a way that could be considered "complete", as in leaving no aspect undefined.

I also claim that most of them don’t make any claim about temporal validity (or moods for that matter). Such detail is usually left to specialized vocabularies, like e.g. for expressing temporal constraints. 

A lot of information, a lot of meaning is, even on the semantic web, implicit, as in any reasonably complex case it is just impossible to give a complete description.

Considering all that I think that it is reasonable to interpret that

    :Alice :plays :Guitar

has a temporal aspect that is not even mentioned here and almost infinitely many other aspects that are also not mentioned, much less provided.

There are also aspects of common sense: interpreting the above statement as "Alice plays guitar NOW" runs counter to the intuition that there usually is a gap between me making an observation, encoding it in RDF, sending the mail to a list that is archived on the web, and someone else reading it, a gap that would make the 'now' claim rather questionable. Also an interpretation that "Alice ALWAYS plays guitar" is not viable in practice etc etc. I think you get my point: there is, for a human reader, every reason to believe that the above statement doesn’t describe some situation in full. But what we do know is that I stated an observation, namely that Alice plays guitar. And I did indeed observe her playing the guitar, that’s the truth (I swear!).

Apart from that, my example didn’t involve time but mood, but that makes no difference to what I want to express: a single statement often cannot capture the whole situation, event or fact that needs to be expressed, and statement annotation is a way to enhance the description.

>> This is in my opinion very problematic. Not only from a logical point of view, but also in a very practical sense: You cannot extract simple triples from your graph anymore to work with (make derivations, answer queries, etc.) but you will always have to consider their context (are there triples which make them invalid?). These context checks can get very complicated and in that sense you are creating a monster here :).

No, not a monster, but rather normal on the semantic web: it’s the Open World Assumption. The least you are expected to do is to check the immediate surroundings of your statement. Statement annotation actually makes that extremely easy: just check for annotations on the statement that you retrieved. With normal n-ary relations it's much harder to understand where one description ends and the next one starts, and therefore much easier to miss something.
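
To make the "just check for annotations" step concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an RDF-star implementation: the graph is modelled as a plain set of (subject, predicate, object) tuples, a quoted triple is a nested tuple, and the helper name annotations_of and the blank node labels are hypothetical.

```python
# Hypothetical toy model (not a real RDF library): a graph is a set of
# (subject, predicate, object) tuples; a quoted triple is a nested tuple.
OCCURRENCE_OF = ":occurrenceOf"

def annotations_of(graph, triple):
    """Return {annotating node: {(predicate, object), ...}} for `triple`.

    An annotating node is either the quoted triple itself or any node
    that points at it via :occurrenceOf.
    """
    # Collect every node that stands for the statement.
    nodes = {triple}
    nodes |= {s for (s, p, o) in graph if p == OCCURRENCE_OF and o == triple}
    # Gather what is said about those nodes, skipping the pointer itself.
    result = {}
    for (s, p, o) in graph:
        if s in nodes and p != OCCURRENCE_OF:
            result.setdefault(s, set()).add((p, o))
    return result

stmt = (":Alice", ":plays", ":Guitar")
graph = {
    stmt,                                # the asserted statement
    ("_:o1", OCCURRENCE_OF, stmt),
    ("_:o1", ":mood", ":Happy"),
    ("_:o2", OCCURRENCE_OF, stmt),
    ("_:o2", ":mood", ":Moody"),
    (":Alice", ":knows", ":Bob"),        # unrelated triple, ignored
}
found = annotations_of(graph, stmt)
```

The lookup only touches triples whose subject is the statement or one of its occurrences, which is the sense in which the check stays local.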


>>>>>> 
>>>>>> instead of 
>>>>>> 
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Happy.
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Moody .
>>>>>> 
>>>>>> 
>>>>>> with your short cut? 
>>>>>> I am asking because especially the marriedTo example looks to me like a case where the statement changes its truth value over time (i.e. the triple becomes false if the marriage ends, or could at least become false depending on what „:marriedTo“ means).
>>>>>> 
>>>>>> Maybe I simply missed that point in your previous explanations, so is there a short answer why you personally would model it that way?
>>>>> 
>>>>> It is my understanding of the (informal) property :occurrenceOf that it doesn't assert that statement, just points to it. Isn't that the assumption everybody is working under? 
>>>>> 
>>>> 
>>>> Yes, it is. My question was more on why you want to assert the triple you are talking about even in cases where you know that it is not true at the time you state it.
>> 
>> 
>> 
>> 
>>> 
>>> I gave one reason already in my example: me authoring this in 1967. 

Ahm, that was the example of the marriage between Burton and Taylor, right?

>> I would expect that you remove or change your statement the moment it becomes wrong (like you would for example update wikipedia).

Well, the statement isn’t wrong if you interpret it a bit more liberally: the marriage between Burton and Taylor existed. Like with guitar playing, most things in life are not eternal. You will never ever convince all publishers on the semantic web to add every possible detail to their descriptions - it’s just not possible. Adding another detail will however not invalidate what was said before, even if it suddenly makes the thing described less desirable or interesting to me. Say you only search for happy marriages: do you expect the publisher to remove the statement because that marriage didn’t end happily?

>> But even if you don’t, your statement comes in some context (maybe a graph) and you could add meta-information to that. I would not put that information as triples in the same graph since that makes your graph difficult to process (it is in some sense similar to a written text with a lot of footnotes, at some point the meta-information hinders readability).

That is similar to the argument of the monster that you made already above, but now you differentiate additional detail (aka more data) from meta data. That is another aspect. I deliberately used the :moody example to make this not about meta data. It is arguable whether data and meta data should be handled in the same way, and Pat has almost convinced me that they shouldn’t. One possible solution would be to annotate the predicate when adding more detail to a relation and annotate the whole statement when adding provenance information. Often the vocabularies do the disambiguation so it might not be the most pressing issue. We can go into that - I did a bit in my recent reply to Niklas - but I’m not sure you would be interested. However RDF-star makes no attempt to address individual terms in an embedded statement so I fear the discussion is a bit pointless around here.

>>> I can easily give more examples:
>>>    <<:Alice :married :Bob>> :in :GretnaGreen .
>>> Some people would accept that as a marriage, others wouldn’t.
>> 
>> That is true. But the moment you state that they are married, you show that in your opinion it is a marriage. 

No, I wouldn’t read it that way. Personally I don’t have a strong opinion on this issue. I record the fact that a marriage between Alice and Bob has been declared in Gretna Green. By adding the annotation I give people the chance to decide for themselves if they consider it a real marriage, maybe even the only real marriage that there can be or rather no marriage at all. Again, the beauty of the annotation syntax is that this additional detail, which may change the meaning of the base statement profoundly, is hard to miss. There are of course other possibilities: minting a new sub-property of :marriage (:gretnaGreenMarriage), or describing the whole event as an n-ary relation or an instance of a suitable event class with all the necessary detail. But - and I’m certainly not alone with this intuition - IMO annotated relations are really much more intuitive to write and read and easier to find.

>>>    <<:Alice :plays :Piano>> :with :Hammer .
>>> Some people do that, some people consider it art, some people disagree.
>>>   <<:Alice :plays :Guitar>>
>>> alone is according to your approach a highly dubious statement and wrongly modeled because she sure doesn’t play guitar in her sleep. 
>> 
>> That highly depends on what „:plays“ actually means. But again: if you state it, it is true according to you (it is my problem whether or not I want to believe you).
>> 
>>> 
>>> Asking it the other way round: which fact is sure enough to last (and maybe also be undisputed) forever that you would model it as a straightforward triple?  There’s always additional detail, no description is complete - how do you suggest dealing with that? If you model everything as an n-ary relation with a blank node as identifier you are safe. But then you’re essentially back at a glorified version of relational databases (with descriptive column names, granted). Now, I don’t think that graphs are the structure to rule them all but the simplicity of triples, annotated or not, certainly has its appeal. Where does your approach leave that?
>> 
>> As said, I see the knowledge we model as some kind of snapshot of what we believe is true. Of course this concept of „true“ is fluid and what has been true yesterday does not necessarily have to be true tomorrow. But I think that it is the responsibility of the data modeler to at least try to only publish data of which he thinks that it is true and maintain it (i.e. remove triples which are no longer true).

There is of course a responsibility to keep the semantic web reasonably tidy. Publishing ":Trump :presidentOf :USA" without any further detail and not removing that statement after the last election is kinda lame (well, let’s hope it’s only lame and not on purpose). But OTOH one of the main if not the main value proposition of the semantic web is that it can be built from simple triples that express relations between A and B. Annotated statements live up to that promise even when the proposition is a bit more complex, whereas all solutions that RDF has to offer - n-ary relations, instantiation of specific classes, or the need to define sub-properties (which would be ridiculous when considering time-dependent data) - don’t. You haven’t answered my question how you would prefer to model such complex facts. Also how would you use RDF-star?

>>>> But I guess the answer to that is that you would like to be close to property graphs and there, all triples you  refer to, are also asserted. So I got my answer (if I understood correctly).
>>> 
>>> No, that’s not my point. I want to be able to annotate statements because I see that as an intuitive way to model things, yes. 
>> 
>> OK, then I really misunderstood you, sorry. As pointed out, my problem with your proposal is that you are changing the semantics of what already exists.

Well, you haven’t convinced me that I do. I agree that there is a potential for interfering with a triple's meaning from its annotation and that would not be good, but I can do that as well with regular triples, can’t I? I don’t see any obvious signs that the danger is more prevalent with statement annotations than with ordinary triples, although I can’t rule that out for sure.

I do however see a problem also with the way that you seem to understand RDF. IMO if you expect and accept only the tightest, narrowest possible interpretation of a triple you can’t get far in practice. Interpreting <:Alice :plays :Guitar> as all there is to say about this leads … nowhere actually. She surely doesn’t do it eternally, forever, so what now: just throw the triple away? What would you accept? Maybe:

    :Alice :plays [ rdf:value :Guitar ; :times :Daily ; :since 2020 ; :until :Unknown ] .

And then somebody comes along and lists all the things you forgot to mention: location, style, mood, the weather… Or is the blank node the thing that saves you, the infinite malleability of an existential? Well that would just be a cheap trick IMO.

It is also not hard to define a mapping from annotated statements to n-ary relations. Doesn’t that provide all the RDF-conforming semantics one could wish for?
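
One possible mapping, sketched in Python under loudly stated assumptions: each annotated statement becomes a fresh blank node that carries the original object via rdf:value plus the annotations, following the `:Alice :plays [ rdf:value :Guitar ; ... ]` pattern used in this thread. The function name to_nary, the tuple encoding and the blank node labels are hypothetical, and other target shapes (e.g. event classes) would be just as valid.

```python
def to_nary(annotated_statements):
    """Map annotated statements to n-ary style triples.

    Each input item is (subject, predicate, object, annotations), where
    annotations is a list of (predicate, object) pairs. Every item gets
    its own blank node, so repeated statements stay distinct.
    """
    triples = []
    for i, (s, p, o, annotations) in enumerate(annotated_statements, start=1):
        node = f"_:b{i}"                       # hypothetical blank node label
        triples.append((s, p, node))           # :Alice :plays _:b1
        triples.append((node, "rdf:value", o)) # _:b1 rdf:value :Guitar
        for ap, ao in annotations:             # _:b1 :mood :Happy
            triples.append((node, ap, ao))
    return triples

result = to_nary([
    (":Alice", ":plays", ":Guitar", [(":mood", ":Happy")]),
    (":Alice", ":plays", ":Guitar", [(":mood", ":Moody")]),
])
```

Because every annotated statement gets its own node, two differently annotated occurrences of the same triple do not collapse into one, which is the multiset concern from the start of the thread.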

>> So, asking back: if we go your way, is it then even possible to have triples without annotation? What does:
>> 
>> :Alice :plays :Guitar .
>> 
>> mean?

It describes a situation where she plays guitar. 

Adding to this: as I said above, any human reader will understand that she doesn’t necessarily play guitar NOW and certainly not ALWAYS or ETERNALLY. Also any application, service or robot that is well enough designed and coded will not come to wrong conclusions (if any). Some reasonably well developed common sense AI will "know" that "playing guitar" is a human activity and as such certain to be constrained in time etc. I’m sure CYC would know it. So where is the problem? What does it mean for you? Why should it mean anything different for me?

>>>> Of course, I disagree that this is a good way to model your examples ;) but I think that has already been discussed in depth on this list.
>>> 
>>> I don’t understand exactly what you refer to. Do you consider the example you gave above, omitting the asserted statements, a good way to model this? That would however leave me wondering if Alice actually played guitar or not.
>>> 
>>> Or would you go a totally different route, eg this:
>>>    :Alice :plays [ rdf:value :Guitar ; :mood :Happy ] .
>>> 
>>> And do you think that RDF-star in general shouldn’t be used to model in property graph style?
>> 
>> I think RDF-star should (amongst other things) be used to model property graph style. I simply don’t want to give up all we have so far to achieve that and I also think that that would be the wrong way. If we do everything exactly like it is done in property graphs even ignoring the existing semantics of RDF, I wonder why do we even need RDF. Why don’t we simply use property graphs in the first place?

Well, "they" do, and some "we" are concerned about market share. I do like RDF and its advantages over property graphs but I also have been longing for more intuitive modelling primitives for a looong time.


So how should we use RDF-star to model property graph style in your opinion? From your comments I only see what you don’t like. Even 

>> :Alice :plays :Guitar .
>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>  :stop 2021.

above is not right in your opinion. What is it then?

Best,
Thomas


>> Or, if you want to have more complex models, why not simply natural language? 
>> 
>> Kind regards,
>> Dörthe
>> 
>>> 
>>> Best,
>>> Thomas
>>> 
>>>> Kind regards,
>>>> Dörthe 
>>>> 
>>>>> Best, 
>>>>> Thomas 
>>>>> 
>>>>>> Kind regards,
>>>>>> Dörthe
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 20.12.2021 at 01:31, thomas lörtsch <tl@rat.io> wrote:
>>>>>>> 
>>>>>>> tl;dr
>>>>>>> RDF semantics is based on sets and RDF-star builds on that. However RDF-star triple annotation has to deal with the practice of RDF, not its theoretical ideal. In RDF as practically employed multisets, although not the norm, can appear almost everywhere. A design that ignores them per default but requires rewriting data and queries when they appear will not fare well in practice. The problem is inherent in the verbosity of the quoted triple identifier: it favors a syntax that is in almost all cases at least risky, if not outright wrong. The shortcut syntax might provide a way out of this dilemma.
>>>>>>> 
>>>>>>> 
>>>>>>> The following examples should illustrate that multisets have to be expected almost everywhere in RDF data. From now on I’m always assuming the standard use case where an actual assertion is annotated:
>>>>>>> 
>>>>>>> #0    :Bob :bought :Car .
>>>>>>>  :RichardB :marriedTo :LizT .
>>>>>>>  :Alice :plays :Guitar .
>>>>>>> 
>>>>>>> 
>>>>>>> The CG report says that 'Alice said that Bob bought a car' should be modeled not as
>>>>>>> 
>>>>>>> #1    <<:Bob :bought :Car>> :said :Alice .
>>>>>>> 
>>>>>>> but as 
>>>>>>> 
>>>>>>> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
>>>>>>>     :said :Alice .
>>>>>>> 
>>>>>>> because there might be other sources for the same statement. That’s always possible so it seems reasonable to always require the indirection of creating a proper occurrence identifier when annotating a statement with provenance.
>>>>>>> 
>>>>>>> 
>>>>>>> Likewise it was recently discussed that marriages between Richard Burton and Elizabeth Taylor should not be modeled as 
>>>>>>> 
>>>>>>> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
>>>>>>> 
>>>>>>> but rather as
>>>>>>> 
>>>>>>> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
>>>>>>>     :start 1966 .   
>>>>>>> 
>>>>>>> because we know of that second marriage.
>>>>>>> 
>>>>>>> But what if we didn’t? What if we had authored this in 1967, assuming that this marriage will last forever? Would we have chosen the more involved modelling style nonetheless? And if we did go with the succinct #3 version - very probably, at least according to current thinking I assume - will we later, after their second marriage, have to change that to #4 style? 
>>>>>>> 
>>>>>>> What about querying? Say we are not sure if some statement occurs only once or multiple times: will we have to query for both modelling styles? Probably.
>>>>>>> 
>>>>>>> 
>>>>>>> While the first example could be categorized as describing a speech act and the second example might be considered instantiation there’s also the case of subclassing. For example we might want to describe that Alice happily plays guitar:
>>>>>>> 
>>>>>>> #5    <<:Alice :plays :Guitar>> :mood :Happy .
>>>>>>> 
>>>>>>> The other day however she plays guitar because she's sad:
>>>>>>> 
>>>>>>> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
>>>>>>> 
>>>>>>> "So which one is it?" the unsuspecting data consumer might complain. It turns out that indeed we should have chosen the more involved style right away.
>>>>>>> And that is precisely my concern: the succinct modelling style as in #1, #3, #5 and #6 only works if we can be _sure_ that we are dealing with triples as types - not occurrences, not instances, not subtypes, not whatever other (not so) special cases there might exist. 
>>>>>>> 
>>>>>>> The succinct triple-as-type style only works for use cases that the proposed semantics was optimized for, when working on the very low levels of RDF machinery. In any other case the succinct style can be used first but might need to be changed later, and it requires queries to account for both modelling styles. Both prospects are bad enough to warrant a general rule that says: don’t use the succinct style, use the indirection via creating a statement identifier if you are not really sure that your use case is Explainable AI, versioning or similarly close to the metal.
>>>>>>> 
>>>>>>> 
>>>>>>> In my understanding the problem stems from the very core of RDF-star’s design: RDF-star quoted triples are verbose in that they quote in full what they identify. That leads to moral hazard: it’s all too easy to take the shortest path and use the type as an identifier where one should mint a proper identifier first. The proposed semantics take advantage of that verbosity and put it to good use for those special use cases that require a carbon copy of their subject. But it is not well suited for annotations that influence the meaning of the annotated triple. Maybe it helps to think about the problem this way: property graph style modelling allows one to keep the simple triple and yet enrich it with additional detail. But one must admit that the simple triple annotated in two different ways is then not the same triple anymore.
>>>>>>> 
>>>>>>> 
>>>>>>> I was all along (summer of 2020 IIRC) arguing for proper statement identifiers like RDF/XML provides them and I still think they are the right solution for mainstream use cases as they are much closer to the reality of RDF data and therefore better positioned to capture deviations from the abstract RDF core. Maybe there is a middle ground in the shortcut syntax which could be defined as expanding to identifiers by default - e.g.:
>>>>>>> 
>>>>>>> :Alice :plays :Guitar {| :mood :Happy |} .
>>>>>>> :Alice :plays :Guitar {| :mood :Moody |} .
>>>>>>> 
>>>>>>> expanding to
>>>>>>> 
>>>>>>> :Alice :plays :Guitar .
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Happy.
>>>>>>> [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>>>>>   :mood :Moody .
>>>>>>> 
>>>>>>> This is guaranteed to be correct for single _and_ multiple occurrences alike, it is easy to author per the shorthand syntax and it is unambiguous to query.
>>>>>>> All more involved use cases - explainable AI, unasserted assertions etc - work as before, as intended, using the quoted triple syntax.
>>>>>>> I’d very much favor that default expansion to use a transparency enabling version of :occurrenceOf in which case the shorthand syntax would really be the syntactic sugar for RDF standard reification that RDF-star was - and, I guess, outside these specialist circles still is - expected to be. That wouldn’t hurt the specialist use cases in any way.
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Thomas
>>>>>>> 
>>>>>>> 
>>>>>>> P.S. w.r.t. "a can of worms": Knowledge representation is indeed a can of worms, and always has been, at least since the ancient Greeks. Statement annotation in RDF is a topic well known to be situated right in the heart of the worm hole. There’s no simple, ingenious way around that.
>
Received on Saturday, 8 January 2022 00:30:16 UTC