Re: multisets everywhere

> On 20.12.2021 at 15:48, Miel Vander Sande <miel.vandersande@meemoo.be> wrote:
> 
> Hi Thomas, all,
> 
> I do agree that some extra usability on this aspect would definitely not hurt. It would have quite some gains in practice, just like RDF-star does over reification. Having this syntactical shorthand as a middle ground has popped up in my head a couple of times, but I hesitated to ask about it because:
> - it seems very likely that this idea has come up before in the CG. Probably I missed it

I made similar proposals regarding the shorthand syntax before, IIRC, but only now have I realized how deep the problem with multisets actually runs. That makes the topic a bit more urgent still, IMO.

> - this only affects the syntaxes, not the RDF-star semantics?

Well, the syntaxes have semantics. Currently the shorthand syntax maps to quoted triples that are also asserted. My proposal for the shorthand syntax doesn’t touch quoted triples as such. They continue to work as intended, especially for use cases like explainable AI, versioning etc. that are very close to the machinery. As the shorthand syntax is concise and easy to use, quoted triples would also run less risk of being misused for e.g. provenance annotations.
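
To make the difference concrete, here is a minimal Python sketch of the two expansions. It is an illustration only, not part of RDF-star or of any library: triples are plain tuples, a graph is a Python set, and the names expand_current, expand_proposed, moods_by_subject and the blank node labels _:occ1, _:occ2 are made up for this example. The "current" function follows the expansion described above (annotations attach to the quoted triple), the "proposed" one mints a fresh occurrence node per annotation block.

```python
from itertools import count


def expand_current(annotated):
    """Current shorthand expansion: annotations attach to the quoted triple itself."""
    graph = set()
    for triple, annotations in annotated:
        graph.add(triple)                       # the annotated triple is asserted
        for pred, obj in annotations:
            graph.add((triple, pred, obj))      # <<s p o>> pred obj
    return graph


def expand_proposed(annotated):
    """Proposed expansion: each annotation block gets its own occurrence node."""
    graph = set()
    fresh = count(1)
    for triple, annotations in annotated:
        graph.add(triple)                       # still asserted, as in the shorthand
        occurrence = f"_:occ{next(fresh)}"      # hypothetical blank node label
        graph.add((occurrence, ":occurrenceOf", triple))
        for pred, obj in annotations:
            graph.add((occurrence, pred, obj))
    return graph


def moods_by_subject(graph):
    """Group all :mood values by the subject they are attached to."""
    result = {}
    for s, p, o in graph:
        if p == ":mood":
            result.setdefault(s, set()).add(o)
    return result


plays = (":Alice", ":plays", ":Guitar")
data = [
    (plays, [(":mood", ":Happy")]),
    (plays, [(":mood", ":Moody")]),
]

current = moods_by_subject(expand_current(data))
proposed = moods_by_subject(expand_proposed(data))
```

Under the current expansion both moods end up on the one quoted triple, so the two occasions can no longer be told apart. Under the proposed expansion each mood sits on its own occurrence node, while the plain triple is asserted exactly once in both cases.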

> - it would open up a can of blank nodes, unless you have the ability to also add an identifier

Blank nodes or IRIs would work equally well IIUC - I have however never really understood the black magic that Peter applied when he mapped embedded triples to RDF standard reification quadlets with blank node or IRI subject (and thereby making them referentially transparent or opaque).
 
> - I would not overload the annotation syntax; that has its own reasons to exist (and it raises questions about assertion ;))

Not sure I understand what you refer to.

> - in the end, it's how you can query it that matters most. Can you make such shorthand work for SPARQL-star?

This is a good point! I haven’t investigated SPARQL-star yet, so I don’t know.

> - technically, this is a syntax enhancement that can be defined in a separate specification that extends Turtle-star a. o., but probably you want to stay away from Turtle-star-star

RDF-star defines the syntax and also its semantics. I can define my own variants of :occurrenceOf but I can’t change what the shorthand syntax expands to. I have not looked into other syntaxes, especially not into JSON-LD-star. My thinking is a bit Turtle-centric I’m afraid.

Best,
Thomas

> Best,
> 
> Miel
> 
> On Mon 20 Dec 2021 at 15:20, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:
> Dear Thomas,
> 
> > On 20.12.2021 at 14:32, thomas lörtsch <tl@rat.io> wrote:
> > 
> > 
> > 
> > On 20 December 2021 at 11:47:48 CET, Doerthe Arndt <doerthe.arndt@tu-dresden.de> wrote:
> >> Dear Thomas,
> >> 
> >> Before going into full discussion mode again :), I would like to fully understand your proposal, so please allow me one question: 
> >> 
> >> Why do you go for 
> >> 
> >>>  :Alice :plays :Guitar .
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >> 
> >> instead of 
> >> 
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >> 
> >> 
> >> with your short cut? 
> >> I am asking because especially the marriedTo example looks to me like a case where the statement changes its truth value over time (i.e. the triple becomes false if the marriage ends, or could at least become false depending on what ":marriedTo" means).
> >> 
> >> Maybe I simply missed that point in your previous explanations, so is there a short answer why you personally would model it that way?
> > 
> > It is my understanding of the (informal) property :occurrenceOf that it doesn't assert that statement, just points to it. Isn't that the assumption everybody is working under? 
> > 
> 
> Yes, it is. My question was more on why you want to assert the triple you are talking about even in cases where you know that it is not true at the time you state it. But I guess the answer to that is that you would like to be close to property graphs, and there all triples you refer to are also asserted. So I got my answer (if I understood correctly). Of course, I disagree that this is a good way to model your examples ;) but I think that has already been discussed in depth on this list.
> 
> Kind regards,
> Dörthe 
> 
> > Best, 
> > Thomas 
> > 
> >> Kind regards,
> >> Dörthe
> >> 
> >> 
> >> 
> >>> On 20.12.2021 at 01:31, thomas lörtsch <tl@rat.io> wrote:
> >>> 
> >>> tl;dr
> >>> RDF semantics is based on sets and RDF-star builds on that. However, RDF-star triple annotation has to deal with the practice of RDF, not its theoretical ideal. In RDF as practically employed, multisets, although not the norm, can appear almost everywhere. A design that ignores them by default but requires rewriting data and queries when they appear will not fare well in practice. The problem is inherent in the verbosity of the quoted triple identifier: it favors a syntax that is in almost all cases at least risky, if not outright wrong. The shortcut syntax might provide a way out of this dilemma.
> >>> 
> >>> 
> >>> The following examples should illustrate that multisets have to be expected almost everywhere in RDF data. From now on I’m always assuming the standard use case where an actual assertion is annotated:
> >>> 
> >>> #0    :Bob :bought :Car .
> >>>    :RichardB :marriedTo :LizT .
> >>>    :Alice :plays :Guitar .
> >>> 
> >>> 
> >>> The CG report says that 'Alice said that Bob bought a car' should be modeled not as
> >>> 
> >>> #1    <<:Bob :bought :Car>> :said :Alice .
> >>> 
> >>> but as 
> >>> 
> >>> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
> >>>       :said :Alice .
> >>> 
> >>> because there might be other sources for the same statement. That’s always possible so it seems reasonable to always require the indirection of creating a proper occurrence identifier when annotating a statement with provenance.
> >>> 
> >>> 
> >>> Likewise it was recently discussed that marriages between Richard Burton and Elizabeth Taylor should not be modeled as 
> >>> 
> >>> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
> >>> 
> >>> but rather as
> >>> 
> >>> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
> >>>       :start 1966 .   
> >>> 
> >>> because we know of that second marriage.
> >>> 
> >>> But what if we didn’t? What if we had authored this in 1967, assuming that this marriage will last forever? Would we have chosen the more involved modelling style nonetheless? And if we did go with the succinct #3 version - very probably, at least according to current thinking I assume - will we later, after their second marriage, have to change that to #4 style? 
> >>> 
> >>> What about querying? Say we are not sure if some statement occurs only once or multiple times: will we have to query for both modelling styles? Probably.
> >>> 
> >>> 
> >>> While the first example could be categorized as describing a speech act and the second example might be considered instantiation there’s also the case of subclassing. For example we might want to describe that Alice happily plays guitar:
> >>> 
> >>> #5    <<:Alice :plays :Guitar>> :mood :Happy .
> >>> 
> >>> The other day however she plays guitar because she's sad:
> >>> 
> >>> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
> >>> 
> >>> "So which one is it?" the unsuspecting data consumer might complain. It turns out that indeed we should have chosen the more involved style right away. 
> >>> And that is precisely my concern: the succinct modelling style as in #1, #3, #5 and #6 only works if we can be _sure_ that we are dealing with triples as types - not occurrences, not instances, not subtypes, not whatever other (not so) special cases there might exist. 
> >>> 
> >>> The succinct triple-as-type style only works for use cases that the proposed semantics was optimized for, when working on the very low levels of RDF machinery. In any other case the succinct style can be used first but might need to be changed later, and it requires queries to account for both modelling styles. Both prospects are bad enough to warrant a general rule that says: don’t use the succinct style, use the indirection via creating a statement identifier if you are not really sure that your use case is Explainable AI, versioning or similarly close to the metal.
> >>> 
> >>> 
> >>> In my understanding the problem stems from the very core of RDF-star’s design: RDF-star quoted triples are verbose in that they quote in full what they identify. That leads to moral hazard: it’s all too easy to take the shortest path and use the type as an identifier where one should mint a proper identifier first. The proposed semantics take advantage of that verbosity and put it to good use for those special use cases that require a carbon copy of their subject. But it is not well suited for annotations that influence the meaning of the annotated triple. Maybe it helps to think about the problem this way: property graph style modelling allows one to keep the simple triple and yet enrich it with additional detail. But one must admit that the simple triple annotated in two different ways is then not the same triple anymore. 
> >>> 
> >>> 
> >>> I was all along (summer of 2020 IIRC) arguing for proper statement identifiers like RDF/XML provides them and I still think they are the right solution for mainstream use cases as they are much closer to the reality of RDF data and therefore better positioned to capture deviations from the abstract RDF core. Maybe there is a middle ground in the shortcut syntax which could be defined as expanding to identifiers by default - e.g.:
> >>> 
> >>>  :Alice :plays :Guitar {| :mood :Happy |} .
> >>>  :Alice :plays :Guitar {| :mood :Moody |} .
> >>> 
> >>> expanding to
> >>> 
> >>>  :Alice :plays :Guitar .
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >>> 
> >>> This is guaranteed to be correct for single _and_ multiple occurrences alike, it is easy to author per the shorthand syntax and it is unambiguous to query.
> >>> All more involved use cases - explainable AI, unasserted assertions etc - work as before, as intended, using the quoted triple syntax.
> >>> I’d very much favor that default expansion to use a transparency-enabling version of :occurrenceOf, in which case the shorthand syntax would really be the syntactic sugar for RDF standard reification that RDF-star was - and, I guess, outside these specialist circles still is - expected to be. That wouldn’t hurt the specialist use cases in any way.
> >>> 
> >>> 
> >>> Best,
> >>> Thomas
> >>> 
> >>> 
> >>> P.S. w.r.t. "a can of worms": Knowledge representation is indeed a can of worms, and always has been, at least since the ancient Greeks. Statement annotation in RDF is a topic well known to be situated right in the heart of the worm hole. There’s no simple genius way around that.
> >> 
> 

Received on Monday, 20 December 2021 16:08:34 UTC