Re: multisets everywhere

Thank you all for the responses.

Cheers

On Monday, 20 December 2021, Pierre-Antoine Champin <
pierre-antoine.champin@ercim.eu> wrote:

>
> On 20/12/2021 11:48, thomas lörtsch wrote:
>
>> Hi Laufer,
>>
>> Singleton properties and RDF-star are both approaches to statement
>> annotation in RDF. There are more approaches, like RDF standard
>> reification, named graphs, etc.
>>
>
> What Thomas said.
>
> Note also that a comparison of those approaches is given in the Lotico
> presentation on RDF-star that Olaf and I gave in March:
>
> https://www.youtube.com/watch?v=ZNfq12mdnsM&t=445s
>
>   best
>
>> If you want to discuss the topic of how they (or some of them) compare, I
>> suggest you open a new thread with that topic.
>>
>> Best,
>> Thomas
>>
>>
>> On 20.12.2021 at 04:17, Laufer <carlos.laufer@gmail.com> wrote:
>>>
>>> Hello, All,
>>>
>>> I was wondering how this discussion is related to the Singleton Property
>>> proposal [1].
>>>
>>> Cheers,
>>> Laufer
>>>
>>> [1] Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't Like
>>> Reification? Making Statements about Statements Using Singleton Property."
>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/
>>>
>>> On Sunday, 19 December 2021, thomas lörtsch <tl@rat.io> wrote:
>>> tl;dr
>>> RDF semantics is based on sets, and RDF-star builds on that. However,
>>> RDF-star triple annotation has to deal with the practice of RDF, not its
>>> theoretical ideal. In RDF as practically employed, multisets, although not
>>> the norm, can appear almost everywhere. A design that ignores them by
>>> default but requires rewriting data and queries when they appear will not
>>> fare well in practice. The problem is inherent in the verbosity of the
>>> quoted triple identifier: it favors a syntax that is in almost all cases at
>>> least risky, if not outright wrong. The shortcut syntax might provide a way
>>> out of this dilemma.
>>>
>>>
>>> The following examples should illustrate that multisets have to be
>>> expected almost everywhere in RDF data. From now on I always assume the
>>> standard use case, where an actual assertion is annotated:
>>>
>>> #0    :Bob :bought :Car .
>>>       :RichardB :marriedTo :LizT .
>>>       :Alice :plays :Guitar .
>>>
>>>
>>> The CG report says that 'Alice said that Bob bought a car' should be
>>> modeled not as
>>>
>>> #1    <<:Bob :bought :Car>> :said :Alice .
>>>
>>> but as
>>>
>>> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
>>>          :said :Alice .
>>>
>>> because there might be other sources for the same statement. That’s
>>> always possible so it seems reasonable to always require the indirection of
>>> creating a proper occurrence identifier when annotating a statement with
>>> provenance.
>>>
>>>
>>> Likewise it was recently discussed that marriages between Richard Burton
>>> and Elizabeth Taylor should not be modeled as
>>>
>>> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
>>>
>>> but rather as
>>>
>>> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
>>>          :start 1966 .
>>>
>>> because we know of that second marriage.
>>>
>>> But what if we didn’t? What if we had authored this in 1967, assuming
>>> that this marriage would last forever? Would we have chosen the more
>>> involved modelling style nonetheless? And if we did go with the succinct #3
>>> version (very probably, at least according to current thinking, I assume),
>>> will we later, after their second marriage, have to change that to #4 style?
>>>
>>> What about querying? Say we are not sure if some statement occurs only
>>> once or multiple times: will we have to query for both modelling styles?
>>> Probably.
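To make the querying concern concrete, here is a minimal Python sketch, not taken from any RDF library. It assumes a graph is modelled as a plain set of (subject, predicate, object) tuples, a quoted triple as a nested tuple, and blank nodes as strings starting with "_:", and it uses :occurrenceOf only as the thread's hypothetical occurrence property. A consumer that does not know which style was used has to combine both lookups:

```python
# Minimal sketch (assumptions): a graph is a Python set of (s, p, o) tuples,
# a quoted triple is a nested tuple, blank nodes are strings starting with
# "_:", and ":occurrenceOf" is the thread's hypothetical occurrence property.
OCCURRENCE_OF = ":occurrenceOf"


def succinct_values(graph, triple, prop):
    """Values of `prop` when the quoted triple itself is the subject (#1)."""
    return {o for s, p, o in graph if s == triple and p == prop}


def occurrence_values(graph, triple, prop):
    """Values of `prop` on nodes that are occurrences of `triple` (#2)."""
    occ = {s for s, p, o in graph if p == OCCURRENCE_OF and o == triple}
    return {o for s, p, o in graph if s in occ and p == prop}


def annotation_values(graph, triple, prop):
    """A consumer unsure of the modelling style has to ask both ways."""
    return succinct_values(graph, triple, prop) | occurrence_values(graph, triple, prop)


bought = (":Bob", ":bought", ":Car")
style_1 = {bought, (bought, ":said", ":Alice")}
style_2 = {bought, ("_:o1", OCCURRENCE_OF, bought), ("_:o1", ":said", ":Alice")}

print(succinct_values(style_2, bought, ":said"))    # set(): the #2 data is missed
print(annotation_values(style_1, bought, ":said"))  # {':Alice'}
print(annotation_values(style_2, bought, ":said"))  # {':Alice'}
```

In SPARQL-star the same idea would presumably be a UNION of the two patterns; the sketch only shows why both branches are needed.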
>>>
>>>
>>> While the first example could be categorized as describing a speech act
>>> and the second example might be considered instantiation, there’s also the
>>> case of subclassing. For example, we might want to describe that Alice
>>> happily plays guitar:
>>>
>>> #5    <<:Alice :plays :Guitar>> :mood :Happy .
>>>
>>> The other day, however, she plays guitar because she's sad:
>>>
>>> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
>>>
>>> "So which one is it?" the unsuspecting data consumer might complain. It
>>> turns out that indeed we should have chosen the more involved style right
>>> away.
>>> And that is precisely my concern: the succinct modelling style as in #1,
>>> #3, #5 and #6 only works if we can be _sure_ that we are dealing with
>>> triples as types - not occurrences, not instances, not subtypes, not
>>> whatever other (not so) special cases there might exist.
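A minimal Python sketch of the #5/#6 problem, under the same kind of assumptions: a graph as a Python set of triple tuples, quoted triples as nested tuples, illustrative blank-node labels "_:o1" and "_:o2", and :occurrenceOf as a hypothetical property. Grouping :mood values by the node they hang off shows that the succinct style puts both moods on one node, while occurrence nodes keep the two sessions apart. Set semantics also merges a repeated annotation into one triple, which is the multiset problem in miniature:

```python
from collections import defaultdict

plays = (":Alice", ":plays", ":Guitar")

# Succinct style (#5 and #6): the quoted triple is the subject of both moods.
succinct = {plays, (plays, ":mood", ":Happy"), (plays, ":mood", ":Gloomy")}

# Occurrence style: one (illustrative) blank node per playing session.
occurrences = {
    plays,
    ("_:o1", ":occurrenceOf", plays), ("_:o1", ":mood", ":Happy"),
    ("_:o2", ":occurrenceOf", plays), ("_:o2", ":mood", ":Gloomy"),
}


def moods_by_subject(graph):
    """Group :mood values by the node they are attached to."""
    groups = defaultdict(set)
    for s, p, o in graph:
        if p == ":mood":
            groups[s].add(o)
    return dict(groups)


# One node carrying two conflicting moods: "So which one is it?"
print(moods_by_subject(succinct))
# Two nodes with one mood each: the two sessions stay distinguishable.
print(moods_by_subject(occurrences))

# Set semantics also merges repeats: two equally happy sessions,
# annotated in the succinct style, collapse into a single triple.
repeated = {(plays, ":mood", ":Happy"), (plays, ":mood", ":Happy")}
print(len(repeated))  # 1
```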
>>>
>>> The succinct triple-as-type style only works for the use cases that the
>>> proposed semantics were optimized for, i.e. work at the very low levels
>>> of RDF machinery. In any other case the succinct style can be used at first
>>> but might need to be changed later, and it requires queries to account for
>>> both modelling styles. Both prospects are bad enough to warrant a general
>>> rule: don’t use the succinct style; use the indirection of creating a
>>> statement identifier unless you are really sure that your use case is
>>> Explainable AI, versioning or something similarly close to the metal.
>>>
>>>
>>> In my understanding the problem stems from the very core of RDF-star’s
>>> design: RDF-star quoted triples are verbose in that they quote in full what
>>> they identify. That leads to moral hazard: it’s all too easy to take the
>>> shortest path and use the type as an identifier where one should mint a
>>> proper identifier first. The proposed semantics take advantage of that
>>> verbosity and put it to good use for those special use cases that
>>> require a carbon copy of their subject. But they are not well suited for
>>> annotations that influence the meaning of the annotated triple. Maybe it
>>> helps to think about the problem this way: property-graph-style modelling
>>> allows one to keep the simple triple and yet enrich it with additional
>>> detail. But one must admit that the simple triple annotated in two
>>> different ways is then not the same triple anymore.
>>>
>>>
>>> All along (since the summer of 2020, IIRC) I have argued for proper
>>> statement identifiers like the ones RDF/XML provides, and I still think they
>>> are the right solution for mainstream use cases, as they are much closer to
>>> the reality of RDF data and therefore better positioned to capture deviations
>>> from the abstract RDF core. Maybe there is a middle ground in the shortcut
>>> syntax, which could be defined as expanding to identifiers by default, e.g.:
>>>
>>>     :Alice :plays :Guitar {| :mood :Happy |} .
>>>     :Alice :plays :Guitar {| :mood :Moody |} .
>>>
>>> expanding to
>>>
>>>     :Alice :plays :Guitar .
>>>     [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>        :mood :Happy .
>>>     [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>>        :mood :Moody .
>>>
>>> This is guaranteed to be correct for single _and_ multiple occurrences
>>> alike, it is easy to author via the shorthand syntax, and it is unambiguous
>>> to query.
>>> All more involved use cases (explainable AI, unasserted assertions, etc.)
>>> work as before, as intended, using the quoted triple syntax.
>>> I’d very much favor having that default expansion use a
>>> transparency-enabling version of :occurrenceOf, in which case the shorthand
>>> syntax would really be the syntactic sugar for RDF standard reification
>>> that RDF-star was (and, I guess, outside these specialist circles still is)
>>> expected to be. That wouldn’t hurt the specialist use cases in any way.
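The proposed default expansion of the shorthand can be sketched in Python as follows. This is a hypothetical illustration, not a defined RDF-star behaviour: the input format, the :occurrenceOf property and the "_:oN" blank-node labels are assumptions. Each annotation block asserts the base triple (once) and gets its own fresh occurrence node, so single and multiple occurrences are handled the same way:

```python
from itertools import count


def expand(annotated):
    """Hypothetical default expansion of annotation shorthand.

    `annotated` is a list of ((s, p, o), [(prop, value), ...]) pairs, one
    per `s p o {| ... |} .` statement. Returns a list of triples in which
    each annotation block hangs off its own fresh occurrence node.
    """
    fresh = count(1)
    triples = []
    for triple, annotations in annotated:
        if triple not in triples:
            triples.append(triple)      # assert the base triple only once
        node = f"_:o{next(fresh)}"      # fresh node per annotation block
        triples.append((node, ":occurrenceOf", triple))
        triples.extend((node, prop, value) for prop, value in annotations)
    return triples


plays = (":Alice", ":plays", ":Guitar")
for t in expand([(plays, [(":mood", ":Happy")]), (plays, [(":mood", ":Moody")])]):
    print(t)
```

Minting a fresh node per annotation block, rather than reusing the quoted triple as the subject, is what keeps :Happy and :Moody from landing on the same node.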
>>>
>>>
>>> Best,
>>> Thomas
>>>
>>>
>>> P.S. w.r.t. "a can of worms": knowledge representation is indeed a can
>>> of worms, and always has been, at least since the ancient Greeks. Statement
>>> annotation in RDF is a topic well known to be situated right in the heart
>>> of the wormhole. There’s no simple, genius way around that.
>>>
>>>
>>> --
>>>
>>> 劳费尔
>>> . . . .. . .
>>> . . . ..
>>> . .. .
>>>
>>>
>>

-- 

劳费尔
. . . .. . .
. . . ..
. .. .

Received on Monday, 20 December 2021 16:48:07 UTC