Re: multisets everywhere from Laufer on 2021-12-20 (public-rdf-star@w3.org from December 2021)

From: Laufer <carlos.laufer@gmail.com>
Date: Mon, 20 Dec 2021 00:17:36 -0300
To: thomas lörtsch <tl@rat.io>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <CAFg8H7j5D65-qMyG1-pJCCxfdGZg3KijE0fiitvahUS0j08-EA@mail.gmail.com>
Hello, All,

I was wondering how this discussion is related to the Singleton Property
proposal [1].

Cheers,
Laufer

[1] - Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't Like
Reification? Making Statements about Statements Using Singleton Property."
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/

On Sunday, 19 December 2021, thomas lörtsch <tl@rat.io> wrote:

> tl;dr
> RDF semantics is based on sets, and RDF-star builds on that. However,
> RDF-star triple annotation has to deal with the practice of RDF, not its
> theoretical ideal. In RDF as practically employed, multisets, although not
> the norm, can appear almost everywhere. A design that ignores them by
> default but requires rewriting data and queries when they appear will not
> fare well in practice. The problem is inherent in the verbosity of the
> quoted triple identifier: it favors a syntax that is in almost all cases at
> least risky, if not outright wrong. The shortcut syntax might provide a way
> out of this dilemma.
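The set-versus-multiset distinction can be made concrete with a minimal Python sketch. It assumes a toy model (not any RDF library) in which terms are prefixed-name strings and a triple is a plain tuple: recording the Burton/Taylor marriage triple twice collapses to a single element under set semantics, while a multiset such as `collections.Counter` keeps the count.

```python
from collections import Counter

# Toy model (an assumption for illustration): an RDF term is a prefixed-name
# string and a triple is a (subject, predicate, object) tuple.
married = (":RichardB", ":marriedTo", ":LizT")

# The same statement recorded twice, once per marriage.
recorded = [married, married]

graph = set(recorded)            # set semantics: the duplicate collapses
occurrences = Counter(recorded)  # a multiset keeps the multiplicity

print(len(graph))            # 1
print(occurrences[married])  # 2
```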
>
>
> The following examples should illustrate that multisets have to be
> expected almost everywhere in RDF data. Throughout, I assume the standard
> use case in which an actual assertion is annotated:
>
> #0    :Bob :bought :Car .
>       :RichardB :marriedTo :LizT .
>       :Alice :plays :Guitar .
>
>
> The CG report says that 'Alice said that Bob bought a car' should be
> modeled not as
>
> #1    <<:Bob :bought :Car>> :said :Alice .
>
> but as
>
> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
>         :said :Alice .
>
> because there might be other sources for the same statement. That’s always
> possible, so it seems reasonable to always require the indirection of
> creating a proper occurrence identifier when annotating a statement with
> provenance.
>
>
> Likewise, it was recently argued that the marriages between Richard Burton
> and Elizabeth Taylor should not be modeled as
>
> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
>
> but rather as
>
> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
>         :start 1966 .
>
> because we know of their second marriage.
>
> But what if we didn’t? What if we had authored this in 1967, assuming that
> this marriage would last forever? Would we have chosen the more involved
> modelling style nonetheless? And if we did go with the succinct #3 version
> (very probably, I assume, at least according to current thinking), will we
> later, after their second marriage, have to change it to the #4 style?
>
> What about querying? Say we are not sure whether some statement occurs only
> once or multiple times: will we have to query for both modelling styles?
> Probably.
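What "querying for both modelling styles" amounts to can be sketched in the same hypothetical toy model, where a quoted triple is a Python tuple that may itself appear as a subject or object. The helper names and the blank-node label `_:m1` are illustrative only, not part of any RDF-star specification or library. A query written only for the succinct #3 style finds nothing when the data uses the #4 style.

```python
# Toy model (assumption): terms are strings/ints, a quoted triple is a tuple,
# and a graph is a set of (s, p, o) tuples whose s or o may be a quoted triple.
marriage = (":RichardB", ":marriedTo", ":LizT")

graph = {
    marriage,                             # the asserted triple (#0)
    ("_:m1", ":occurrenceOf", marriage),  # #4 style: explicit occurrence node
    ("_:m1", ":start", 1966),
}

def direct_values(graph, triple, prop):
    """#3 style only: the quoted triple itself is the annotation subject."""
    return {o for (s, p, o) in graph if s == triple and p == prop}

def annotation_values(graph, triple, prop):
    """Values of `prop` for `triple` in either modelling style (#3 or #4)."""
    occ_nodes = {s for (s, p, o) in graph if p == ":occurrenceOf" and o == triple}
    via_occ = {o for (s, p, o) in graph if s in occ_nodes and p == prop}
    return direct_values(graph, triple, prop) | via_occ

print(direct_values(graph, marriage, ":start"))      # set()
print(annotation_values(graph, marriage, ":start"))  # {1966}
```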
>
>
> While the first example could be categorized as describing a speech act
> and the second example might be considered instantiation, there’s also the
> case of subclassing. For example we might want to describe that Alice
> happily plays guitar:
>
> #5    <<:Alice :plays :Guitar>> :mood :Happy .
>
> On another day, however, she plays guitar because she's sad:
>
> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
>
> "So which one is it?" the unsuspecting data consumer might complain. It
> turns out that we should indeed have chosen the more involved style right
> away.
> And that is precisely my concern: the succinct modelling style as in #1,
> #3, #5 and #6 only works if we can be _sure_ that we are dealing with
> triples as types: not occurrences, not instances, not subtypes, nor
> whatever other (not so) special cases might exist.
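The #5/#6 clash can be reproduced in the same toy tuple model (an illustration, not RDF-star's formal semantics). With the succinct style both moods hang off one subject, the quoted triple; minting a separate occurrence node per occasion (the hypothetical labels `_:o1` and `_:o2`) keeps them apart.

```python
# Toy model (assumption): a quoted triple is a Python tuple usable as a subject.
alice_plays = (":Alice", ":plays", ":Guitar")

# Succinct style (#5 and #6): both annotations share the quoted triple as subject.
succinct = {
    (alice_plays, ":mood", ":Happy"),
    (alice_plays, ":mood", ":Gloomy"),
}
subjects = {s for (s, p, o) in succinct if p == ":mood"}
print(len(subjects))  # 1: two moods, one indistinguishable subject

# Occurrence style: each occasion gets its own (hypothetical) blank node.
with_occurrences = {
    ("_:o1", ":occurrenceOf", alice_plays), ("_:o1", ":mood", ":Happy"),
    ("_:o2", ":occurrenceOf", alice_plays), ("_:o2", ":mood", ":Gloomy"),
}
moods_by_occurrence = {}
for s, p, o in with_occurrences:
    if p == ":mood":
        moods_by_occurrence.setdefault(s, set()).add(o)
print(sorted(moods_by_occurrence.items()))
# [('_:o1', {':Happy'}), ('_:o2', {':Gloomy'})]
```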
>
> The succinct triple-as-type style only works for the use cases that the
> proposed semantics was optimized for, i.e. when working on the very low
> levels of RDF machinery. In any other case the succinct style can be used
> at first but might need to be changed later, and it requires queries to
> account for both modelling styles. Both prospects are bad enough to warrant
> a general rule: don’t use the succinct style; use the indirection via a
> newly created statement identifier unless you are really sure that your use
> case is explainable AI, versioning or something similarly close to the metal.
>
>
> In my understanding the problem stems from the very core of RDF-star’s
> design: RDF-star quoted triples are verbose in that they quote in full what
> they identify. That leads to a moral hazard: it’s all too easy to take the
> shortest path and use the type as an identifier where one should mint a
> proper identifier first. The proposed semantics takes advantage of that
> verbosity and puts it to good use for those special use cases that
> require a carbon copy of their subject. But it is not well suited for
> annotations that influence the meaning of the annotated triple. Maybe it
> helps to think about the problem this way: property-graph-style modelling
> allows one to keep the simple triple and yet enrich it with additional
> detail. But one must admit that the simple triple, annotated in two
> different ways, is then not the same triple anymore.
>
>
> I have been arguing all along (since the summer of 2020, IIRC) for proper
> statement identifiers like those RDF/XML provides, and I still think they
> are the right solution for mainstream use cases, as they are much closer to
> the reality of RDF data and therefore better positioned to capture
> deviations from the abstract RDF core. Maybe there is a middle ground in the
> shortcut syntax, which could be defined to expand to identifiers by default, e.g.:
>
>    :Alice :plays :Guitar {| :mood :Happy |} .
>    :Alice :plays :Guitar {| :mood :Moody |} .
>
> expanding to
>
>    :Alice :plays :Guitar .
>    [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>       :mood :Happy .
>    [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>       :mood :Moody .
>
> This is guaranteed to be correct for single _and_ multiple occurrences
> alike, it is easy to author via the shorthand syntax, and it is unambiguous
> to query.
> All more involved use cases (explainable AI, unasserted assertions, etc.)
> work as before, as intended, using the quoted triple syntax.
> I’d very much favor having that default expansion use a transparency-enabling
> version of :occurrenceOf, in which case the shorthand syntax would really be
> the syntactic sugar for RDF standard reification that RDF-star was (and, I
> guess, outside these specialist circles still is) expected to be. That
> wouldn’t hurt the specialist use cases in any way.
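The proposed default expansion of the `{| ... |}` shorthand can be sketched as a small function over the same hypothetical toy model (strings for terms, tuples for triples and quoted triples, a set for the graph). The function name and the `_:occN` blank-node labels are illustrative assumptions, not an RDF-star API, and the sketch uses a plain `:occurrenceOf` rather than the transparency-enabling variant mentioned above, whose semantics the message leaves open.

```python
from itertools import count

# Toy model (assumption): terms are strings, a quoted triple is a tuple, and a
# graph is a set of (s, p, o) tuples. Blank-node labels are hypothetical.
_fresh_ids = count()

def expand_annotated(triple, annotation):
    """Expand `s p o {| pred obj ; ... |} .` into occurrence-based triples.

    `annotation` is a list of (predicate, object) pairs from one {| ... |}
    block. Each call mints a fresh blank node, so every annotated statement
    gets its own occurrence identifier.
    """
    occurrence = f"_:occ{next(_fresh_ids)}"
    expanded = {triple, (occurrence, ":occurrenceOf", triple)}
    for pred, obj in annotation:
        expanded.add((occurrence, pred, obj))
    return expanded

alice_plays = (":Alice", ":plays", ":Guitar")
graph = (expand_annotated(alice_plays, [(":mood", ":Happy")])
         | expand_annotated(alice_plays, [(":mood", ":Moody")]))

print(len(graph))  # 5: one asserted triple plus two triples per occurrence
```

Because the graph is a set, the asserted triple appears only once, while the two annotation blocks stay distinct through their separate occurrence nodes.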
>
>
> Best,
> Thomas
>
>
> P.S. w.r.t. "a can of worms": knowledge representation is indeed a can of
> worms, and always has been, at least since the ancient Greeks. Statement
> annotation in RDF is a topic well known to be situated right in the heart
> of the wormhole. There’s no simple, genius way around that.
>


-- 

劳费尔
Received on Monday, 20 December 2021 03:18:51 UTC