Re: multisets everywhere from Anthony Moretti on 2021-12-20 (public-rdf-star@w3.org from December 2021)

From: Anthony Moretti <anthony.moretti@gmail.com>
Date: Mon, 20 Dec 2021 21:09:45 +1030
To: Laufer <carlos.laufer@gmail.com>
Cc: thomas lörtsch <tl@rat.io>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <CACusdfSkYXc-VKR3wTuwY6UrOSjdMRyz0M3xLBGTx5xRjPtUUA@mail.gmail.com>
Is it at all possible for us to add some positions to the structure of an
RDF-star statement, all of which are optional? I feel like we can solve
some reoccurring problems (no pun intended).

The following are my suggestions, along with the values that should be
assumed if the position is blank:

 - Start time (default: beginning of time)
 - End time (default: end of time)
 - Location (default: everywhere)
 - Certainty (default: 1.0)
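
To make the list above concrete, here is a minimal Python sketch of a statement with the four optional positions and their defaults. The class name ScopedStatement, the field names, and the use of plain year numbers for time are illustrative assumptions only, not proposed RDF-star syntax.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedStatement:
    """A triple plus four optional positions (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    start: float = -math.inf      # blank start: beginning of time
    end: float = math.inf         # blank end: end of time
    location: str = "everywhere"  # blank location: everywhere
    certainty: float = 1.0        # blank certainty: 1.0


# A plain triple leaves every optional position blank.
plain = ScopedStatement(":Bob", ":bought", ":Car")

# A scoped statement fills in only the positions it needs.
scoped = ScopedStatement(":RichardB", ":marriedTo", ":LizT", start=1966)
```

Under this reading an ordinary triple is just the special case where all four positions take their defaults.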

Why don't we make a really nice jump in usability by making these
first-class citizens of RDF-star? They directly, globally, and
unambiguously affect the truth of any statement. Other annotations, to do
with provenance etc, and even surrounding statements, can also be thought
of as affecting a statement's truth but only indirectly. I feel like we're
not separating concerns properly when we treat the annotations listed above
like any other annotations.

In my view triples in their current form are incomplete statements: every
single statement should be scoped in space and time, otherwise it's sort of
meaningless. How can the fundamental unit of description we use be so
incomplete? The discussion about reoccurring relationships showcases this:
it's now necessary to use something like schema:Event to model the
following simply because it reoccurs:

    GroverCleveland    isPresidentOf    UnitedStates
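
As a rough sketch of how start and end positions would cover this case without an event node, the Python below states the same triple twice with different intervals. The Term type and the holds_in helper are hypothetical; the years are Cleveland's two non-consecutive terms (1885 to 1889 and 1893 to 1897), treated as half-open intervals for simplicity.

```python
import math
from typing import NamedTuple


class Term(NamedTuple):
    """A triple with optional start/end positions (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    start: float = -math.inf
    end: float = math.inf


terms = [
    Term("GroverCleveland", "isPresidentOf", "UnitedStates", 1885, 1889),
    Term("GroverCleveland", "isPresidentOf", "UnitedStates", 1893, 1897),
]


def holds_in(statements, subject, predicate, obj, year):
    """True if any matching statement's interval [start, end) covers year."""
    return any(
        (s.subject, s.predicate, s.obj) == (subject, predicate, obj)
        and s.start <= year < s.end
        for s in statements
    )


# As bare triples the two terms collapse into one set member;
# with their intervals they stay distinct.
bare = {(t.subject, t.predicate, t.obj) for t in terms}
scoped = set(terms)
```

With set semantics the bare triple appears once, while the two scoped statements remain separate, which is the recurrence the plain triple cannot express.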

In Thomas' email his first and second examples seem to correspond to the
following, where "statementOf" is referentially opaque and "occurrenceOf"
is referentially transparent:

    [] :statementOf <<:Bob :bought :Car>> ;
        :said :Alice .

    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
        :start 1966 .

In the simplest cases "occurrenceOf" wouldn't be needed because it would be
covered by start and end times being part of the statement (Thomas' third
example still needs to be modeled using "occurrenceOf" because of the
additional "mood" annotation). Modeling becomes simpler if we don't need to
use something like "occurrenceOf" for simple use cases.
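
For the cases that do still need the indirection, such as the "mood" example, the occurrence pattern can be sketched in plain Python. This is only an illustration: expand_annotations, annotations_of, the dictionary layout, and the "_:occN" labels are all made up here, standing in for blank nodes linked by "occurrenceOf".

```python
from itertools import count


def expand_annotations(triple, annotation_blocks):
    """Assert the triple once and mint one occurrence node per
    annotation block, each pointing back at the same triple."""
    ids = count(1)
    graph = {"asserted": {triple}, "occurrences": []}
    for block in annotation_blocks:
        node = {"id": f"_:occ{next(ids)}", "occurrenceOf": triple}
        node.update(block)
        graph["occurrences"].append(node)
    return graph


def annotations_of(graph, triple, key):
    """Collect the values of one annotation across all occurrences."""
    return [
        node[key]
        for node in graph["occurrences"]
        if node["occurrenceOf"] == triple and key in node
    ]


triple = (":Alice", ":plays", ":Guitar")
graph = expand_annotations(triple, [{":mood": ":Happy"}, {":mood": ":Gloomy"}])
moods = annotations_of(graph, triple, ":mood")
```

The triple is asserted once, yet each mood stays attached to its own occurrence instead of both landing on the same quoted triple.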

If RDF-star is an answer to property graphs then why don't we improve it
one step further and address basic temporal modeling?

> I was wondering how this discussion is related to the Singleton Property
> proposal [1].
>
> Cheers,
> Laufer
>

I looked through the paper and yeah, it's a different approach to statement
reification, but I disagree that it's intuitive like they say. Relations
are intuitively reusable constructs; the approach in the paper stops them
being reused and creates a new instance for each use, so it fights the nature
of what a relation is. In my opinion it's much more intuitive to give the
entire statement an ID, like what's done in standard statement reification.
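
The contrast can be sketched in Python, with triples as plain tuples. This is a simplified reading of both approaches, not a faithful implementation of either: the helper names, the "#n" naming scheme for the one-off predicate, the "_:stmtN" labels, and the unprefixed "singletonPropertyOf" link are assumptions made for illustration.

```python
def singleton_property(s, p, o, n, annotations):
    """Singleton-property style: mint a one-off predicate for this use
    and hang the annotations on that predicate."""
    sp = f"{p}#{n}"
    triples = {(s, sp, o), (sp, "singletonPropertyOf", p)}
    triples |= {(sp, k, v) for k, v in annotations.items()}
    return triples


def reified_statement(s, p, o, n, annotations):
    """Standard reification style: give the whole statement an ID
    and hang the annotations on that ID."""
    stmt = f"_:stmt{n}"
    triples = {
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    }
    triples |= {(stmt, k, v) for k, v in annotations.items()}
    return triples


args = ("GroverCleveland", "isPresidentOf", "UnitedStates", 1, {"start": 1885})
singleton = singleton_property(*args)
reified = reified_statement(*args)

# Predicates actually used in the singleton-property graph.
singleton_predicates = {p for _, p, _ in singleton}
```

In the singleton-property output the original relation never appears in predicate position, only its per-use instance does, whereas the reified form keeps the relation intact and identifies the statement as a whole.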

Regards
Anthony


On Mon, Dec 20, 2021 at 1:49 PM Laufer <carlos.laufer@gmail.com> wrote:

> Hello, All,
>
> I was wondering how this discussion is related to the Singleton Property
> proposal [1].
>
> Cheers,
> Laufer
>
> [1] - Vinh Nguyen, Olivier Bodenreider, and Amit Sheth; "Don't Like
> Reification? Making Statements about Statements Using Singleton Property";
> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/
>
> On Sunday, 19 December 2021, thomas lörtsch <tl@rat.io> wrote:
>
>> tl;dr
>> RDF semantics is based on sets and RDF-star builds on that. However
>> RDF-star triple annotation has to deal with the practice of RDF, not its
>> theoretical ideal. In RDF as practically employed multisets, although not
>> the norm, can appear almost everywhere. A design that ignores them per
>> default but requires rewriting data and queries when they appear will not
>> fare well in practice. The problem is inherent in the verbosity of the
>> quoted triple identifier: it favors a syntax that is in almost all cases at
>> least risky, if not outright wrong. The shortcut syntax might provide a way
>> out of this dilemma.
>>
>>
>> The following examples should illustrate that multisets have to be
>> expected almost everywhere in RDF data. From now on I’m always assuming the
>> standard use case where an actual assertion is annotated:
>>
>> #0    :Bob :bought :Car .
>>      :RichardB :marriedTo :LizT .
>>      :Alice :plays :Guitar .
>>
>>
>> The CG report says that 'Alice said that Bob bought a car' should be
>> modeled not as
>>
>> #1    <<:Bob :bought :Car>> :said :Alice .
>>
>> but as
>>
>> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
>>         :said :Alice .
>>
>> because there might be other sources for the same statement. That’s
>> always possible so it seems reasonable to always require the indirection of
>> creating a proper occurrence identifier when annotating a statement with
>> provenance.
>>
>>
>> Likewise it was recently discussed that marriages between Richard Burton
>> and Elizabeth Taylor should not be modeled as
>>
>> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
>>
>> but rather as
>>
>> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
>>         :start 1966 .
>>
>> because we know of that second marriage.
>>
>> But what if we didn’t? What if we had authored this in 1967, assuming
>> that this marriage will last forever? Would we have chosen the more
>> involved modelling style nonetheless? And if we did go with the succinct #3
>> version - very probably, at least according to current thinking I assume -
>> will we later, after their second marriage, have to change that to #4
>> style?
>>
>> What about querying? Say we are not sure if some statement occurs only
>> once or multiple times: will we have to query for both modelling styles?
>> Probably.
>>
>>
>> While the first example could be categorized as describing a speech act
>> and the second example might be considered instantiation there’s also the
>> case of subclassing. For example we might want to describe that Alice
>> happily plays guitar:
>>
>> #5    <<:Alice :plays :Guitar>> :mood :Happy .
>>
>> On another day, however, she plays guitar because she's sad:
>>
>> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
>>
>> "So which one is it?" the unsuspecting data consumer might complain. It
>> turns out that indeed we should have chosen the more involved style right
>> away.
>> And that is precisely my concern: the succinct modelling style as in #1,
>> #3, #5 and #6 only works if we can be _sure_ that we are dealing with
>> triples as types - not occurrences, not instances, not subtypes, not
>> whatever other (not so) special cases there might exist.
>>
>> The succinct triple-as-type style only works for use cases that the
>> proposed semantics was optimized for, when working on the very low levels
>> of RDF machinery. In any other case the succinct style can be used first
>> but might need to be changed later, and it requires queries to account for
>> both modelling styles. Both prospects are bad enough to warrant a general
>> rule that says: don’t use the succinct style, use the indirection via
>> creating a statement identifier if you are not really sure that your use
>> case is Explainable AI, versioning or similarly close to the metal.
>>
>>
>> In my understanding the problem stems from the very core of RDF-star’s
>> design: RDF-star quoted triples are verbose in that they quote in full what
>> they identify. That leads to moral hazard: it’s all too easy to take the
>> shortest path and use the type as an identifier where one should mint a
>> proper identifier first. The proposed semantics take advantage of that
>> verbosity and put it to good use for those special use cases that
>> require a carbon copy of their subject. But it is not well suited for
>> annotations that influence the meaning of the annotated triple. Maybe it
>> helps to think about the problem this way: property graph style modelling
>> allows one to keep the simple triple and yet enrich it with additional detail.
>> But one must admit that the simple triple annotated in two different ways
>> is then not the same triple anymore.
>>
>>
>> I was all along (summer of 2020 IIRC) arguing for proper statement
>> identifiers like RDF/XML provides them and I still think they are the right
>> solution for mainstream use cases as they are much closer to the reality of
>> RDF data and therefore better positioned to capture deviations from the
>> abstract RDF core. Maybe there is a middle ground in the shortcut syntax
>> which could be defined as expanding to identifiers by default - e.g.:
>>
>>    :Alice :plays :Guitar {| :mood :Happy |} .
>>    :Alice :plays :Guitar {| :mood :Moody |} .
>>
>> expanding to
>>
>>    :Alice :plays :Guitar .
>>    [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>       :mood :Happy .
>>    [] :occurrenceOf <<:Alice :plays :Guitar>> ;
>>       :mood :Moody .
>>
>> This is guaranteed to be correct for single _and_ multiple occurrences
>> alike, it is easy to author per the shorthand syntax and it is unambiguous
>> to query.
>> All more involved use cases - explainable AI, unasserted assertions etc -
>> work as before, as intended, using the quoted triple syntax.
>> I’d very much favor that default expansion to use a transparency enabling
>> version of :occurrenceOf in which case the shorthand syntax would really be
>> the syntactic sugar for RDF standard reification that RDF-star was - and, I
>> guess, outside these specialist circles still is - expected to be. That
>> wouldn’t hurt the specialist use cases in any way.
>>
>>
>> Best,
>> Thomas
>>
>>
>> P.S. w.r.t. "a can of worms": Knowledge representation is indeed a can of
>> worms, and always has been, at least since the ancient Greeks. Statement
>> annotation in RDF is a topic well known to be situated right in the heart
>> of the worm hole. There’s no simple, ingenious way around that.
>>
>
>
> --
>
> 劳费尔
> . . . .. . .
> . . . ..
> . .. .
>
>
Received on Monday, 20 December 2021 10:40:11 UTC