Re: multisets everywhere from Miel Vander Sande on 2021-12-20 (public-rdf-star@w3.org from December 2021)

From: Miel Vander Sande <miel.vandersande@meemoo.be>
Date: Mon, 20 Dec 2021 15:48:43 +0100
To: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Cc: thomas lörtsch <tl@rat.io>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <CAHeRLWsWFEC3V_v8Uk-HKhTmL8HxAB4D-qC0GG9nHN8jtS8MAA@mail.gmail.com>
Hi Thomas, all,

I do agree that some extra usability on this aspect would definitely not
hurt. It would have quite some gains in practice, just like RDF-star does
over reification. Having this syntactical shorthand as a middle ground has
popped up in my head a couple of times, but I hesitated to ask about it
because:
- it seems very likely that this idea has come up before in the CG.
Probably I missed it
- this only affects the syntaxes, not the RDF-star semantics?
- it would open up a can of blank nodes, unless you have the ability to
also add an identifier
- I would not overload the annotation syntax; that has its own reasons to
exist (and it raises questions about assertion ;))
- in the end, it's how you can query it that matters most. Can you make
such shorthand work for SPARQL-star?
- technically, this is a syntax enhancement that can be defined in a
separate specification that extends Turtle-star, among others, but you
probably want to stay away from Turtle-star-star
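
To make the expansion under discussion concrete, here is a minimal sketch in
Python of how such a shorthand could be rewritten into occurrence nodes. It is
only an illustration under stated assumptions: the names `expand` and
`ANNOTATED` are hypothetical, `:occurrenceOf` is the informal property used in
this thread, and the pattern only handles one annotated triple per line whose
terms contain no whitespace. It is not a Turtle-star parser and not a proposal
for the actual grammar.

```python
import re

# Hypothetical pattern: "s p o {| annotation |}" on one line, optional final dot.
# Assumes subject, predicate and object are written without whitespace.
ANNOTATED = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s*\{\|\s*(.*?)\s*\|\}\s*\.?\s*$")


def expand(lines):
    """Expand annotated triples into one blank occurrence node per annotation."""
    asserted = []     # plain triples, emitted once each (set semantics)
    occurrences = []  # one blank-node block per annotation (multiset-friendly)
    for line in lines:
        m = ANNOTATED.match(line)
        if not m:
            continue  # lines without an annotation are ignored in this sketch
        s, p, o, annotation = m.groups()
        triple = f"{s} {p} {o}"
        if triple not in asserted:
            asserted.append(triple)
        occurrences.append(f"[] :occurrenceOf <<{triple}>> ;\n   {annotation} .")
    return [t + " ." for t in asserted] + occurrences


shorthand = [
    ":Alice :plays :Guitar {| :mood :Happy |}",
    ":Alice :plays :Guitar {| :mood :Moody |}",
]
result = expand(shorthand)
print("\n".join(result))
```

The asserted triple is emitted once, while every annotation gets its own blank
node, which is exactly where the "can of blank nodes" comes from.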

Best,

Miel

On Mon, 20 Dec 2021 at 15:20, Doerthe Arndt <
doerthe.arndt@tu-dresden.de> wrote:

> Dear Thomas,
>
> > On 20.12.2021 at 14:32, thomas lörtsch <tl@rat.io> wrote:
> >
> >
> >
> > On 20 December 2021 at 11:47:48 CET, Doerthe Arndt <
> doerthe.arndt@tu-dresden.de> wrote:
> >> Dear Thomas,
> >>
> >> Before going into full discussion mode again :), I would like to fully
> understand your proposal, so please allow me one question:
> >>
> >> Why do you go for
> >>
> >>>  :Alice :plays :Guitar .
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >>
> >> instead of
> >>
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >>
> >>
> >> with your short cut?
> >> I am asking because the marriedTo example in particular looks to me
> like a case where the statement changes its truth value over time (i.e. the
> triple becomes false if the marriage ends, or could at least become false
> depending on what ":marriedTo" means).
> >>
> >> Maybe I simply missed that point in your previous explanations, so is
> there a short answer why you personally would model it that way?
> >
> > It is my understanding of the (informal) property :occurrenceOf that it
> doesn't assert that statement, just points to it. Isn't that the assumption
> everybody is working under?
> >
>
> Yes, it is. My question was more about why you want to assert the triple you
> are talking about even in cases where you know that it is not true at the
> time you state it. But I guess the answer to that is that you would like to
> be close to property graphs, and there all triples you refer to are also
> asserted. So I got my answer (if I understood correctly). Of course, I
> disagree that this is a good way to model your examples ;) but I think that
> has already been discussed in depth on this list.
>
> Kind regards,
> Dörthe
>
> > Best,
> > Thomas
> >
> >> Kind regards,
> >> Dörthe
> >>
> >>
> >>
> >>> On 20.12.2021 at 01:31, thomas lörtsch <tl@rat.io> wrote:
> >>>
> >>> tl;dr
> >>> RDF semantics is based on sets and RDF-star builds on that. However
> RDF-star triple annotation has to deal with the practice of RDF, not its
> theoretical ideal. In RDF as practically employed, multisets, although not
> the norm, can appear almost everywhere. A design that ignores them by
> default but requires rewriting data and queries when they appear will not
> fare well in practice. The problem is inherent in the verbosity of the
> quoted triple identifier: it favors a syntax that is in almost all cases at
> least risky, if not outright wrong. The shortcut syntax might provide a way
> out of this dilemma.
> >>>
> >>>
> >>> The following examples should illustrate that multisets have to be
> expected almost everywhere in RDF data. From now on I’m always assuming the
> standard use case where an actual assertion is annotated:
> >>>
> >>> #0    :Bob :bought :Car .
> >>>    :RichardB :marriedTo :LizT .
> >>>    :Alice :plays :Guitar .
> >>>
> >>>
> >>> The CG report says that 'Alice said that Bob bought a car' should be
> modeled not as
> >>>
> >>> #1    <<:Bob :bought :Car>> :said :Alice .
> >>>
> >>> but as
> >>>
> >>> #2    [] :occurrenceOf <<:Bob :bought :Car>> ;
> >>>       :said :Alice .
> >>>
> >>> because there might be other sources for the same statement. That’s
> always possible so it seems reasonable to always require the indirection of
> creating a proper occurrence identifier when annotating a statement with
> provenance.
> >>>
> >>>
> >>> Likewise it was recently discussed that marriages between Richard
> Burton and Elizabeth Taylor should not be modeled as
> >>>
> >>> #3    <<:RichardB :marriedTo :LizT>> :start 1966 .
> >>>
> >>> but rather as
> >>>
> >>> #4    [] :occurrenceOf <<:RichardB :marriedTo :LizT>> ;
> >>>       :start 1966 .
> >>>
> >>> because we know of that second marriage.
> >>>
> >>> But what if we didn’t? What if we had authored this in 1967, assuming
> that this marriage will last forever? Would we have chosen the more
> involved modelling style nonetheless? And if we did go with the succinct #3
> version - very probably, at least according to current thinking I assume -
> will we later, after their second marriage, have to change that to #4
> style?
> >>>
> >>> What about querying? Say we are not sure if some statement occurs only
> once or multiple times: will we have to query for both modelling styles?
> Probably.
> >>>
> >>>
> >>> While the first example could be categorized as describing a speech
> act and the second example might be considered instantiation there’s also
> the case of subclassing. For example we might want to describe that Alice
> happily plays guitar:
> >>>
> >>> #5    <<:Alice :plays :Guitar>> :mood :Happy .
> >>>
> >>> The other day however she plays guitar because she's sad:
> >>>
> >>> #6    <<:Alice :plays :Guitar>> :mood :Gloomy .
> >>>
> >>> "So which one is it?" the unsuspecting data consumer might complain. It
> turns out that indeed we should have chosen the more involved style right
> away.
> >>> And that is precisely my concern: the succinct modelling style as in
> #1, #3, #5 and #6 only works if we can be _sure_ that we are dealing with
> triples as types - not occurrences, not instances, not subtypes, not
> whatever other (not so) special cases there might exist.
> >>>
> >>> The succinct triple-as-type style only works for use cases that the
> proposed semantics was optimized for, when working on the very low levels
> of RDF machinery. In any other case the succinct style can be used first
> but might need to be changed later, and it requires queries to account for
> both modelling styles. Both prospects are bad enough to warrant a general
> rule that says: don’t use the succinct style, use the indirection via
> creating a statement identifier if you are not really sure that your use
> case is Explainable AI, versioning or similarly close to the metal.
> >>>
> >>>
> >>> In my understanding the problem stems from the very core of RDF-star’s
> design: RDF-star quoted triples are verbose in that they quote in full what
> they identify. That leads to moral hazard: it’s all too easy to take the
> shortest path and use the type as an identifier where one should mint a
> proper identifier first. The proposed semantics take advantage of that
> verbosity and put it to good use for those special use cases that
> require a carbon copy of their subject. But it is not well suited for
> annotations that influence the meaning of the annotated triple. Maybe it
> helps to think about the problem this way: property graph style modelling
> allows one to keep the simple triple and yet enrich it with additional detail.
> But one must admit that the simple triple annotated in two different ways
> is then not the same triple anymore.
> >>>
> >>>
> >>> I was all along (summer of 2020 IIRC) arguing for proper statement
> identifiers like RDF/XML provides them and I still think they are the right
> solution for mainstream use cases as they are much closer to the reality of
> RDF data and therefore better positioned to capture deviations from the
> abstract RDF core. Maybe there is a middle ground in the shortcut syntax
> which could be defined as expanding to identifiers by default - e.g.:
> >>>
> >>>  :Alice :plays :Guitar {| :mood :Happy |} .
> >>>  :Alice :plays :Guitar {| :mood :Moody |} .
> >>>
> >>> expanding to
> >>>
> >>>  :Alice :plays :Guitar .
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Happy.
> >>>  [] :occurrenceOf <<:Alice :plays :Guitar>> ;
> >>>     :mood :Moody .
> >>>
> >>> This is guaranteed to be correct for single _and_ multiple occurrences
> alike, it is easy to author per the shorthand syntax and it is unambiguous
> to query.
> >>> All more involved use cases - explainable AI, unasserted assertions
> etc - work as before, as intended, using the quoted triple syntax.
> >>> I’d very much favor that default expansion to use a transparency
> enabling version of :occurrenceOf in which case the shorthand syntax would
> really be the syntactic sugar for RDF standard reification that RDF-star was
> - and, I guess, outside these specialist circles still is - expected to be.
> That wouldn’t hurt the specialist use cases in any way.
> >>>
> >>>
> >>> Best,
> >>> Thomas
> >>>
> >>>
> >>> P.S. w.r.t. "a can of worms": Knowledge representation is indeed a can
> of worms, and always has been, at least since the ancient Greeks. Statement
> annotation in RDF is a topic well known to be situated right in the heart
> of the worm hole. There’s no simple, ingenious way around that.
> >>
>
>
Received on Monday, 20 December 2021 14:49:24 UTC