Re: a modest proposal - eliminate reifiers completely from Kurt Cagle on 2024-04-12 (public-rdf-star-wg@w3.org from April 2024)

From: Kurt Cagle <kurt.cagle@gmail.com>
Date: Fri, 12 Apr 2024 12:42:36 -0700
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: Niklas Lindström <lindstream@gmail.com>, public-rdf-star-wg@w3.org
Message-ID: <CALm0LSEh2Suzhp5YLm_tO7ejx=+r=kTChgUN-A3UbiPqDDfk6w@mail.gmail.com>
Sorry for the odd blank space at the bottom of the email. System fart.
*Kurt Cagle*
Editor in Chief
The Cagle Report
kurt.cagle@gmail.com
443-837-8725 <http://voice.google.com/calls?a=nc,%2B14438378725>


On Fri, Apr 12, 2024 at 12:40 PM Kurt Cagle <kurt.cagle@gmail.com> wrote:

> > It seems that the WG is at an impasse.
>
> > How about reverting to an old situation where there are no reifiers at
> > all, just quoted triples, and require users to stand off from the triple
> > as required?
>
> I was at an IA conference yesterday, and the question of reification was
> raised in several different contexts. I think it's important to remember
> that reification is significant primarily because it provides (syntactic)
> parity with a neo4j construct.
>
> That is to say:
>
> :s :p :o .
> :s  a  rdf:type .
> << :s :p :o >> :p1 :o1; :p2 :o2 .
>
> is the equivalent of a neo4j assertion with two properties on its "edge".
>
>
> What I see here is that we're also attempting to create an assignment
> statement with reifiers in Turtle:
>
> <<(:r | :s :p :o )>>
>
> when this is an operation that is normally done in SPARQL:
>
> bind (<<:s :p :o>> as ?r)
>
> What we're arguing about then, to me, is a deeper question: should we have
> assignment statements in Turtle?
>
> My gut feeling is no, for all the reasons that have become evident:
>
>    - Inconsistency in IRI assignment for a given resource from multiple
>    sources.
>    - The need to police the edge cases to negotiate cardinality.
>    - It encourages poor modeling practices, as a reification is often a
>    shortcut for modeling entities that should be formally defined.
>    - It requires adding a significant set of additional semantics (new
>    predicates) to RDF itself.
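To make the first concern concrete, here is a minimal Python sketch, not anything taken from the WG drafts. It assumes triples are plain string tuples, models a triple term as a nested tuple, uses `rdf:reifies` as in the cardinality discussion elsewhere in this thread, and the names `:r1`, `:r2`, `:p1`, `:p2` are hypothetical. Two sources each mint their own reifier IRI for the same triple; after merging, one triple has two reifiers and nothing in the data says whether they denote the same thing.

```python
from collections import defaultdict

# Hypothetical data: the triple term (:s :p :o) modelled as a nested tuple.
triple_term = (":s", ":p", ":o")

# Each (hypothetical) source mints its own reifier IRI for the same triple.
source_a = {(":r1", "rdf:reifies", triple_term), (":r1", ":p1", ":o1")}
source_b = {(":r2", "rdf:reifies", triple_term), (":r2", ":p2", ":o2")}

# Merging RDF graphs that share no blank nodes is a plain set union of triples.
merged = source_a | source_b


def reifiers_by_term(graph):
    """Group reifier nodes by the triple term they point to via rdf:reifies."""
    groups = defaultdict(set)
    for s, p, o in graph:
        if p == "rdf:reifies":
            groups[o].add(s)
    return dict(groups)


# After the merge the single triple term has two distinct reifiers.
for term, reifiers in reifiers_by_term(merged).items():
    print(term, sorted(reifiers))
```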
>
> Taking Peter's example:
>
> :Liz :married-to :Dick .
> :Liz :married-on "1964-03-15"^^xsd:date.
> :Liz :married-to :Eddie .
> :Liz :married-on "1959-05-12"^^xsd:date.
>
> This should be modelled as:
> :m1 a :Marriage ;
>      :firstSpouse :Liz ;
>      :secondSpouse :Dick ;
>      :startDate "1964-03-15" ;
>      :endDate "1959-02-17" .  # Or some appropriate end date prior to the
>                               # second marriage
> :m2 a :Marriage ;
>      :firstSpouse :Liz ;
>      :secondSpouse :Eddie ;
>      :startDate "1959-05-12" .
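A minimal Python sketch of why the :Marriage nodes help, assuming triples are plain string tuples (no RDF library) and leaving out the end dates for brevity. From the flat :married-to / :married-on triples one can recover the set of spouses and the set of dates, but not which date belongs to which spouse; with one node per marriage the pairing is explicit.

```python
# The flat triples from Peter's example, as (subject, predicate, object).
flat = {
    (":Liz", ":married-to", ":Dick"),
    (":Liz", ":married-on", "1964-03-15"),
    (":Liz", ":married-to", ":Eddie"),
    (":Liz", ":married-on", "1959-05-12"),
}

# The same facts with one node per marriage (end dates omitted).
modelled = {
    (":m1", "rdf:type", ":Marriage"),
    (":m1", ":firstSpouse", ":Liz"),
    (":m1", ":secondSpouse", ":Dick"),
    (":m1", ":startDate", "1964-03-15"),
    (":m2", "rdf:type", ":Marriage"),
    (":m2", ":firstSpouse", ":Liz"),
    (":m2", ":secondSpouse", ":Eddie"),
    (":m2", ":startDate", "1959-05-12"),
}


def objects(graph, s, p):
    """Return the set of objects for subject s and predicate p."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


# Flat model: two spouses and two dates, but no link between them.
spouses = objects(flat, ":Liz", ":married-to")
dates = objects(flat, ":Liz", ":married-on")


def start_dates(graph):
    """Map each second spouse of :Liz to the start date(s) of that marriage."""
    result = {}
    for m, p, o in graph:
        if p == "rdf:type" and o == ":Marriage":
            if ":Liz" in objects(graph, m, ":firstSpouse"):
                for spouse in objects(graph, m, ":secondSpouse"):
                    result[spouse] = objects(graph, m, ":startDate")
    return result


print(sorted(spouses), sorted(dates))
print(start_dates(modelled))
```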
>
> We're trying to turn <<:Liz :married-to :Dick>> into a semantics-carrying
> vehicle, when we're better off declaring a formal structure.
>
> Relate this back to the Neo4J modelling. When I say:
>
> Liz -- marriedTo --> Dick [startDate "1964-03-15",endDate "1959-02-17"] .
> Liz -- marriedTo --> Eddie [startDate "1959-05-12"] .
>
> What we're actually doing is hiding a lot of implicit semantics.
> Liz -- marriedTo --> Dick
> is actually multiple assertions:
>
>    - There exists an implicit marriage M1.
>    - M1 is a marriage entity between Liz and Dick.
>    - It is the (implicit) marriage M1 that is being annotated, not Liz,
>    even if that is what it appears to be on the surface.
>    - There is a directional implication (which is why I have
>    :firstSpouse, :secondSpouse in the RDF example, even though the
>    non-directed :spouse would be more appropriate).
>
> You can argue that the RDF is uglier and more verbose, but that's because
> it is also more precise. There are a lot of unstated assumptions made in
> neo4J, which is one reason that models created that way usually get very
> ugly conceptually; the hidden semantics come back to bite you.
>
> RDF to a certain extent does this with blank nodes. Syntactically, the
> above statements could be rendered as:
> [ a :Marriage ;
>      :firstSpouse :Liz ;
>      :secondSpouse :Dick ;
>      :startDate "1964-03-15" ;
>      :endDate "1959-02-17" ;
>      ] .
> [ a :Marriage ;
>      :firstSpouse :Liz ;
>      :secondSpouse :Eddie ;
>      :startDate "1959-05-12" ;
>      ] .
>
> but in most cases we forget the type association.
>
> Contrast this:
> [ :firstSpouse :Liz ; :secondSpouse :Dick ; :startDate  "1964-03-15" ;
> :endDate "1959-02-17" ; a :Marriage]
>
> with the Neo4J-esque
> Liz -- marriedTo --> Dick [startDate "1964-03-15",endDate "1959-02-17"] .
>
> It is a little more verbose, but that's only because Neo4J is actually
> treating what should be an object (:Liz, via the :firstSpouse predicate) as
> a subject with no explicit semantics. It is using the property marriedTo as
> a carrier of type or class, without formally making that assertion anywhere.
>
> Put another way, Neo4J works because it makes naive assumptions, and it
> gets into trouble because those naive assumptions don't survive complex
> modeling.
>
> This is why we have to be careful about reification, because it does hide
> those semantics.
>
> Put another way: << :Liz :married :Dick >>  :startDate "1964" ; etc.  is
> explicitly:
>
> [ a rdfs:Class; rdf:subject :Liz; rdf:object :Dick; rdf:predicate
> :married] :startDate "1964" ; etc.
>
> with the big assumption that the predicate :married can be used to derive
> the fact that the class involved is a marriage. It is low-value semantics
> and a potentially dangerous shortcut around proper modelling, but it
> gets you to Neo4J equivalence.
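The bracketed expansion can be sketched in Python. This is a minimal illustration, not a normative mapping: it assumes string-tuple triples and a caller-chosen blank-node label (`_:b0` below is hypothetical), and it uses the standard RDF reification vocabulary, which types the node as `rdf:Statement` rather than `rdfs:Class`. The expansion describes the triple `:Liz :married :Dick` without asserting it.

```python
def reify(node, s, p, o, annotations):
    """Expand an annotated triple into standard RDF reification triples.

    node        -- blank-node label for the statement (hypothetical, caller's choice)
    s, p, o     -- the triple being described (it is NOT asserted here)
    annotations -- list of (predicate, object) pairs about that statement
    """
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    # Each annotation attaches to the statement node, not to :Liz or :Dick.
    triples.extend((node, ap, ao) for ap, ao in annotations)
    return triples


expanded = reify("_:b0", ":Liz", ":married", ":Dick", [(":startDate", "1964")])
for triple in expanded:
    print(*triple)
```

Nothing in this output tells a consumer that `_:b0` is a marriage; that has to be inferred from the predicate `:married`, which is the hidden assumption described above.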
>
> I believe we are arguing at this point because we're looking at Neo4J
> without recognizing that their semantics are ill-defined and somewhat
> deceptive, and we're trying to satisfy ease of use at the expense of that
> precision.
>
> So what about the general annotation space where I have two individuals
> trying to create annotations on a given statement? This, to me, is THE use
> case for reification.
>
> <<:m1 :startDate "1964-03-15">> rdf:annotate [
>      a :Annotation ;
>      Annotation:source "https://www.example.com/LizMarriageArticle.html" ;
>      Annotation:fromDate "2024-01-03" ;
>      Annotation:by <mailto:janeDoe@gmail.com> ;
>      ], [...] .
>
> In this case we ARE annotating an RDF statement (there is probably a
> different term used here than rdf:annotate, but the idea should be the
> same).
>
> If you want
>
>
> On Fri, Apr 12, 2024 at 7:44 AM Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>
>> Yes, but there appears to be an irreconcilable difference here.
>>
>> The situation with quoted triples is actually no different from any other
>> case where some pieces of information about a resource need to be kept
>> together. For example:
>>
>> :Liz :married-to :Dick .
>> :Liz :married-on "1964-03-15"^^xsd:date.
>> :Liz :married-to :Eddie .
>> :Liz :married-on "1959-05-12"^^xsd:date.
>>
>> suffers from exactly the same problem as
>>
>> << :Liz :spouse :Dick >> :ceremony-location :Montreal.
>> << :Liz :spouse :Dick >> :ceremony-date "1964-03-15"^^xsd:date.
>> << :Liz :spouse :Dick >> :ceremony-location :Chobe.
>> << :Liz :spouse :Dick >> :ceremony-date "1975-10-10"^^xsd:date .
>>
>> In both cases there need to be extra resources added for accurate
>> modelling.
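Peter's point can be checked mechanically. A minimal sketch, assuming (as in RDF-star) that a quoted triple is a single term identified by its subject, predicate and object, modelled here as a Python tuple: all four annotations land on the same subject, so the data no longer records which date goes with which location.

```python
# << :Liz :spouse :Dick >> as one term: the same tuple wherever it appears.
liz_dick = (":Liz", ":spouse", ":Dick")

graph = {
    (liz_dick, ":ceremony-location", ":Montreal"),
    (liz_dick, ":ceremony-date", "1964-03-15"),
    (liz_dick, ":ceremony-location", ":Chobe"),
    (liz_dick, ":ceremony-date", "1975-10-10"),
}


def describe(graph, subject):
    """Collect predicate -> set of objects for one subject."""
    desc = {}
    for s, p, o in graph:
        if s == subject:
            desc.setdefault(p, set()).add(o)
    return desc


# Two locations and two dates hang off one subject; the pairing is lost.
print(describe(graph, liz_dick))
```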
>>
>> peter
>>
>>
>> On 4/12/24 10:00, Niklas Lindström wrote:
>> > On Fri, Apr 12, 2024 at 2:57 PM Peter F. Patel-Schneider
>> > <pfpschneider@gmail.com> wrote:
>> >>
>> >> It seems that the WG is at an impasse.
>> >
>> > I think we're "just" not in agreement about whether the cardinality of
>> > rdf:reifies should conceptually be one or many. Some claim it makes
>> > sense, others claim that it deviates from the notion of a reified
>> > statement, taken as a "direct relationship instance" (which I presume
>> > is what an LPG edge is taken to "denote" in the OneGraph
>> > harmonization).
>> >
>> > It is an important question, since the motivation is to not add
>> > something which is then unnecessarily (or by default) used in
>> > nonsensical ways, or opens up for accidental complexity. This avoids
>> > remodeling that would otherwise be necessary if A) new details crop up,
>> > and/or B) data from other sources is integrated.
>> >
>> >> How about reverting to an old situation where there are no reifiers at
>> >> all, just quoted triples, and require users to stand off from the triple
>> >> as required?
>> >
>> > That depends on whether or not the syntaxes allow them (or worse,
>> > encourage them) to be used as subjects (opening up for the seminal
>> > error). We came to the proposal of only using them with reifiers since
>> > that's when they work with use cases as-is. I.e., we have agreed that
>> > this (talking about bare triple terms) is not what use cases call for
>> > (not the least of which are the Amazon Neptune use cases with multiple
>> > edges [1]), and makes no sense if used as is in all but the most
>> > model-theoretical domains of discourse (including for token
>> > provenance; the most obvious kind of occurrences-not-types).
>> >
>> > /Niklas
>> >
>> > [1]: https://lists.w3.org/Archives/Public/public-rdf-star/2021Dec/att-0001/rdf-star-neptune-use-cases-20211202.pdf
>> >
>> >
>> >> peter
Received on Friday, 12 April 2024 19:43:14 UTC