Re: Using rdf:asserts (for Truth) and rdf:reifies (for Hypothesis) to be on par with relational and LPG from Niklas Lindström on 2024-08-22 (public-rdf-star-wg@w3.org from August 2024)

From: Niklas Lindström <lindstream@gmail.com>
Date: Thu, 22 Aug 2024 17:12:57 +0200
To: Andy Seaborne <andy@apache.org>
Cc: public-rdf-star-wg@w3.org
Message-ID: <CADjV5jez4BggK0kGQvdGFz95MKFHEPVyfi5ytq7+rMdV+3Zs0w@mail.gmail.com>
On Thu, Aug 22, 2024 at 1:01 PM Andy Seaborne <andy@apache.org> wrote:
>
>
>
> On 22/08/2024 05:16, Souripriya Das wrote:
> > Suppose there are two kinds of statements that we need to store using
> > our data model: Truth and Hypothesis.
> >
> > Within an id-based scope (i.e., a scope that contains id-s-p-o tuples),
> > I'd like to specify an s-p-o either as a Truth or as a Hypothesis. If we
> > use a single RDF-blessed property, rdf:reifies, we can do this as follows:
> >      :id rdf:reifies <<( :s :p :o )>> ; a :Truth .
> >      OR
> >      :id rdf:reifies <<( :s :p :o )>> ; a :Hypothesis .
> >
> > What's the complexity? An extra triple has to be added to designate an
> > id-s-p-o tuple as a Truth or a Hypothesis. Retrieval of one kind of
> > statements via SPARQL then requires an additional triple-pattern:
> > Example=> { ?id rdf:reifies <<( ?s ?p ?o )>> ; a :Truth } .
> >
> > Doing this in RDF is more complex than what's needed in the case of
> > relational data. There we do not need to add any extra row to the table
> > to indicate this – we just need an extra column, in the existing row, to
> > be populated. SQL query simply needs an extra condition (e.g.,
> > <tableAlias>.kind = 'Truth'), not an extra join.
> >
> > Doing this in RDF is more complex than what's needed in the case of LPG
> > data as well. There, assuming edge is actually stored as a struct or
> > record, all that is needed is adding and populating an extra attribute
> > (i.e., edge-property). The query for the extended LPG data only needs an
> > extra filter (e.g., <edge-var>.kind = "Truth").
> >
> > So, to avoid this Truth/Hypothesis handling complexity related
> > disadvantage compared to relational data and LPG, my suggestion would be
> > to integrate the Truth or Hypothesis indicator into the id-s-p-o tuple
> > itself by providing two distinct RDF-blessed properties, say rdf:asserts
> > (for Truth) and rdf:reifies (for Hypothesis) – all within the id-based
> > scope. (This does not affect s-p-o triples in any way because those are
> > not part of any id-based scope. Those are in a "default" (no id) scope.)
>
> Truth/Hypothesis are not the only cases.
>
> Consider a trivial vocabulary:
>
> ex:added      -- fact added to the graph
> ex:retracted  -- fact retracted
> ex:source     -- URL where a triple occurred.
>
> These are orthogonal aspects [1].

I definitely agree. Modeling these aspects belongs to the specific
domain (of discourse).

In RDF, triples in graphs encode simple truths (asserting "that the
relationship holds" [1]). An RDF graph is a simple, monotonic,
atemporal model of a world. If a triple is there, the claim about the
world that it encodes [2] is simply true (in the interpretation). No
less, no more.

> There could be a subproperty of rdf:reifies in a vocabulary that implies
> a Truth or Hypothesis class about the reifier making the extra explicit
> type triple unnecessary. Usage patterns could be a profile. A simple
> ingestion pipeline can materialize it if desired.

Certainly. In fact, properties used in the description of the reifier
can be enough to imply what it is.

Example:

    :provenBy rdfs:domain :ScientificTheory ; rdfs:range :Observation .

    <X> :relation <Y> ~ <Hypothesis_A> ~ <Hypothesis_B> .

    <Hypothesis_A> :provenBy :O1 {| :source <Publication_A> |} .

    << <Hypothesis_B> :provenBy :O2 >> :source <Publication_B> .

In this "world", a relationship holds between X and Y. Two hypotheses
are taken (to hold) as reifiers of that claim. One of them,
Hypothesis_A, is taken as being proven by O1, which implies that
Hypothesis_A is a scientific theory and O1 is an observation.
Publication_A is a source of a reified claim of this proof.
Publication_B is a source of another claim: that the other hypothesis,
Hypothesis_B, is proven by O2. But given the graph, this is not known
to be true in its interpretation. (So we don't know whether
Hypothesis_B is a scientific theory, or whether O2 is an actual
observation.)
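
To make the distinction concrete, here is a rough Python sketch of the
example above. It is only an illustration under stated assumptions, not
how any RDF 1.2 implementation works: triples are plain tuples, a triple
term is a tuple nested in object position, the reifier labels `_:r1` and
`_:r2` are made up for the two anonymous reifiers, and `infer_types` is a
hypothetical helper that applies only the rdfs:domain and rdfs:range
rules (one domain and one range per property) to asserted triples.

```python
# Plain-tuple model: an asserted triple is (s, p, o); a triple term is a
# tuple nested in object position. Names mirror the Turtle example above.
RDF_TYPE, REIFIES = "rdf:type", "rdf:reifies"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

x_rel_y = ("X", ":relation", "Y")
a_proven = ("Hypothesis_A", ":provenBy", ":O1")
b_proven = ("Hypothesis_B", ":provenBy", ":O2")

graph = {
    (":provenBy", DOMAIN, ":ScientificTheory"),
    (":provenBy", RANGE, ":Observation"),
    x_rel_y,                                # asserted
    ("Hypothesis_A", REIFIES, x_rel_y),
    ("Hypothesis_B", REIFIES, x_rel_y),
    a_proven,                               # asserted (annotation syntax)
    ("_:r1", REIFIES, a_proven),            # _:r1 is an assumed label
    ("_:r1", ":source", "Publication_A"),
    ("_:r2", REIFIES, b_proven),            # b_proven is NOT asserted
    ("_:r2", ":source", "Publication_B"),
}


def infer_types(graph):
    """Apply rdfs:domain / rdfs:range to asserted triples only.

    Simplification: assumes at most one domain and one range per property.
    Triple terms in object position are never unpacked, so nothing is
    inferred from claims that are merely reified.
    """
    domains = {s: o for s, p, o in graph if p == DOMAIN}
    ranges = {s: o for s, p, o in graph if p == RANGE}
    inferred = set()
    for s, p, o in graph:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


types = infer_types(graph)
```

With this graph, `types` contains a type for Hypothesis_A and for :O1,
and nothing for Hypothesis_B or :O2, since the triple about them only
occurs as a triple term.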

> At the moment, there is limited to no experience of using RDF 1.2 for
> relational data or LPG mapping. I would hope a vocabulary and profile
> appear and mature in a longer timescale. Making anything RDF-blessed
> (meaning in the rdf: namespace) seems too early. It can emerge later -
> removing things is harder.

Yes, I strongly agree.

As I see it, what is added with `rdf:reifies` and triple terms is
definitely enough for the use cases we have (and not only the ones
I've submitted as a representative of the National Library of Sweden).
With that, RDF 1.2 would make reification exact and more powerful,
working both for provenance and ("ad hoc") qualification. I've come to
see this as warranting the cost (in "conceptual complexity") of adding
triple terms. At this point I want to be very careful about adding any
further capabilities or nuances.

(I have also noted [3] that ad hoc qualification for simple
("LPG-style") relationships versus up-front "N-ary" (or "E-R")
modelling is not necessarily a "fatal" design decision. And I don't
consider one or the other approach superior in all cases. It very much
depends on domain and application requirements and expectations.)

Best regards,
Niklas

[1]: <https://www.w3.org/TR/rdf12-concepts/#resources-and-statements>
[2]: <https://www.w3.org/TR/rdf12-concepts/#entailment>
[3]: <https://github.com/w3c/rdf-ucr/issues/27>


>      Andy
>
> >
> > Thanks,
> > Souri.
> >
> >
>
> [1] https://www.w3.org/2009/12/rdf-ws/papers/ws12
>
Received on Thursday, 22 August 2024 15:13:28 UTC