Re: Using rdf:asserts (for Truth) and rdf:reifies (for Hypothesis) to be on par with relational and LPG from Thomas Lörtsch on 2024-08-23 (public-rdf-star-wg@w3.org from August 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Fri, 23 Aug 2024 14:26:10 +0200
To: Niklas Lindström <lindstream@gmail.com>
Cc: Andy Seaborne <andy@apache.org>, public-rdf-star-wg@w3.org
Message-Id: <526FCF5A-917A-4F1A-B678-17EF0BFA555E@rat.io>
> On 22. Aug 2024, at 17:12, Niklas Lindström <lindstream@gmail.com> wrote:
> 
> On Thu, Aug 22, 2024 at 1:01 PM Andy Seaborne <andy@apache.org> wrote:
>> 
>> 
>> 
>> On 22/08/2024 05:16, Souripriya Das wrote:
>>> Suppose there are two kinds of statements that we need to store using
>>> our data model: Truth and Hypothesis.
>>> 
>>> Within an id-based scope (i.e., a scope that contains id-s-p-o tuples),
>>> I'd like to specify an s-p-o either as a Truth or as a Hypothesis. If we
>>> use a single RDF-blessed property, rdf:reifies, we can do this as follows:
>>>     :id rdf:reifies <<( :s :p :o )>> ; a :Truth .
>>>     OR
>>>     :id rdf:reifies <<( :s :p :o )>> ; a :Hypothesis .
>>> 
>>> What's the complexity? An extra triple has to be added to designate an
>>> id-s-p-o tuple as a Truth or a Hypothesis. Retrieval of one kind of
>>> statements via SPARQL then requires an additional triple-pattern:
>>> Example=> { ?id rdf:reifies <<( ?s ?p ?o )>> ; a :Truth } .
>>> 
>>> Doing this in RDF is more complex than what's needed in the case of
>>> relational data. There we do not need to add any extra row to the table
>>> to indicate this – we just need an extra column, in the existing row, to
>>> be populated. SQL query simply needs an extra condition (e.g.,
>>> <tableAlias>.kind = 'Truth'), not an extra join.
>>> 
>>> Doing this in RDF is more complex than what's needed in the case of LPG
>>> data as well. There, assuming edge is actually stored as a struct or
>>> record, all that is needed is adding and populating an extra attribute
>>> (i.e., edge-property). The query for the extended LPG data only needs an
>>> extra filter (e.g., <edge-var>.kind = "Truth").
>>> 
>>> So, to avoid this Truth/Hypothesis handling complexity related
>>> disadvantage compared to relational data and LPG, my suggestion would be
>>> to integrate the Truth or Hypothesis indicator into the id-s-p-o tuple
>>> itself by providing two distinct RDF-blessed properties, say rdf:asserts
>>> (for Truth) and rdf:reifies (for Hypothesis) – all within the id-based
>>> scope. (This does not affect s-p-o triples in any way because those are
>>> not part of any id-based scope. Those are in a "default" (no id) scope.)
>> 
>> Truth/Hypothesis are not the only cases.
>> 
>> Consider a trivial vocabulary:
>> 
>> ex:added      -- fact added to the graph
>> ex:retracted  -- fact retracted
>> ex:source     -- URL where a triple occurred.
>> 
>> These are orthogonal aspects [1].
> 
> I definitely agree. Modeling these aspects belongs to the specific
> domain (of discourse).

Yes, all these aspects belong in user space; we agree on that. But the basic distinction between annotating a statement that is true in the graph and annotating one that isn’t has to be expressible in RDF proper.

> In RDF, triples in graphs encode simple truths (asserting "that the
> relationship holds" [1]). An RDF graph is a simple, monotonic,
> atemporal model of a world. If a triple is there, the claim about the
> world it encodes [2], is simply true (in the interpretation). No less,
> no more.

But you are making the same mistake as the CG: you mistake meta-statements ("statements about statements") for the thing itself. Yes, a statement can only be true in a graph or not - it can't be true multiple times or only half true. But one can have multiple meta-statements about that statement (we call them occurrences most of the time), and those meta-statements can be based on different assumptions about the truth of the statement they annotate. If those assumptions cannot be expressed independently of the statement actually being present (i.e. true) in the graph, then the whole mechanism is incomplete and can only lead to ambiguous results.

My central point is that those meta-statements may or may not intend to refer to the asserted statement (the statement that is true in the graph). Some of our use cases require the possibility of making meta-statements about un-asserted statements. Those use cases don’t work reliably if they depend on the existence in the graph of the statement they talk about.

The other point, closely related, is that most use cases intend to annotate statements that are true in the graph. The annotation syntax reflects that need. However, if those annotations are implemented via reification, then their intended meaning is not captured by the formalisation. Yes, I know this is tricky without entailment, but I have shown a way - combining an unambiguous definition of rdfs:asserts (it expects the triple term to be true in the graph) with the annotation syntax (which captures that expectation) and a configuration of SPARQL-star (to map BGP over a union of triples and rdfs:stated triple terms).

If we don’t do this but go the traditional RDF way of saying "sorry, but the semantics don’t allow us to refer to individual statement occurrences, because of set semantics and monotonicity", then we are essentially not doing our job. Instead we leave the users out in the rain with their well-founded assumptions, a syntax that seems to fit those assumptions, and a smartass fine print that reminds them to inhale the model-theoretic semantics again.

Those two issues
- annotating statements that are meant to be asserted and
- annotating statements that are not meant to be asserted
go hand in hand. I’ve outlined an easy enough way to resolve them both reliably. 

>> There could be a subproperty of rdf:reifies in a vocabulary that implies
>> a Truth or Hypothesis class about the reifier making the extra explicit
>> type triple unnecessary. Usage patterns could be a profile. A simple
>> ingestion pipeline can materialize it if desired.



Any optional vocabulary to explicitly state the intended meaning of an annotation suffers from several problems:
- it will only create confusion and frustration when the annotation syntax seems to reflect the intended meaning alright 
- users can’t be relied on to use it
- it has no formal standing

> Certainly. In fact, properties used in the description of the reifier
> can be enough to imply what it is.

And relying on implication only makes the mechanism even more byzantine. Remember, the basic distinction that IMO needs to be covered is as straightforward as it gets with meta-modelling in RDF: is the annotation referring to a statement that is expected to be true in the graph, or not? That’s all we need to cover. But _that_ we need to cover.

> Example:
> 
>    :provenBy rdfs:domain :ScientificTheory ; rdfs:range :Observation .
> 
>    <X> :relation <Y> ~ <Hypothesis_A> ~ <Hypothesis_B> .
> 
>    <Hypothesis_A> :provenBy :O1 {| :source <Publication_A> |} .
> 
>    << <Hypothesis_B> :provenBy :O2 >> :source <Publication_B> .
> 
> In this "world", a relationship holds between X and Y. Two hypotheses
> are taken (to hold) as reifiers of that claim. One of them,
> Hypothesis_A, is taken as being proven by O1, which implies that
> Hypothesis_A is a scientific theory and O1 is an observation.
> Publication_A is a source of a reified claim of this proof.
> Publication_B is a source of another claim that the other,
> Hypothesis_B is proven by O2. But given the graph, this is not known
> to be true in its interpretation. (So we don't know if Hypothesis_B is
> a scientific theory, nor that O2 is an actual observation.)
> 
>> At the moment, there is limited to no experience of using RDF 1.2 for
>> relational data


? There are standards for that more than a decade old.

>> or LPG mapping.


Indeed. But that is only one aspect of this problem. And not all of it is unclear either: I pretty much expect Greg Williams to inquire again about annotating edges that EXIST in the graph. That issue has not been resolved so far, and would be resolved with my proposal (and with no other proposal that I’m aware of).

>> I would hope a vocabulary and profile
>> appear and mature in a longer timescale. Making anything RDF-blessed
>> (meaning in the rdf: namespace) seems too early. It can emerge later -
>> removing things is harder.
> 
> Yes, I strongly agree.

Would you be willing to omit the annotation syntax? I bet not. But we need to define what the annotation syntax maps to in N-Triples: rdf:reifies or rdfs:asserts. So we can’t postpone the decision.
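
To show what that decision amounts to, here is a small Python sketch of the expansion. The function expand_annotation and the fixed blank node label _:r are hypothetical, CURIEs are kept for readability where real N-Triples would use full IRIs, and rdfs:asserts is the property proposed in this thread, not part of any current draft. The only thing that differs between the two options is the property that links the reifier to the triple term.

```python
def expand_annotation(s, p, o, annotations, reifier_property, reifier="_:r"):
    """Expand `s p o {| ... |}` into N-Triples-like lines.

    `reifier_property` is the open design choice: "rdf:reifies" or the
    proposed "rdfs:asserts". The reifier label is a fixed placeholder here.
    """
    lines = [f"{s} {p} {o} ."]  # the annotated triple itself is asserted
    lines.append(f"{reifier} {reifier_property} <<( {s} {p} {o} )>> .")
    for ann_p, ann_o in annotations:
        lines.append(f"{reifier} {ann_p} {ann_o} .")
    return lines

for prop in ("rdf:reifies", "rdfs:asserts"):
    print("\n".join(expand_annotation(":s", ":p", ":o", [(":source", ":X")], prop)))
    print()
```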

> As I see it, what is added with `rdf:reifies` and triple terms is
> definitely enough for the use cases we have (and not only the ones
> I've submitted as a representative of the National Library of Sweden).

I do indeed think that you are biased, because in your domain a high degree of control is established and even required. Some other real-world scenarios are much messier and need to cope with less well-structured data. Also, you still haven’t answered how you would model the following scenario (first asked in [0]):

What do you do when you have a theory and two arguments:

<< :Foo :causes :Bar >> :because :A .
<< :Foo :causes :Bar >> :because :B .

How do you express that you believe the theory is true because of one of those arguments, but that you don't believe the other argument? The current proposal can't do it:

<< :Foo :causes :Bar >> :because :A .
<< :Foo :causes :Bar >> :because :B .
:Foo :causes :Bar .

Without more help from additional statements it's not clear whether you consider :A valid (and therefore the theory a fact), or :B, or both :A and :B, or neither of them. The reification mechanism essentially becomes a glorified comment sign, and any more specific information has to be provided by extra triples. That is the reason why Enrico dances around the question of what a reified statement means, using undefined phrases like "it induces", which sometimes means it entails and sometimes doesn't. It simply depends on other statements.
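
To make the ambiguity concrete, here is a small Python sketch. The list encoding, the named reifiers :r1 and :r2 (the << >> syntax would normally introduce fresh blank nodes) and the helper supporting_arguments are assumptions for illustration only; rdfs:asserts is the property proposed in this thread, not something the current drafts define.

```python
theory = (":Foo", ":causes", ":Bar")

# Current proposal: both annotations go through rdf:reifies.
current = [
    (":r1", "rdf:reifies", theory), (":r1", ":because", ":A"),
    (":r2", "rdf:reifies", theory), (":r2", ":because", ":B"),
    theory,
]

# Hypothetical variant: :r1 commits to the statement, :r2 does not.
proposed = [
    (":r1", "rdfs:asserts", theory), (":r1", ":because", ":A"),
    (":r2", "rdf:reifies", theory), (":r2", ":because", ":B"),
    theory,
]

def supporting_arguments(graph, statement):
    """Arguments attached to reifiers that assert (not merely reify) the statement."""
    asserting = {s for s, p, o in graph if p == "rdfs:asserts" and o == statement}
    return sorted(o for s, p, o in graph if p == ":because" and s in asserting)

print(supporting_arguments(current, theory))
print(supporting_arguments(proposed, theory))
```

With only rdf:reifies, nothing in the graph ties the asserted triple to either argument, so the helper finds none. With the two properties kept apart, the graph itself says that :A is the argument behind the assertion.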

This is not "statements about statements" that extend the expressiveness of RDF as a graph formalism; it only makes sense in the context of an E/R-based modelling style where meaning is encapsulated in n-ary constructs, inaccessible to simple navigational queries. That is a graph in name only. It sure is useful, but it is not a replacement for a graph. And maybe it doesn't need statement annotation at all, just as Enrico suggested last Friday.

> With that, RDF 1.2 would make reification exact

No, it doesn’t improve that. It’s just as disconnected from actual statements.

> and more powerful,

Yes, insofar as it now supports graphs

> working both for provenance and ("ad hoc") qualification.

Mind the quotation marks.

> I've come to
> see this as warranting the cost (in "conceptual complexity") of adding
> triple terms. At this point I want to be very careful about adding any
> further capabilities or nuances.
> 
> (I have also noted [3] that ad hoc qualification for simple
> ("LPG-style") relationships versus up-front "N-ary" (or "E-R")
> modelling is not necessarily a "fatal" design decision. And I don't
> consider one or the other approach superior in all cases. It very much
> depends on domain and application requirements and expectations.)

See my discussion of "statements about statements" and the different visions of WG participants in [5]. Statement qualification IMO is essential to the viability of the graph approach. Without it, binary statements are too coarse-grained to be useful in practice, and RDF degenerates into an E/R formalism that has superior integration capabilities compared to closed-world RDBMSs, but breaks the promise of easy navigation and retrieval.

Best,
Thomas


> Best regards,
> Niklas
> 
> [1]: <https://www.w3.org/TR/rdf12-concepts/#resources-and-statements>
> [2]: <https://www.w3.org/TR/rdf12-concepts/#entailment>
> [3]: <https://github.com/w3c/rdf-ucr/issues/27>


[4] https://gist.github.com/niklasl/f4a5dee1b991ff5a19a33360c6fd3078
[5] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Aug/0141.html

> 
>>     Andy
>> 
>>> 
>>> Thanks,
>>> Souri.
>>> 
>>> 
>> 
>> [1] https://www.w3.org/2009/12/rdf-ws/papers/ws12
>> 
>
Received on Friday, 23 August 2024 12:26:21 UTC