Proposal: described vs stated triple terms

Hi all,

As promised, here is a proposal for how to better capture the concept of "annotating statements that are not asserted in the graph". Because I try to explain everything in detail, it got too long again, so I’m prepending a tl;dr that presents the proposal without the explanation.


0 tl;dr


Drop:
- 'rdf12:reifies'
- the well-formed abstract syntax
- the annotation syntax (optional)

Define two properties with precise model theoretic semantics:
- 'rdf12:describes' to describe and annotate a statement 
   (replacing 'rdf12:reifies')
- 'rdf12:states' to state and annotate a triple

Define respective syntactic sugar
- <" :s :p :o "> for described terms 
  (replacing the current << … >>)
- << :s :p :o >> for stated terms 
  (re-defining the current << … >> to its RDF* origin)

Define a mapping to RDF 1.1
- unambiguously capturing the semantics of those properties
- providing backwards compatibility

Result:
- unambiguous expressivity w.r.t. annotations on asserted statements 
- unambiguous expressivity w.r.t. annotations on described statements
- much safer roundtripping between Turtle-star and N-Triples-star

Cost:
- one more property



1 NOMENCLATURE

Nomenclature first, so we have a common vocabulary:
- graph
  a set of triples
- triple
  a three-tuple of subject, predicate and object
  stated/asserted in a graph
- statement
  a three-tuple of subject, predicate and object
  not necessarily stated in a graph
- triple term
  the tripleTerm as defined in the abstract syntax of the current proposal [2],
  describing a triple as an abstract, unasserted type,
  serialized as <<( :s :p :o )>> in Turtle-star
- described term 
  a reified triple term together with an optional identifier,
  serialized as << :s :p :o >> or << :r | :s :p :o >> 
  in the current proposal of Turtle-star
- stated term 
  an asserted reified triple term together with an optional identifier,
  serialized as << :s :p :o >> in Turtle* (the pre-CG version)
  and as ':s :p :o {| … |}' in the annotation syntax of Turtle-star



2 PROBLEM

From the start of the CG it was requested that RDF-star provide a way to annotate triples without adding them as facts to the graph. Use cases include versioning, unconfirmed data, propositional attitudes, hypothetical statements, etc. In the past the RDF reification vocabulary has often been (mis)used to this effect, but its verbosity makes this approach rather unpopular.
The current design of RDF-star supports that request to some degree, but in a way that is neither economical nor very expressive. To add and annotate a statement, two triples have to be added to a graph, e.g.

    :s :p :o .
    << :s :p :o >> :a :b .

The annotation syntax adds some syntactic sugar to save users from typing, e.g.

    :s :p :o {| :a :b |} .

but the triple count in N-Triples remains the same.

The problem in abstract terms is that the same syntactic device, the reified term << :s :p :o >>, is overloaded to represent either unasserted or asserted statements, depending on the actual presence of the triple so described in the graph. This is an inherently non-monotonic design and it leads to several problems.
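
The overloading can be made concrete with a small Python sketch. It is only an illustration under assumptions: a graph is modelled as a set of tuples, a triple term as a nested tuple, and the helper name `reading` and the label `_:r` are hypothetical, not part of any RDF library or of the baseline proposal.

```python
# Illustrative model only: a graph is a set of (s, p, o) tuples, and a
# triple term in object position is itself an (s, p, o) tuple.
REIFIES = "rdf12:reifies"

def reading(graph, reifier):
    """Return how annotations on `reifier` are read under the current design."""
    for s, p, o in graph:
        if s == reifier and p == REIFIES:
            # The reading depends on whether the reified triple is in the graph.
            return "asserted" if o in graph else "unasserted"
    raise KeyError(reifier)

graph = {
    ("_:r", REIFIES, (":s", ":p", ":o")),
    ("_:r", ":a", ":b"),
}
before = reading(graph, "_:r")

# Someone else adds the plain triple later; nothing about _:r was touched.
graph.add((":s", ":p", ":o"))
after = reading(graph, "_:r")
```

Adding one unrelated-looking triple flips the reading of an annotation that was already in the graph, which is the non-monotonic behaviour described above.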


Problem 2.1: Verbosity

On the surface of it, annotating a statement without asserting it is very easy: just omit the triple itself. As long as the triple from the described term isn’t added to the graph, the intention of the annotation is clear. Simple use cases work as intended.
The cost however is already high: the standard case of stating and annotating a triple requires two triples, whereas the much rarer case of annotating a statement without adding it as a fact gets away with only one triple. It would be much more economical to design this the other way round.


Problem 2.2: Ambiguity

But what if for some other reason that triple _is_ added? Maybe different people write to the same graph and want to add their opinion on statements that some of them endorse (and add as triples to the graph) but others don’t. Maybe a statement is at first viewed very critically, and as such described and annotated. Opinions may change, however, and the triple may be added later, at which point the initial annotation may seem very strange. RDF is about integrating decentralized data in an open world scenario, and in general control over which triples are added to a graph should (or may? or even must?) not be assumed.

The more logically oriented argue that the semantics of RDF is very clear: triples are facts, and completely independent from described/reified terms and their annotations. The consequence of that position however is that every annotation on a described term has to add another triple stating that the annotation is intended to refer to a statement that is actually asserted, i.e. present as a triple in the graph. So the standard case would require 3 triples even, and look something like this:

    :s :p :o .
    << :s :p :o >> :a :b ;
                   rdfx:semantics rdfx:Asserted .

Alternatively either every annotation on a described term or even _every_ annotation would need to add another triple to express what it intends to annotate: a described term or a stated term.


Problem 2.3: Lack of Expressivity 
             Lack of Robustness to Updates

Imagine a theory that Foo is made of Bar and three different arguments in its favor:

    << _:one | :Foo :madeOf :Bar >> :because :Arg_1 .
    << _:two | :Foo :madeOf :Bar >> :because :Arg_2 .
    << _:tre | :Foo :madeOf :Bar >> :because :Arg_3 .
    :Foo :madeOf :Bar .

How would one express, under the current proposal, that one endorses theory 2 but neither 1 nor 3? The current proposal would need an additional triple, annotating _:two to that effect.

And how would one, after stating 

    << _:two | :Foo :madeOf :Bar >> :because :Arg_2 .
    :Foo :madeOf :Bar .

add the other arguments

    << _:one | :Foo :madeOf :Bar >> :because :Arg_1 .
    << _:tre | :Foo :madeOf :Bar >> :because :Arg_3 .

for completeness, but without endorsing them? One would need to check the graph for prior arguments and either annotate those, or the new ones, or both.


Problem 2.4: Disappointed Intuitions

Some argue that the annotation syntax disambiguates such cases, e.g. 

    << :s :p :o >> :reportedBy :Alice .
    :s :p :o {| :reportedBy :Bob |} .

captures pretty well the intuition that Alice _only_ reports about ':s :p :o' whereas Bob actually considers it a fact. However, that very intuitive take on propositional attitude gets lost when converting to standard triples:

    :s :p :o .
    << :s :p :o >> :reportedBy :Alice , 
                               :Bob .

and roundtripping is not deterministic, as e.g. one possible result would be 

    :s :p :o {| :reportedBy :Alice |} .
    :s :p :o {| :reportedBy :Bob |} .

The very plausible intuition that the annotation syntax alludes to on the surface is dropped in the raw data. Such discrepancies between intuition and formal meaning are always very problematic.
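
A minimal Python sketch of why the roundtrip cannot be deterministic. It assumes, as the listings above do, that all occurrences of << :s :p :o >> without an explicit identifier share one reifier; the function `expand` and the tuple encoding are hypothetical illustrations, not an implementation of the baseline.

```python
REIFIES = "rdf12:reifies"

def expand(doc):
    """Expand ('reified' | 'annotated', triple, annotation) entries into the
    plain triples of the current design."""
    out = set()
    for kind, triple, (ap, ao) in doc:
        reifier = ("reifier-of", triple)   # stand-in for the shared reifier
        out.add((reifier, REIFIES, triple))
        out.add((reifier, ap, ao))
        if kind == "annotated":            # {| ... |} also asserts the triple
            out.add(triple)
    return out

t = (":s", ":p", ":o")
alice = (":reportedBy", ":Alice")
bob = (":reportedBy", ":Bob")

# Alice only reports, Bob asserts and annotates.
doc_a = [("reified", t, alice), ("annotated", t, bob)]
# Both assert and annotate.
doc_b = [("annotated", t, alice), ("annotated", t, bob)]

same = expand(doc_a) == expand(doc_b)
```

Both documents expand to the same four triples, so nothing in the raw data can tell a serializer which of the two surface forms was originally written.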


Problem 2.5: Fundamentally Different Interpretations of Reification

There exist different opinions on whether the described term is asserted or not, as recently discussed [0]. According to the 'RDF-star and LPGs' wiki page [1] even reified terms can be considered to be asserted in the graph. In that interpretation the ability to annotate statements without asserting them is lost entirely. There seems to be no solid common ground about what it actually means to reify a statement.



3 SOLUTION


3.1 Add one property, rename another one

A much more comprehensive and expressive solution can be achieved with little effort. RDF-star already defines two different syntaxes that cater to two different intuitions, i.e. the reified term syntax 

    << :s :p :o >> :a :b .

and the annotation syntax for reified+stated terms

    :s :p :o {| :a :b |} .

The most important thing to do is to add a second property that unlike 'rdf12:reifies' actually states what it describes. That property is named

    rdf12:states

and, for clarity, 'rdf12:reifies' is renamed to

    rdf12:describes


3.2 Abstract syntax

The abstract syntax from the recently agreed upon baseline proposal [2] remains mostly unchanged, except for constraining tripleTerms to the object position (and some re-wording for readability): 

    graph       ::= triple*
    triple      ::= subject predicate object
    subject     ::= iri | blankNode
    predicate   ::= iri 
    object      ::= iri | blankNode | literal | tripleTerm
    tripleTerm  ::= triple

IMO the additional abstract syntax of well-formed RDF can be dropped and replaced by well-defined semantics of the two new properties.


3.3 Properties

The two properties introduced above create references to tripleTerms. The following definition is a bit involved because it tries to capture all important aspects of what has been discussed so far. However, its essence, i.e. the only thing new, is that 'rdf12:states' actually entails the triple described by the referenced triple term. I tried to be quite exact, but as a result readability suffered and the wording is probably still not bulletproof ;-)

    rdf12:describes a rdf:Property ;
        rdfs:comment "
            Defines a reference to a description of an occurrence 
            (i.e. an instance) of one or more statements for the purpose of
            further annotation.
            Like with RDF reification the terms used in those statements refer
            to entities in the domain of discourse (i.e. this is not a form of
            quotation).
            Also like with RDF reification the statements so described are not
            entailed by the description. 
            Stronger even, describing them doesn’t make any assumption about
            their existence in the graph containing the description - rather
            to the contrary annotations on described terms should not be
            assumed to annotate any such triples possibly contained in the
            graph (or any other graph).
        " ;
        rdfs:domain [ a rdfs:Resource ;
                      rdfs:label "described triple term" ] ;
        rdfs:range rdf12:TripleTerm .

    rdf12:states a rdf:Property ;
        rdfs:comment "         
            Defines a reference to an occurrence of stating (i.e.
            instantiating) one or more statements for the purpose of further
            annotation.
            Like with RDF reification the terms used in those statements refer
            to entities in the domain of discourse (i.e. this is not a form of 
            quotation).
            However, unlike with RDF reification and unlike 'rdf12:describes'
            above, they entail the triples described by the linked triple
            terms.
            Annotations can be understood as qualifying an instance of those
            triples.
        " ;
        rdfs:domain [ a rdfs:Resource ;
                      rdfs:label "stated triple term" ] ;
        rdfs:range rdf12:TripleTerm .


The model theoretic semantics of 'rdf12:describes' _seems_ equal to that of 'rdf12:reifies' as defined by the "working baseline" [2] (i.e. statements are described by triple terms in object position but not stated - but maybe that needs clarification, see Problem 2.5 above).
The model theoretic semantics of the meaning of 'rdf12:states' differs from that of 'rdf12:describes' insofar as it entails the triples described by the triple terms in object position. Enrico in a private mail provided me with a sketch of a formalization (as an extension to the working baseline [2]):
    [I+A](t) = TRUE if and only if <[I+A](t.s), [I+A](t.o)> ∈ IEXT([I+A](t.p)) 
    and [I+A](t.o) = TRUE if t.p = rdf12:states . 

Future generations may experiment with other properties, e.g. for the purpose of quotation, or to annotate triple terms as types, and IMO there’s no need to restrict such advances. RDF in general takes a rather liberal stance and we should continue that. This proposal does of course not make any assumptions about the meaning of such unstandardized practices.


3.4 Entailment macro

What does it mean that 'rdf12:states' entails the triple that the triple term defines? After all RDF simple entailment doesn’t support such entailments. The baseline proposal introduced the notion of a "macro" that converts between the annotation syntax and standard Turtle-star, such that annotation syntax

    :s :p :o {| :a :b |} .

is converted to standard Turtle-star

    :s :p :o .
    << :s :p :o >> :a :b .

that then is converted to N-Triples-star

    :s :p :o .
    _:s1 rdf12:reifies <<( :s :p :o )>> ;
        :a :b .

The same "macro" trick can be used to capture the desired semantics of 'rdf12:states'. In practice reasoning engines will know what to do. The real issue is rather to make sure that a query for a triple in SPARQL returns triples and stated terms alike if no other instructions are given - see below for a bit more detail.
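
One way to picture such a macro is as a closure step over the graph. The following Python sketch assumes a graph modelled as a set of tuples with triple terms as nested tuples; `apply_states_macro` is a hypothetical helper name, and how real engines would implement this is left open.

```python
STATES = "rdf12:states"
DESCRIBES = "rdf12:describes"

def apply_states_macro(graph):
    """Add the triple of every triple term referenced via rdf12:states."""
    result = set(graph)
    todo = list(result)
    while todo:
        s, p, o = todo.pop()
        # Only rdf12:states entails its triple term; rdf12:describes does not.
        if p == STATES and isinstance(o, tuple) and o not in result:
            result.add(o)
            todo.append(o)   # the entailed triple may itself use rdf12:states
    return result

graph = {
    ("_:s1", STATES, (":s", ":p", ":o")),
    ("_:s1", ":a", ":b"),
    ("_:d1", DESCRIBES, (":g", ":h", ":i")),
    ("_:d1", ":c", ":d"),
}
closed = apply_states_macro(graph)
```

After the step, ':s :p :o' is part of the graph while ':g :h :i' is not, which is the only behavioural difference between the two properties.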


3.5 Syntax

The definition of triple terms remains completely untouched, and the same goes for its syntax: a tripleTerm is still encoded as <<( :s :p :o )>>.
However, for the purpose of a clean design the meaning of the << … >> syntax should be changed back to its definition in the original RDF* proposal: encoding a stated term. To encode described terms a variation of '<< … >>' is introduced, namely '<" … ">' (or, if that turns out to be too weak, '<<" … ">>'), where the quotation marks allude to the fact that the enclosed triple is not actually asserted.

New annotated Turtle-star

    <" :s :p :o "> :a :b .    # described term
    :s :p :o {| :c :d |} .    # stated term

New standard Turtle-star

    <" :s :p :o "> :a :b .    # described term
    << :s :p :o >> :c :d .    # stated term  

New N-Triples-star

    :d1 rdf12:describes <<( :s :p :o )>> ;
        :a :b .

    :s1 rdf12:states <<( :s :p :o )>> ;
         :c :d .
    :s :p :o .

It’s to be discussed whether the annotation syntax is really needed anymore. I’m not a big fan of it anyway, as A) it reverses the direction of syntactic sugaring from annotated to annotating triple and B) it’s not easy to extend to graph terms.


3.5.1 Roundtripping

Roundtripping between syntaxes is still not completely lossless, as it may happen that the statement ':s :p :o' is present in a graph as both a simple triple and a stated term, e.g.:

    << :s :p :o >> :a :b .
    :s :p :o .

Converting this to N-Triples-star

    :s1 rdf12:states <<( :s :p :o )>> ;
         :a :b .
    :s :p :o .

and back again

    << :s :p :o >> :a :b .

may lose the standard triple. AFAICT this degree of lossiness is unavoidable, but it’s much better than the current proposal because it can’t lose the information whether an annotation is meant to refer to a stated or merely a described term, e.g.:

    
    << :s :p :o >> :a :b .    
    :s :p :o .
    <" :s :p :o "> :c :d .

Converting this to N-Triples-star

    :s1 rdf12:states <<( :s :p :o )>> ;
         :a :b .
    :s :p :o .
    :d1 rdf12:describes <<( :s :p :o )>> ;
         :c :d .

and back again

    << :s :p :o >> :a :b .
    <" :s :p :o "> :c :d .

the bare ':s :p :o' triple may get lost in the process (as it might be a side effect of the 'rdf12:states' construct) but it remains clear that ':c :d' is not meant to annotate an actual triple.
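
The conversion back from N-Triples-star can be sketched in Python as follows. This is an illustration under assumptions: triples are tuples, triple terms are nested tuples, reifiers without annotations are ignored, and `to_turtle_star` is a hypothetical helper, not a proposed serializer.

```python
STATES, DESCRIBES = "rdf12:states", "rdf12:describes"

def to_turtle_star(triples):
    """Serialize tuples into the proposed Turtle-star term syntaxes."""
    kinds = {s: (p, o) for s, p, o in triples if p in (STATES, DESCRIBES)}
    stated = {term for prop, term in kinds.values() if prop == STATES}
    lines = []
    for s, p, o in sorted(triples, key=repr):
        if p in (STATES, DESCRIBES):
            continue                  # folded into the term syntax below
        if s in kinds:
            prop, term = kinds[s]
            if prop == STATES:
                lines.append("<< " + " ".join(term) + " >> " + p + " " + o + " .")
            else:
                lines.append('<" ' + " ".join(term) + ' "> ' + p + " " + o + " .")
        elif (s, p, o) in stated:
            continue                  # bare triple already implied by a stated term
        else:
            lines.append(s + " " + p + " " + o + " .")
    return lines

triples = {
    (":s1", STATES, (":s", ":p", ":o")),
    (":s1", ":a", ":b"),
    (":s", ":p", ":o"),
    (":d1", DESCRIBES, (":s", ":p", ":o")),
    (":d1", ":c", ":d"),
}
lines = to_turtle_star(triples)
```

The bare triple is dropped because the sketch cannot tell whether it was written explicitly, but the stated and the described annotation come back in distinct syntaxes.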


3.5 Mapping to RDF 1.1

This proposal hopes to faithfully and robustly capture the intent of the - only vaguely defined - use case of "unasserted assertions", i.e. annotating statements without adding them to the graph. However, it achieves much more than that: it also captures unambiguously the intent of annotating triples, i.e. statements that _are_ asserted in the graph.
Recent discussions have shown that there is a lot of room for interpretation as to whether what an occurrence term describes is actually available as factual knowledge or not (see Problem 2.5 above). Also the concept of reification, which we relied on for a long time in our discussions, has different interpretations in the literature, and many of them are much looser than the definition of RDF’s reification vocabulary. Some of us refer to the interpretation in the RDF spec, others argue with a wider interpretation in mind. All that creates room for diverging interpretations in practice.
The following mapping to RDF 1.1 reduces such interpretative uncertainties, and it also makes it possible to distinguish annotations on a statement as a whole (i.e. as an entity in its own right) from annotations on its parts (i.e. subject, predicate and object).
I have previously argued that this amounts to the difference between reification and instantiation, but the mapping to RDF 1.1 shows that those are not very useful categories. It is also questionable to differentiate them as "external annotation" vs "internal qualification". It isn’t even always necessary to disambiguate between an annotation on the triple as an entity vs on the relation described in that triple. That’s why the mapping in its simple form doesn’t make such distinctions. However, the more complex examples show how, in 80/20 fashion, it _can_ precisely annotate each individual node if necessary.


3.5.1 Simple mapping rdf12:states

In a simple mapping of rdf12:states

    _:x rdf12:states <<( :s :p :o )>> ; 
        :a :b .
    
or, equivalently,

    << _:x | :s :p :o >> :a :b .

maps to RDF 1.1

    _:x rdf12:states [
            rdf12:termSubject :s ;
            rdf12:termPredicate :p ;
            rdf12:termObject :o 
        ] ;
        :a :b .
    :s :p :o .                        # entails the triple ':s :p :o .'


3.5.2 Simple mapping rdf12:describes

In a simple mapping of rdf12:describes

    _:y rdf12:describes <<( :s :p :o )>> ;
        :c :d .

or, equivalently,

    <" _:y | :s :p :o "> :c :d .

maps to RDF 1.1

    _:y rdf12:describes [
            rdf12:termSubject :s ;
            rdf12:termPredicate :p ;
            rdf12:termObject :o 
        ] ;
        :c :d .
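
Both simple mappings follow the same pattern and differ only in whether the triple itself is added. A rough Python sketch, assuming tuples for triples; the function `map_to_rdf11` and the generated blank node labels `_:t1`, `_:t2`, … are hypothetical stand-ins for the anonymous [ … ] nodes above.

```python
import itertools

_ids = itertools.count(1)

def map_to_rdf11(reifier, prop, term, annotations):
    """Map one stated or described term to plain RDF 1.1 triples."""
    s, p, o = term
    node = "_:t" + str(next(_ids))    # stands in for the [ ... ] blank node
    triples = [
        (reifier, prop, node),
        (node, "rdf12:termSubject", s),
        (node, "rdf12:termPredicate", p),
        (node, "rdf12:termObject", o),
    ]
    triples += [(reifier, ap, ao) for ap, ao in annotations]
    if prop == "rdf12:states":
        triples.append(term)          # only rdf12:states adds the triple itself
    return triples

term = (":s", ":p", ":o")
stated = map_to_rdf11("_:x", "rdf12:states", term, [(":a", ":b")])
described = map_to_rdf11("_:y", "rdf12:describes", term, [(":c", ":d")])
```

The stated mapping yields six triples including ':s :p :o', the described mapping yields five and leaves the statement unasserted.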


3.5.3 Complex mapping rdf12:states

In a complex mapping of rdf12:states

    _:x rdf12:states <<( :s :p :o )>> ;
        :a :b ;
        rdf12:termObject [ :e :f ] .  # annotating the object term

or, equivalently,

    << _:x | :s :p :o >> :a :b ;
        rdf12:termObject [ :e :f ] .

maps to RDF 1.1

    _:x rdf12:states [
            rdf12:termSubject :s ;
            rdf12:termPredicate :p ;
            rdf12:termObject [
                rdf12:qualifies :o ;  # qualifying the object
                :e :f
            ]
        ] ;
        :a :b .
    :s :p :o .

'rdf12:qualifies' is a property that defines the primary value in an n-ary relation. It’s a subproperty of rdf:value, distinguished by its very specific purpose.


3.5.4 Concrete example

A concrete example, not as N-Triples-star but as standard Turtle-star syntax:

    << _:x | :Alice :buys :Car >> 
        :src :Bob ;
        rdf12:termSubject [ :age 18 ] ;
        rdf12:termPredicate [ :payment :Cash ] .

maps to RDF 1.1

    _:x rdf12:states [
        rdf12:termSubject [
            rdf12:qualifies :Alice ;
            :age 18 ] ;
        rdf12:termPredicate [
            rdf12:qualifies :buys ;
            :payment :Cash ] ;
        rdf12:termObject :Car ;
        rdf12:triple [ :src :Bob ] 
        ] .
    :Alice :buys :Car .

Here the newly introduced 'rdf12:triple' property makes it possible to refer unambiguously to the triple as an entity in its own right, distinguishing it from references to the relation described by 'rdf12:termPredicate' and paving the way to multi-triple annotations if 'rdf12:states' is a one-to-many relation.

Note that I didn't re-use the RDF reification vocabulary but minted new properties to refer to individual nodes in stated and described terms, because I didn’t want to have to deal with the semantic baggage of RDF reification. But I wouldn’t rule that out either, depending on discussion.
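
The concrete example can be sketched in Python as well. The sketch follows the layout of that example (node-level annotations behind 'rdf12:qualifies', annotations on the whole triple behind 'rdf12:triple'); the function `map_qualified` and the blank node labels derived from the reifier are hypothetical.

```python
def map_qualified(reifier, term, triple_annotations, qualifiers):
    """Map a stated term with node-level qualifiers to plain RDF 1.1 triples.

    `qualifiers` maps 'rdf12:termSubject' / 'rdf12:termPredicate' /
    'rdf12:termObject' to a list of (predicate, object) pairs.
    """
    node = reifier + "_term"
    out = [(reifier, "rdf12:states", node)]
    positions = ("rdf12:termSubject", "rdf12:termPredicate", "rdf12:termObject")
    for position, value in zip(positions, term):
        extra = qualifiers.get(position, [])
        if extra:
            # Qualified node: an intermediate blank node points at the value.
            q = node + "_" + position.split(":")[1]
            out.append((node, position, q))
            out.append((q, "rdf12:qualifies", value))
            out.extend((q, ap, ao) for ap, ao in extra)
        else:
            out.append((node, position, value))
    if triple_annotations:
        # Annotations on the triple as a whole hang off rdf12:triple.
        t = node + "_triple"
        out.append((node, "rdf12:triple", t))
        out.extend((t, ap, ao) for ap, ao in triple_annotations)
    out.append(term)                  # rdf12:states entails the triple itself
    return out

mapped = map_qualified(
    "_:x",
    (":Alice", ":buys", ":Car"),
    [(":src", ":Bob")],
    {
        "rdf12:termSubject": [(":age", "18")],
        "rdf12:termPredicate": [(":payment", ":Cash")],
    },
)
```

Unqualified positions, here the object ':Car', are linked directly, so the simple mapping is just the special case with no qualifiers.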


3.6 Querying

Not my main area of expertise… I expect the following arrangements to be useful.

Given the graph

    :a :b :c .                        # triple
    << :d :e :f >> :u :w .            # stated term
    <" :g :h :i "> :x :y .            # described term

the following queries should return the desired subsets.


3.6.1 Triples, annotated or not

    SELECT * WHERE { ?s ?p ?o }

would return triples and stated terms

    :a :b :c
    :d :e :f

but NOT the merely described statement ':g :h :i'.
This means that SPARQL has to make sure that what is entailed via the 'rdf12:states' property is actually returned to the user by a query, implementing the "macro" in the background.


3.6.2 All statements, no matter if asserted, annotated, only described, etc

    SELECT * WHERE { <? ?s ?p ?o ?> }

would return all statements

    :a :b :c
    :d :e :f
    :g :h :i


3.6.3 Only annotated triples

    SELECT * WHERE { << ?s ?p ?o >> }

would return only stated terms 

    :d :e :f


3.6.4 Only described terms

    SELECT * WHERE { <" ?s ?p ?o "> }

would return only described terms

    :g :h :i
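
The four result sets can be mimicked over an in-memory graph with a short Python sketch. It rests on assumptions: triples are tuples, the form names and the function `select` are hypothetical, and, like the listings above, the sketch hides the triples whose subject is a reifier.

```python
STATES, DESCRIBES = "rdf12:states", "rdf12:describes"

def select(graph, form):
    """Return the statements matched by one of the four query forms."""
    stated = {o for _, p, o in graph if p == STATES}
    described = {o for _, p, o in graph if p == DESCRIBES}
    reifiers = {s for s, p, _ in graph if p in (STATES, DESCRIBES)}
    # Plain triples, leaving out the reifier bookkeeping triples.
    plain = {t for t in graph if t[0] not in reifiers}
    if form == "triples":        # ?s ?p ?o
        return plain | stated
    if form == "all":            # <? ?s ?p ?o ?>
        return plain | stated | described
    if form == "stated":         # << ?s ?p ?o >>
        return stated
    if form == "described":      # <" ?s ?p ?o ">
        return described
    raise ValueError(form)

graph = {
    (":a", ":b", ":c"),
    ("_:r1", STATES, (":d", ":e", ":f")),
    ("_:r1", ":u", ":w"),
    ("_:r2", DESCRIBES, (":g", ":h", ":i")),
    ("_:r2", ":x", ":y"),
}
```

For example, `select(graph, "triples")` yields ':a :b :c' and ':d :e :f' but not ':g :h :i', because the stated term is folded into the default answer while the described one is not.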



4 NEXT

In the back of my head I also have another syntax to make annotating individual nodes much easier, plus to also annotate groups of stated/described terms. I’ll keep that for another mail - better to finish this topic first.


Best,
thomas


[0] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jul/0091.html
[1] https://github.com/w3c/rdf-star-wg/wiki/RDF-star-and-LPGs
[2] https://github.com/w3c/rdf-star-wg/wiki/RDF-star-%22working-baseline%22

Received on Tuesday, 23 July 2024 12:24:56 UTC