Re: summary un/asserted from Doerthe Arndt on 2024-07-09 (public-rdf-star-wg@w3.org from July 2024)

From: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Date: Tue, 9 Jul 2024 17:11:24 +0000
To: Thomas Lörtsch <tl@rat.io>
CC: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-ID: <A07F0EA5-C532-4431-A56D-ACEEE8A67369@tu-dresden.de>
Dear Thomas, 

Some initial remarks below. 


I am still trying to understand the problem or, more precisely, why it would not be enough to have your two predicates rdf:reifies and rdf:instantiates (as :says vs. :saysAndAsserts from the meeting) and simply say that

:x rdf:instantiates << :s :p :o>>.

entails (in some RDF-star entailment)

:s :p :o.

I am not sure that this would be what I want, but that is how I understand your proposal?
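
As a rough illustration of that reading (a sketch only, not part of any spec): triples are modelled here as plain Python tuples and a triple term as a nested tuple in object position. The function name `entail_instantiated` and the tuple encoding are assumptions made for this example.

```python
# Triples are plain tuples; a triple term is a nested tuple in object position.
INSTANTIATES = "rdf:instantiates"  # predicate assumed to entail its triple term
REIFIES = "rdf:reifies"            # predicate assumed NOT to entail its triple term


def entail_instantiated(graph):
    """Return the graph plus every triple term that is the object of rdf:instantiates."""
    closure = set(graph)
    changed = True
    while changed:  # repeat so that newly added triples are inspected too
        changed = False
        for s, p, o in list(closure):
            if p == INSTANTIATES and isinstance(o, tuple) and o not in closure:
                closure.add(o)
                changed = True
    return closure


graph = {
    (":x", INSTANTIATES, (":s", ":p", ":o")),
    (":y", REIFIES, (":s2", ":p2", ":o2")),
}
closure = entail_instantiated(graph)
```

Under these assumptions `closure` contains `:s :p :o`, while the triple that is only reified stays out of the graph.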



> Am 09.07.2024 um 12:43 schrieb Thomas Lörtsch <tl@rat.io>:
> 
> Hi all,
> 
> as promised in the last WG meeting almost two weeks ago this is a summary of the issues I see with "unasserted assertions" and a proposal of how to resolve them and, in the same stroke, some other problems as well.
> 
> 
> 
> DEFINITION
> 
> "Unasserted assertions" - for lack of a better name - means statements that are described (and probably annotated), but are not contained as facts in the graph. I.e. those statements are talked about, but not endorsed. RDF standard reification is a construct to such effect: the reification quad describes a statement, but it doesn’t entail it.
> 
> 
> PROBLEM 1/4
> 
> Both CG and WG repeatedly and explicitly maintained that RDF-star needs to support "unasserted assertions". However, the way that RDF-star currently implements them is ambiguous, lossy and even non-monotonic. It relies completely on the absence of a fact from the graph. If however that fact is present, it is no longer possible to talk about it as referring to something unasserted.
> 
> For example, we may want to document and comment on a statement without endorsing it. We write 
> 
>    << :s :p :o >> :a :b .
> 
> If however that same statement is also to be part of the graph, for whatever reason, like so:
> 
>    << :s :p :o >> :a :b .
>    :s :p :o .
> 
> there is no way to express that the annotation is meant to refer to an unasserted statement. 
> There are many situations in which this problem might occur: we might want to document different viewpoints or versions, graphs might be merged or updated, adding the fact, etc.
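
The merge case described above can be sketched as follows. This is an illustration only: graphs are modelled as Python sets of tuples, and the helper `is_unasserted` is a hypothetical name standing in for "the triple is absent from the graph".

```python
def is_unasserted(graph, triple):
    """'Unasserted' here only means: the triple is absent from the graph."""
    return triple not in graph


quoted = (":s", ":p", ":o")

g1 = {(quoted, ":a", ":b")}  # << :s :p :o >> :a :b .
g2 = {quoted}                # :s :p :o .

merged = g1 | g2             # merging only ever adds triples

before = is_unasserted(g1, quoted)
after = is_unasserted(merged, quoted)
```

Here `before` is true and `after` is false: a conclusion that held for the smaller graph no longer holds once a triple is added, which is the non-monotonic behaviour the quoted text refers to.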

As I said in our task force meeting, I think we can handle such things by providing the right predicate. I understand that you would like to have such a predicate as part of RDF-star itself instead of it being defined in extending ontologies? That could be handled. I am asking in order to help us get to the point.


> 
> 
> PROBLEM 2/4
> 
> As a way out of this problem I discussed requiring another statement describing whether an annotation is meant to annotate an unasserted assertion, like so:
> 
>    << :s :p :o >> :a :b ;
>                   a rdf12:UnAssertedAssertion .
>    :s :p :o .
> 
> However, this is not a valid solution, because a second problem runs even deeper: the meaning of triple terms is defined as reification. Because of those reification semantics a triple term is always unasserted. As a consequence, we would rather need to add a statement whenever we intend to annotate a statement that we actually assert, like so:
> 
>    << :s :p :o >> :c :d ;
>                   a rdf12:AssertedAssertion .
>    :s :p :o .
> 
> In practice this would add considerable load as most annotations will aim to annotate facts in the graph, not e.g. some unendorsed viewpoints. This would also require more effort when querying.

But don’t all the other solutions also add triples, including the one you propose below when you map to RDF 1.1? Or do you want to have a direct semantics for {|..|} and <<..>>, with the mapping given just for illustration purposes? Is that direct semantics your main point?


> 
> 
> PROBLEM 3/4
> 
> On the plus side the approach in 2/4 could be considered to be very expressive as the actual fact would now be completely independent from any annotations, on asserted and unasserted assertions alike, adding a new degree of expressivity. However, for many use cases this separation is a problem rather than a feature, because they ask for a clear and solid connection between a fact and "its" annotation. All use cases of qualification fall in this category, e.g. Wikidata, LPG, and many more. OTOH, for use cases that aim to annotate statements with rather orthogonal aspects like provenance, reification is the right choice. So just changing the underlying semantics from one to the other is not a solution. We rather need both.
> 
> 
> PROBLEM 4/4
> 
> The shorthand syntax by intuition provides a solid link between a stated fact and its annotation. However, no other syntax does that, so an unassuming user's intuition is betrayed when the data is serialized to e.g. N-Triples. This is a serious usability problem. 
> 
> 
> 
> PROPOSAL
> 
> To properly support unasserted assertions, and to solve the semantics problems in the same stroke, let's bite the bullet and define two primitives instead of one:
> - an unasserted triple term occurrence with a semantics of reification
> - an asserted triple term occurrence with a semantics of instantiation
> Define those primitives and their semantics not in the abstract syntax, but via the two properties rdf12:reifies and rdf12:instantiates.
> 

OK, so maybe I misunderstood. So, I have the two properties from above, but are these not enough then?


> 
> Abstract Syntax:
> 
> graph      ::= triple*
> triple     ::= subject predicate object
> subject    ::= iri | BlankNode
> predicate  ::= iri 
> object     ::= iri | BlankNode | literal | tripleTerm
> tripleTerm ::= triple
> 
> 
> Properties:
> 
> To provide the user-facing triple term occurrences with their respective semantics (un-/asserted etc.) we explicitly (and normatively) define two properties, with different semantics, to be used with abstract triple terms as their rdfs:range:
> - rdf12:reifies defines a reification of an abstract triple term
> - rdf12:instantiates defines an instantiation of an abstract triple term
> 
> A reification via rdf12:reifies doesn’t assert the statement described by the triple term, it merely provides an identifier to refer to an occurrence (wherever, whenever) of it. IIUC that is exactly what we have now. A mapping to RDF 1.1 clarifies that:
> 
>    :r_1 rdf12:reifies <<( :s :p :o )>> .
> 
> in RDF 1.1 would be expressed as 
> 
>    :r_1 rdf12:reifies [
>        rdf:subject :s ;
>        rdf:predicate :p ;
>        rdf:object :o
>    ] .
>    rdf12:reifies rdfs:range rdf:Statement .    # axiomatic triple
> 
> An instantiation via rdf12:instantiates OTOH does indeed assert the statement it annotates, in addition to providing an identifier to annotate that assertion. To make the connection between statement and annotation direct and solid, but not break the set semantics of RDF, the model and semantics mimic the singleton property approach. A mapping to RDF 1.1 clarifies that:
> 
>   :i_1 rdf12:instantiates <<( :s :p :o )>> .
> 
> in RDF 1.1 would be expressed as 
> 
>    :i_1 rdf12:instantiates [
>        :s :p_1 :o .
>        :p_1 rdf12:instantiatesProperty :p .
>    ]
>    :s :p :o .
>    rdf12:instantiatesProperty 
>        rdfs:subPropertyOf rdf:type .           # axiomatic triple

Here, you seem to have a typo: the object of rdf12:instantiates is a quadruple. Maybe you mean something like:

:i_1 rdf12:instantiates :p_1 .
:s :p_1 :o .
:p_1 rdf12:instantiatesProperty :p .
:s :p :o .
rdf12:instantiatesProperty rdfs:subPropertyOf rdf:type .    # axiomatic triple


> 
> 
> Macro:
> 
> An instantiation always entails the triple term, ':s :p :o' in the above example. This is defined as a macro when mapping between concrete syntaxes.
> 

Then

:s :p_1 :o .
:p_1 rdf12:instantiatesProperty :p .

needs to entail

:s :p :o .

Is that right?


But then, the solution at the beginning of the mail feels more natural to me. 
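
The entailment asked about above, for the singleton-property style mapping, can be sketched as follows. This is an illustration under stated assumptions: triples are plain Python tuples, and the function name `entail_from_singletons` is invented for this example rather than taken from any proposal.

```python
INSTANTIATES_PROPERTY = "rdf12:instantiatesProperty"


def entail_from_singletons(graph):
    """For each ':p_1 rdf12:instantiatesProperty :p', add ':s :p :o' for every ':s :p_1 :o'."""
    closure = set(graph)
    for singleton, pred, generic in graph:
        if pred != INSTANTIATES_PROPERTY:
            continue
        for s, p, o in graph:
            if p == singleton:
                # The singleton property is replaced by the property it instantiates.
                closure.add((s, generic, o))
    return closure


graph = {
    (":i_1", "rdf12:instantiates", ":p_1"),
    (":s", ":p_1", ":o"),
    (":p_1", INSTANTIATES_PROPERTY, ":p"),
}
closure = entail_from_singletons(graph)
```

With this rule the plain triple `:s :p :o` is derived from the two singleton-property triples, so it would not have to be stated separately in the mapping.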





> 
> Concrete syntaxes:
> 
> I see two possible approaches. One is to not change the currently defined syntaxes, but let the shorthand syntax express instantiation, and let the standard syntax express reification. That would align syntaxes as they are defined right now with the intuitions they support, but all modifications would happen "under the hood" - e.g.:
> 
>    :s :p :o  {| :a :b |}      # rdf12:instantiates, asserted and annotated
>    << :s :p :o >> :d :e .     # rdf12:reifies, not asserted but annotated
> 
> Another approach would be to let instantiation be expressed with double chevrons and reification with triple chevrons, e.g.:
> 
>    << :s :p :o >> :a :b.      # rdf12:instantiates, asserted and annotated
>    <<< :s :p :o >>> :d :e .   # rdf12:reifies, not asserted but annotated
> 
> The first approach, based on the current syntaxes, does more clearly disambiguate the two modes of expression, but it also adds more "unrest". The second approach stays closer to the original RDF* proposal and is more uniform. The second approach might enable a more usable query interface, e.g.:
> 
>    ?s ?p ?o                   # asserted
>    << ?s ?p ?o >> ?a ?b       # asserted and annotated
>    <? ?s ?p ?o ?> ?a ?b       # asserted and optionally annotated
>    <<< ?s ?p ?o >>> ?a ?b     # unasserted and annotated
>    <<? ?s ?p ?o ?>> ?a ?b     # (asserted or unasserted) and annotated
>    <?? ?s ?p ?o ??> ?a ?b     # (asserted or unasserted) and optionally annotated
> 
> Occurrence identifiers are omitted in all examples, as they don’t differ from the current proposal. Likewise the mapping to N-Triples doesn’t change, except for the introduction of a new property, rdf12:instantiates.
> 
> 

I really dislike these query patterns, but that is a separate discussion. I will come back to these complaints once I fully understand your idea (one discussion at a time :) )


I tried to keep my comments really short in the hope of not getting sidetracked. So, did I get your point?


Kind regards,
Dörthe


> 
> DISCUSSION
> 
> 
> In principle:
> RDF is an Open World technology, designed to facilitate decentralized authoring and integration of data and we can’t rely on the absence of statements to convey meaning (e.g. unassertedness or non-endorsement). Out-of-band arrangements in concrete applications may be more specific, but we cannot make any claims based on that (and "RDF-star supports unasserted assertions" would be such a claim).
> I’d like us to go the extra mile and adopt the above proposal: implement two different kinds of annotation primitives. This also opens the road to a future with more elaborate constructs like quoted versioning. 
> I could also live with a scaled down reference to the support of unasserted assertions, along the lines of "you can emulate a surrogate support for unasserted assertions like this, but be aware that the construct easily breaks in practice if not tightly controlled".
> However, that would still not solve the other problem, namely that reification is not the right formalisation for most of our use cases.
> 
> Properties:
> defining the semantics of triple term occurrences via the properties rdf12:reifies and rdf12:instantiates is a modification of the mechanism introduced by the TEP proposal in the RDF-star CG report.
> 
> Instantiation:
> There are different names for the underlying concept. It can also be understood as a form of n-ary relation where the instance-type relationship is modeled via a blank node. So the term "instantiation" refers to its most theoretical aspect (which best mirrors "reification", and that’s why I chose it in this summary), "n-ary relation" would refer to the way of modelling it, and yet another term "qualification" would emphasize the meaning of the construct. "Singleton properties" is a term that I try to avoid because the proposal has been met with so much resistance, but it is a concrete implementation of the same concept. I provide a very singleton-property-like mapping to RDF 1.1 above, but slightly different mappings could provide better computational properties, e.g. letting the object refer to the singleton, resulting in better indexing and join performance.
> However, what is most important to me is that the qualifying annotation, by being attached to the instantiation, is unmistakeably annotating a statement that is actually asserted in the graph. The only metaphysical baggage involved is the definition of what a type-instance relation is, and that should be uncontroversial.
> Defining the relation between annotated thing and annotating thing not as a subproperty-relation but as a type-instance-relation follows the idea that those annotated relations are not meant to be annotated any further (that too is possible, but not the norm). So they represent leaves of an inheritance tree rather than inner nodes. In OO tradition leaves are understood as instances, whereas inner nodes would be defined as sub-properties.
> 
> Mappings:
> The mappings to RDF 1.1 are provided for two reasons:
> - clarify the meaning of new constructs in terms of well-known ones
> - provide a path to backwards compatibility.
> RDF-star implementations should not be required to support them on the implementation level. 
> 
> Many-to-many:
> The astute reader will have noticed that both mappings to RDF 1.1 are many-to-many. However, owing to the semantics inherent in instantiation, this is a kind of many-to-many relationship that only properly supports co-denoting statements. A common instantiator for :Car and :MotorizedVehicle makes sense, whereas for conceptually very different entities like :Car and :Person it mostly does not. Such a semantics remains true to the integration focus of RDF, but also caters to the more focused approach of LPG. Reification on the other hand rather refers to the statement(s) as a whole, as an entity in their own right. This caters well to use cases that explicitly don’t want to qualify statements but that try to keep a safe distance between statement and annotation, e.g. provenance and other orthogonal concerns.
> 
> Querying: 
> Do we currently allow querying for the abstract triple term <<( … )>>, or do we plan/have to do so? If yes, that might be a better replacement for the last option above: <?? … ??>. In any case the above proposal is just a sketch and may not even be helpful. I’m not good at querying.
> 
> Abstract triple:
> People can use the abstract triple term as object of whatever statements they like. They are on their own with such freewheeling usage, as no other semantics are defined in the spec than those of rdf12:reifies and rdf12:instantiates. However, this may lead to some fruitful experiments, e.g. with referentially opaque triple terms, and it stays in the spirit of RDF being a pretty open technology.
> 
> Fragment identifiers:
> Reification defines a handle to address the whole triple as an object, instantiation rather defines a handle to address the predicate of the triple. Therefore an annotation on a reifier annotates the whole triple as an object, an annotation on an instantiator rather qualifies the relation itself. Neither arrangement is set in stone, and more specific properties to explicitly annotate and qualify the subject, predicate, object or whole triple (or a set thereof) may be defined. The RDF reification vocabulary might be reused to that end, but defining a new set of properties seems to be the safer approach. Such properties may be applied to reifications as well as instantiations (and there are use cases for both).
> 
> Merging and temporal aspects:
> It was argued that the problems outlined above are not actual problems for us but related to issues outside of RDF-star, i.e. merging is not an issue for RDF-star and temporal aspects are not considered in RDF at all. I disagree on both counts: RDF is a technology focused on decentralized data integration. Such integration requires merges, and it leads to the addition of statements in existing graphs. In both cases the situation may arise that a statement that was annotated but not endorsed is added as an actual fact. Then what was meant to be unendorsed suddenly is endorsed. Problems with other use cases, like representing different viewpoints, notwithstanding.
> 
> 
> 
> Best,
> Thomas
Received on Tuesday, 9 July 2024 17:11:39 UTC