statements about statements

Dear WG,


The Semantics Task Force meeting last Friday revealed something I find pretty worrying: many members of this WG seem to see the future of RDF not as a graph formalism but as an E/R technology, i.e. an RDBMS with shared vocabularies and globally unique identifiers, with RDF graphs as glorified relational tables and abstract row identifiers as the main subjects of discourse.

That certainly has some value, but what about the idea of integrating data on the fly and easy exploration? Only graphs are malleable enough to seamlessly adapt to the most diverse sources and unforeseen usage requirements. Should we leave graphs to LPG, and let Cypher do the graph navigation? And what about our charter, "statements about statements" - does that suggest abandoning graphs?

Some make no secret of their disapproval of the task of this WG, and of their preference for E/R style modelling. Some tell us that they direct their customers towards E/R style modelling. Some say that reification (which the RDF-star semantics is designed after) should not be used, and that E/R modelling is to be preferred instead. Some are concerned only with provenance recording, a largely orthogonal dimension.


My vision for RDF-star (or statement annotation in general, as I never was much of a fan of RDF-star because of its focus on single triples) is qualification. Qualification increases the expressivity of the simple RDF statement. The idea is straightforward, and LPGs have proven its attractiveness to users: encode the central fact as a simple binary statement, and add detail as annotations. That is of course challenging in RDF, because RDF is based on sets and is monotonic, and statements represent meaning, not a speech act.

Clearly, something similar can be achieved with n-ary relations in RDF, where subject and object nodes are blank nodes with attributes, or maybe even with an ontology design pattern - but the main statement, the central aspect, will get lost in detail, the property is especially hard to annotate, and the result is nearly impossible to navigate in a follow-your-nose fashion. Or one can go E/R right away, with an abstract identifier and all the attributes deemed appropriate (maybe defined as a shape), and remain linkable, but the resulting construct is then not a graph anymore by any meaningful interpretation of the word. IMO statement annotation as qualification, not only as an administrative add-on, is a very elegant way to make RDF more expressive without losing its core features of modelling simplicity and follow-your-nose style navigation.

My standard example is this:

    Alice at age 18 bought a used car, paying cash, as Dave told us.

Reformulated slightly, to more clearly discriminate between main information and secondary detail (and dropping all implicit notions of temporality, as RDF is atemporal):

    Alice, at age 18,
    buys, paying cash,
    a car, which is used,
    as Dave tells us.

This is easy to express via a qualified statement, because clearly there is a primary fact:

   :Alice :buys :Car .

The rest is secondary detail. If we design RDF-star as a means to reliably annotate such a triple with qualifying detail, we can easily match the expressivity of LPG (and of course provide much more than LPGs can in many other ways). This is a graph vision! Also, it takes nothing away from the E/R-focused approach which certainly has its virtues too. The two can interact seamlessly, providing both easy navigation and detail. Some may know Ben Shneiderman’s famous "visualization mantra": 

- overview first, 
- zoom in, 
- detail on demand. 

This is valid not only for visualization but for any exploratory mechanism, and RDF by design and vision wants to be such an exploratory tool:

- the primary aspect of an annotated relation provides the overview: :Alice :buys :Car.
- annotations zoom in on aspects like her age and the state of the car
- detail on demand is provided by looking up :Alice, :Car, :paymentMethods, etc., encoded probably in E/R style or as shapes or ontology design patterns.

This is my vision for the evolution of RDF, and for catching up to LPG's ease of expressivity.

The problem is just that a way has to be found to match the set-based semantics of RDF with triple annotation (which by intuition has a bag semantics), and then harmonize that with SPARQL's bag semantics.
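
The mismatch can be illustrated with a minimal Python sketch that is not tied to any RDF library. Triples are modelled as plain tuples; the variable names, the identifiers ":q1" and ":q2" and the ":Credit" value are assumptions made up for illustration only.

```python
from collections import Counter

# A triple is modelled as a plain tuple of strings (illustrative only).
triple = (":Alice", ":buys", ":Car")

# Two separate purchases, each meant to carry its own annotations.
occurrences = [triple, triple]

# Set semantics (RDF graph): the two occurrences collapse into one triple.
graph = set(occurrences)

# Bag semantics (the intuition behind annotation, and SPARQL results):
# both occurrences are kept and counted.
bag = Counter(occurrences)

# One possible reconciliation: give each occurrence its own identifier,
# so annotations attach to the identifier rather than to the triple itself.
annotated = {
    ":q1": {"triple": triple, ":payment": ":Cash"},
    ":q2": {"triple": triple, ":payment": ":Credit"},
}

print(len(graph))      # 1
print(bag[triple])     # 2
print(len(annotated))  # 2
```

The set keeps a single triple, while the two identified occurrences stay apart and still point at that one triple.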

What we need is something like this:

   :Alice :buys :Car ~ :q {|
       rdf:onSubject [ :age 18 ] ;
       rdf:onProperty [ :payment :Cash ] ;
       rdf:onObject :Used ;
       :source :Dave
     |} .

This annotates - in an administrative way - the statement as a whole: provenance, credibility, etc refer to the triple as an entity in its own right. 
It also annotates - in a qualifying way - individual parts: obviously subject and object, but also the type of relation (which is something different from the whole statement). The typing annotation on the object doesn’t even require an intermediate blank node. Catchier property names would be nice, but I have no ideas for those yet. Note how all these annotations only add detail, but do not alter or diminish the meaning of the main fact. I.e. this is completely monotonic.


All this works with the current baseline as long as "nothing happens". What could possibly go wrong?

1) Reification just mentions "such a statement" but by design doesn’t make any claim about the actual existence, the truth, of that statement in the graph. That is most of the time NOT what we want. Clearly this can be dealt with by convention, i.e. just ignore the fine print in the semantics, but that A) is unreliable and B) robs us of the ability to refer to statements that we don’t want to be true in the graph.

2) The workaround that the baseline provides is based on circumstance: if the reified statement is also present as an asserted statement in the graph then the annotation refers to a statement that is true, otherwise it doesn’t. This arrangement is brittle and even non-monotonic, as what was an annotation on an unasserted statement just now can become a qualification on an asserted statement the next moment.
Also, it isn’t able to keep annotations on stated and on unstated statements apart. E.g. a theory and two arguments:

    << :Foo :causes :Bar >> :because :A .
    << :Foo :causes :Bar >> :because :B .

How does one express that because of one of those arguments the theory is believed to be true in the graph, but not because of the other argument? The current proposal can't express that:

    << :Foo :causes :Bar >> :because :A .
    << :Foo :causes :Bar >> :because :B .
    :Foo :causes :Bar .

Without more help from additional statements it's not clear if :A is considered a valid argument, and therefore the theory a fact, or :B, or :A and :B, or none of them.
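
A minimal Python sketch of that ambiguity, assuming my reading of the baseline as described above. Triples are plain tuples, a triple term is a nested tuple, and the helper name supporting_arguments is hypothetical.

```python
# A triple is a plain tuple; a triple term used as subject is a nested tuple.
theory = (":Foo", ":causes", ":Bar")

def supporting_arguments(graph, statement):
    """Baseline reading: an annotation is about a true statement exactly
    when that statement also happens to be asserted in the graph."""
    if statement not in graph:
        return set()
    return {o for (s, p, o) in graph if s == statement and p == ":because"}

graph = {
    (theory, ":because", ":A"),
    (theory, ":because", ":B"),
}

# Nothing is asserted yet, so no argument supports a fact.
before = supporting_arguments(graph, theory)

# Asserting the theory changes the status of BOTH annotations at once.
graph.add(theory)
after = supporting_arguments(graph, theory)

print(before)         # set()
print(sorted(after))  # [':A', ':B']
```

In this encoding there is no way to add the asserted triple so that only :A ends up as the accepted argument.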

3) There isn’t an infinite number of use cases, which is what would warrant delegating the issue to user space, i.e. domain ontologies. There are the two very basic cases: asserted or not. There’s one more to capture all use cases submitted to the WG: quoted. There would be a fourth one - stated as quote, i.e. a referentially opaque statement - if one really wants to go there, but that definitely is it. This is not only pretty manageable, it also IMO should be managed by the RDF specification and not as some afterthought requiring extra statements or user-defined vocabulary.

4) The current proposal tackles the problem from the wrong side: it makes the outlier, unasserted statements, the norm via reification, and requires annotations on statements that are expected to be true in the graph to state that expectation explicitly. Even the shorthand annotation syntax would be required to add such a statement, otherwise what the syntactic sugar suggests is lost in the backend. Some propose the following workaround:

   :Alice :buys :Car ~ :q {|
       a rdf:Stated 
     |} .

This requires users to state the obvious. No explanation is given for why, on this view, the annotation syntax shouldn't be mapped to an 'rdfs:states' property, even if such a property were to be introduced. The whole concept of 'rdfs:states' makes no sense without support in Turtle as well as SPARQL. And introducing 'rdfs:states' but NOT mapping the annotation syntax to it makes even less sense.
This is an issue that won’t go away, because users can introduce new properties, but not new syntax.

5) So we clearly need an 'rdfs:states' property, and alongside it an 'rdf:mentions' (or 'reifies') to support the annotation of statements that are not expected to be true in the graph. Those two should be supported by syntactic sugar: "stated" with the shorthand {| annotation syntax |}, "mentioned" with the << RDF-star term syntax >>.
We may also define "quoted", but IMO we don’t need to support it with extra syntax. Anything else belongs in user space as domain ontologies.

6) The semantics of 'rdfs:states' must be clearly defined as entailing the statement it refers to. The definition must make clear that the statement referred to is expected and assumed to be true in the graph and that conforming applications should treat it as such, even if entailment is not available. Mapping the annotation syntax to 'rdfs:states' is one way to ensure correct usage. This should go a very long way towards ensuring correct authoring in practice. N-Triples-based machinery will have to find other ways; reasoning engines will have no trouble implementing the required procedure.
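
The required procedure could look roughly like the following Python sketch. It assumes triples as plain tuples with triple terms as nested tuples; 'rdfs:states' and 'rdf:mentions' are the properties proposed in this mail, not existing vocabulary, and the function name entail_stated is hypothetical.

```python
STATES = "rdfs:states"      # proposed in this mail, not an existing RDF term
MENTIONS = "rdf:mentions"   # likewise hypothetical

def entail_stated(graph):
    """Return the graph plus every triple that is the object of rdfs:states.
    Triples that are only mentioned are not added."""
    stated = {o for (s, p, o) in graph if p == STATES}
    return set(graph) | stated

theory = (":Foo", ":causes", ":Bar")

graph = {
    (":q1", STATES, theory),      # :q1 states the theory ...
    (":q1", ":because", ":A"),    # ... because of argument :A
    (":q2", MENTIONS, theory),    # :q2 only mentions it
    (":q2", ":because", ":B"),
}

closed = entail_stated(graph)
only_mentioned = entail_stated({(":q2", MENTIONS, theory)})

print(theory in closed)          # True
print(theory in only_mentioned)  # False
```

Adding triples to the input can only add triples to the result, so the procedure stays monotonic, and the graph itself records which argument carries the assertion.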

7) That leaves SPARQL. If authoring is covered by the annotation syntax, then an equally seamless and invisible mechanism is needed on the querying side. The easiest way would be to follow the example of the original RDF* proposal and define that stated triple terms are part of the set of standard triples. Anything more involved, like keywords to that effect ("WITH STATED") or syntactic sugar for OPTIONAL statements, is probably too complicated and error-prone. Implementations, even high-profile ones like Blazegraph, have shown that this is possible and have contributed substantially to the momentum behind RDF* that led to this WG. So why not take that approach up?
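
As a rough Python sketch of what "invisible" would mean on the querying side (not a description of how Blazegraph or any other store implements it): a triple pattern is matched against the asserted triples plus all stated triple terms, without the query mentioning 'rdfs:states'. The function name match and the tuple encoding are assumptions for illustration.

```python
STATES = "rdfs:states"  # hypothetical property, as proposed in this mail

def match(graph, pattern):
    """Match one triple pattern (None = variable) against the asserted
    triples plus every triple that is the object of rdfs:states."""
    visible = set(graph) | {o for (s, p, o) in graph if p == STATES}
    return {
        t for t in visible
        if all(want is None or want == got for want, got in zip(pattern, t))
    }

# Only the annotation is stored; the plain triple is never asserted directly.
graph = {
    (":q", STATES, (":Alice", ":buys", ":Car")),
    (":q", ":source", ":Dave"),
}

results = match(graph, (":Alice", ":buys", None))
print(results)  # {(':Alice', ':buys', ':Car')}
```

The query author writes an ordinary pattern and still finds the stated triple.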


Best,
Thomas

Received on Wednesday, 21 August 2024 13:12:51 UTC