Use cases for [ Re: An update on [Proposal: described vs stated triple terms] ] from Thomas Lörtsch on 2024-08-07 (public-rdf-star-wg@w3.org from August 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Wed, 7 Aug 2024 19:28:43 +0200
To: Andy Seaborne <andy@apache.org>
Cc: public-rdf-star-wg@w3.org
Message-Id: <150BD47A-66F8-4BC0-B59B-21D67E894E96@rat.io>
> On 6. Aug 2024, at 16:16, Andy Seaborne <andy@apache.org> wrote:
> 
> On 05/08/2024 16:37, Thomas Lörtsch wrote:

>> SPARQL
>> ======
>> In Friday’s meeting we discussed if SPARQL should support stated triple terms by querying for them too, even when the BGP only mentions reified triple terms. To that end 'rdfs:states' should be defined as a subproperty of rdf:reifies.
>> However, upon further reflection it seems to me that the real benefit
> 
> The WG has been using use cases. It would be helpful to have a use case to justify a feature - is it common enough to motivate special syntax given it can already be done in RDFS-star.

[…]
>    Andy

There are two ways to answer this. Going over the Use Cases Repository [0] is certainly one of them, and I do that below. But let me first try to paint the big picture, because use cases - especially if they are just a collection of submissions, not an authored survey - tend to gloss over or not even mention the obvious.

Obvious to me is that most data on the semantic web is considered facts, i.e. true in the graph, not descriptions of statements that are not considered true and therefore are not to be added to the graph. I’ve read hundreds of papers about metamodelling in RDF (and presented them in [1]) - Notation3, reification, named graphs, fluents, contextualized RDF, RDF+, singleton properties, nanopublications, RDF*, you name it - and I’m not sure if any of them discussed the need to describe statements without actually stating them. I also don’t know of tutorials or blog posts about the latter. I do know of use cases for it, I subscribe to some of them myself, and some of them are in the UCR, so the picture seems to be different when the viewing angle is reduced to just annotations of triples.

Still, I maintain - and that is in no way meant to be derogatory or belittling - that annotating statements without actually asserting them is not the norm and not even a huge part (like a half or a third), but a minority on the web of data. It does NOT follow from that that we should not support it. But like with any design of noteworthy complexity we have to set priorities and accordingly decide which need is to be fulfilled with the fewest triples, is to be provided with the most straightforward syntax, is to be supported with the easiest querying mechanism, etc.

In that respect I think that the current design of RDF-star still has serious problems:
- it is practically impossible to annotate a statement and be sure that it is true in the graph
- it is equally impossible to annotate a statement without asserting it, and be sure that that propositional attitude is solidly and monotonically reflected in the graph
- the annotation syntax gives the right intuition, but the n-triples based backend betrays that intuition in any non-straightforward use case.
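
The first two points can be made concrete with a small Python sketch. It is a toy model only, and everything in it is an assumption for illustration: a graph is a plain set of triples, a triple term is a nested tuple, and the helper names (annotate, annotated_but_unasserted) and the example IRIs are hypothetical. It is not the WG's formal semantics, just a way to see that the annotation and the assertion are independent triples.

```python
# Toy model (assumption): a graph is a set of 3-tuples, and a triple term
# is simply a 3-tuple used in the object position of rdf:reifies.
REIFIES = "rdf:reifies"


def annotate(graph, reifier, triple, prop, value):
    """Add a reifier for `triple` plus one annotation on that reifier.

    Note that this does NOT add `triple` itself to the graph.
    """
    graph.add((reifier, REIFIES, triple))
    graph.add((reifier, prop, value))


def annotated_but_unasserted(graph):
    """Return the triple terms that are reified but not asserted in `graph`."""
    referenced = {o for (_, p, o) in graph if p == REIFIES}
    return {t for t in referenced if t not in graph}


fact = (":alice", ":worksFor", ":acme")  # hypothetical example triple

# Point 1: state and annotate, then lose the assertion (e.g. an update
# that removes the triple but knows nothing about its annotations).
stated = {fact}
annotate(stated, "_:r1", fact, ":source", ":hr_db")
after_delete = stated - {fact}

# Point 2: annotate without asserting, then merge with a graph that
# happens to assert the same triple.
described = set()
annotate(described, "_:r2", fact, ":claimedBy", ":bob")
merged = described | {fact}

print(annotated_but_unasserted(stated))        # nothing dangling
print(annotated_but_unasserted(after_delete))  # annotation outlives assertion
print(annotated_but_unasserted(described))     # described only
print(annotated_but_unasserted(merged))        # silently became "stated"
```

In this model nothing ties the reifier triple to the asserted triple, so either one can appear or disappear without the other noticing, which is the gap the first two bullets describe.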

Compare this to the original RDF* proposal [2]:
- there it was not possible to annotate a statement that is not true in the graph
- but it was also not possible to annotate a statement and not assert it to be true in the graph. Statement and annotation were irrevocably linked.
A totally different configuration! I’m not saying that it was perfect - it sure wasn’t, for reasons that have been debated at length - but it got this aspect right.

There are other problems in this context:
- annotating and stating a statement is more verbose than just annotating it (it needs 2 statements instead of 1)
- querying is verbose in RDF* and AFAICT it didn’t get better with SPARQL-star annotation syntax
Those we should keep in mind when looking for a solution to the main problem.



Now the tedious part...

CATEGORIZATION OF USE CASES [0]:

STATED
======

- Capturing triple origin in SPARQL-star
  https://github.com/w3c/rdf-ucr/wiki/Capturing-triple-origin-in-SPARQL-star
Here RDF-star is meant to capture provenance information when federating data. The use case doesn't mention that the federated data is not considered to be true (i.e. that the sources are not trusted), but such a case is certainly conceivable.

- RDF-star for explanation and provenance in biological data (UniProt)
  https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-explanation-and-provenance-in-
Explanations and attributions are on asserted triples, in the vast majority of cases.

- RDF star for labelled property graphs
  https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-labelled-property-graphs
No use cases from the LPG world are known for statements that are not true in the graph.

- RDF‐star for Annotations as Miscellaneous Marginalia
  https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Annotations-as-Miscellaneous-Marginalia
"By putting the extraneous data on the triples they are about, it is clear that the triples itself are to be used as is, but someone (or a process) administering these triples can use the added annotation as guides for performing further assessment of its trustworthiness, specificity or related qualities."

- RDF‐star for Artsdata.ca
  https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Artsdata.ca
"RDF-star is used to record metadata on select triples used for minting new URIs, and deemed authoritative for answering questions posed to the knowledge graph."


DESCRIBED
=========

- Describing a Union of Changes to a Named Graph
  https://github.com/w3c/rdf-ucr/wiki/Describing-a-Union-of-Changes-to-a-Named-Graph
Versioning related use cases tend to need to refer to statements that are outdated, not yet checked (in) or otherwise not considered to be true in the graph.

- RDF star for recording commit deltas to an RDF graph
  https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-recording-commit-deltas-to-an-RDF-graph
Ditto.


BOTH
====

- RDF Star for Talking About Multiple Triples at Once
  https://github.com/w3c/rdf-ucr/wiki/RDF-Star-for-Talking-About-Multiple-Triples-at-Once
Explicitly mentions both possibilities, "asserted" and "unasserted 'suggestions'".

- RDF‐star for Detailed Provenance in Cooperative Union Cataloguing
  https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Detailed-Provenance-in-Cooperative-Union-Cataloguing
Provenance of statements, of which some are controversial or governed by an access policy.

- RDF‐star for Wikidata
  https://github.com/w3c/rdf-ucr/wiki/RDF%E2%80%90star-for-Wikidata
Wikidata is about facts, not about opinions, but it has a ranking system - "(a processing annotation that can have three values "preferred"/"normal"/"deprecated")". Whether that ranking system would benefit from the choice of either "stated" or "reified" statements is another question.

- RDF star for CIDOC CRM events
  https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-CIDOC-CRM-events
The cultural domain is one of those areas where different viewpoints or theories have to be documented in the same graph without all (or any) of them being considered to be true.

- RDF-star for contextualizing historical assertions
  https://github.com/w3c/rdf-ucr/wiki/RDF-star-for-contextualizing-historical-assertions
Another project from the cultural domain.


Best,
Thomas


[0] https://github.com/w3c/rdf-ucr/wiki
[1] https://gitlab.com/rat10/between-facts-and-knowledge/-/blob/main/Between_Facts_and_Knowledge_1.0.2.pdf
[2] Olaf Hartig: Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF, June 2017, http://olafhartig.de/files/Hartig_AMW2017_RDFStar.pdf
Received on Wednesday, 7 August 2024 17:28:54 UTC