Re: Basic machinery for annotations

> On 15. Aug 2024, at 13:41, Andy Seaborne <andy@apache.org> wrote:
> 
> Most data is "light weight modelling",

According to my thesaurus "light weight" may also mean "insubstantial".

> enough to get the job done, and I believe a common usage will be reification with no opinion as to "stated"/"describes".

Most data comes with some assumption. I may of course be wrong about the use cases and everything I know about the semantic web, but my assessment is that most use cases assume that the data to be annotated is true in the graph. For some others it is important that the data they annotate is NOT true in the graph. And some get by with the mere fact that data that is not in the graph is not true in the graph. Only this last group is well served by the current design.

> The way to provide distinctions and richness in RDF is through modelling to provide additional details.
> 
> We already have basic machinery that does not involve inference, or integrity constraints (data maintenance), or mandated cardinality restrictions.
> 
> This should be the abstract data model of RDF.
> 
> It works for all RDF syntaxes and all ways of writing out RDF. Generated RDF does not often use the full richness of syntax shortcuts.
> 
> --
>  :s :p :o {| a rdf:Stated |}

This requires any use case that assumes that the data it annotates is true in the graph - i.e. probably the vast majority - to say so explicitly. This is the opposite of "lightweight modelling"; it is "cumbersome modelling". Not enough people will do that for it to be reliable, so we will be back to speculation.

> -->
>  :s :p :o .
>  _:b rdf:reifies <<(:s :p :o )>> .
>  _:b rdf:Stated .

   ( ^ missing an 'a' here I assume )
> --

Also, the annotation syntax doesn’t round-trip reliably:

    IN
    << :s :p :o >> :src :A . 
    :s :p :o {| :src :B |} .

    OUT
    :s :p :o {| :src :A |} . 
    :s :p :o {| :src :B |} . 

There goes support for unasserted statements.
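
To spell out why this happens, here is a minimal Python sketch. It assumes the baseline expansion in which both syntaxes produce an rdf:reifies triple and only the annotation syntax additionally asserts the triple. The helper names, the tuple encoding of triple terms and the fixed blank node labels are illustrative assumptions, not part of any specification.

```python
# Terms are plain strings; a triple (and a triple term) is a 3-tuple.
# All names below are hypothetical and only serve this illustration.
RDF_REIFIES = "rdf:reifies"


def expand_reified(triple, reifier, annotations):
    """<< s p o >> p2 o2 .  ->  reifier + annotations, triple NOT asserted."""
    graph = {(reifier, RDF_REIFIES, triple)}
    graph |= {(reifier, p, o) for p, o in annotations}
    return graph


def expand_annotated(triple, reifier, annotations):
    """s p o {| p2 o2 |} .  ->  same as above, and the triple IS asserted."""
    return {triple} | expand_reified(triple, reifier, annotations)


t = (":s", ":p", ":o")

# IN: one reified (unasserted) occurrence, one annotated (asserted) occurrence.
graph_in = (expand_reified(t, "_:b1", [(":src", ":A")])
            | expand_annotated(t, "_:b2", [(":src", ":B")]))

# OUT: both occurrences written with the annotation syntax.
graph_out = (expand_annotated(t, "_:b1", [(":src", ":A")])
             | expand_annotated(t, "_:b2", [(":src", ":B")]))

same_graph = graph_in == graph_out  # True: the two forms are indistinguishable
```

Both inputs yield the same five triples, so a serializer cannot tell which annotation was originally attached to the unasserted form.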

The easiest fix is to map the annotation syntax to a second property, "rdf:states". This also disambiguates between mere reification and annotation of a stated triple, thereby making an extra annotation like the above "a rdf:Stated" superfluous.
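
A minimal Python sketch of that fix, under the assumption that the reified form keeps mapping to rdf:reifies while the annotation syntax maps to the second property (written rdf:states here). The helper names and the tuple encoding are hypothetical and only illustrate the idea.

```python
# Terms are plain strings; a triple (and a triple term) is a 3-tuple.
RDF_REIFIES = "rdf:reifies"
RDF_STATES = "rdf:states"  # hypothetical property name taken from the proposal


def expand_reified(triple, reifier, annotations):
    """<< s p o >> p2 o2 .  ->  rdf:reifies, triple NOT asserted."""
    return ({(reifier, RDF_REIFIES, triple)}
            | {(reifier, p, o) for p, o in annotations})


def expand_annotated(triple, reifier, annotations):
    """s p o {| p2 o2 |} .  ->  rdf:states, triple asserted."""
    return ({triple, (reifier, RDF_STATES, triple)}
            | {(reifier, p, o) for p, o in annotations})


def syntax_for(graph, reifier):
    """Pick the surface syntax a serializer could use for this reifier."""
    properties = {p for s, p, _ in graph if s == reifier}
    return "annotation" if RDF_STATES in properties else "reified"


t = (":s", ":p", ":o")
graph_in = (expand_reified(t, "_:b1", [(":src", ":A")])
            | expand_annotated(t, "_:b2", [(":src", ":B")]))
graph_out = (expand_annotated(t, "_:b1", [(":src", ":A")])
             | expand_annotated(t, "_:b2", [(":src", ":B")]))

distinguishable = graph_in != graph_out
```

With two properties the graphs differ, and a serializer can recover from the graph alone which reifier belongs to which syntax.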

> 
> A different name may be better - this is only for illustration - but that's part of the point.
> 
> It requires nothing more of an RDF 1.2 or SPARQL 1.2 system than the baseline - triple terms (and, elsewhere, initial text direction) - and possibly some vocabulary.

possibly -> probably

> It does not involve modifying the relationship between SPARQL BGP matching and simple entailment.

It seems to me that, if done right, neither does rdf:states.

> More sophisticated and focused systems can be written on top of this base by giving their own well-formedness conditions. c.f. RDF lists. They will emerge as needed and we can't prejudge

What do we have the collection of use cases for? They "emerged" long ago and they haven’t changed much in the meantime, no "prejudging" involved.

> them.

I’ve actually been thinking about the RDF list vocabulary too. It does make assumptions about well-formedness without providing any means to enforce those (in simple RDF). Those assumptions reflect what users expect from a proper list. We could (and IMO should) do exactly the same for 'rdf:states', i.e. express the assumption that a stated triple term is true in the graph, which b.t.w. is what the original RDF* proposal assumed as well (see my recent exchange with Olaf [0]). Note that RDF* is in general considered very lightweight.

We could even go one step further towards LPG interoperability and express the assumption that 'rdf:states' is many-to-one, i.e. if some identifier rdf:states multiple triple terms then it can be assumed that the respective subjects, predicates and objects in those referenced triple terms co-denote.

An RDF-star entailment regime may enforce these constraints, but that’s not a requirement for them to be useful. A syntax like the RDF-star annotation syntax makes them foolproof to author; a corresponding arrangement in SPARQL (maybe based on that defined in SPARQL* [1]) is needed as well. Taken together this meets the demand for statement qualification in general and LPG interoperability in particular pretty well.
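
The two assumptions can be checked outside of any entailment regime, much like list well-formedness. A minimal Python sketch, assuming a graph is a set of 3-tuples, triple terms are nested 3-tuples, and the property is named rdf:states; `check_states` and the sample data are hypothetical. It only reports findings and does not infer co-denotation.

```python
from collections import defaultdict

RDF_STATES = "rdf:states"  # hypothetical property name taken from the proposal


def check_states(graph):
    """Return (unasserted, multi_stated) for all rdf:states triples.

    unasserted:   (identifier, triple term) pairs whose triple is not in the graph
    multi_stated: identifiers that state more than one triple term; under the
                  many-to-one assumption their terms would be taken to co-denote
    """
    stated = defaultdict(set)
    for s, p, o in graph:
        if p == RDF_STATES:
            stated[s].add(o)
    unasserted = {(r, t) for r, terms in stated.items()
                  for t in terms if t not in graph}
    multi_stated = {r: terms for r, terms in stated.items() if len(terms) > 1}
    return unasserted, multi_stated


graph = {
    (":s", ":p", ":o"),
    (":s2", ":p", ":o"),
    ("_:b1", RDF_STATES, (":s", ":p", ":o")),
    ("_:b1", RDF_STATES, (":s2", ":p", ":o")),  # same identifier, second term
    ("_:b2", RDF_STATES, (":s", ":p", ":x")),   # stated but not asserted
}

unasserted, multi_stated = check_states(graph)
```

Here `_:b2` violates the "true in the graph" assumption, and `_:b1` is the case where `:s` and `:s2` would be assumed to co-denote.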


I’m still pondering alternatives. Of course we should meet demands with the smallest possible set of primitives, but demands differ by a wide margin, so how about this:

- the abstract syntax allows triple terms only in object position

    object ::= iri | blankNode | literal | tripleTerm

- no additional "abstract syntax of well-formed RDF"
  instead three properties to refer to triple terms
  each with precisely defined semantics

    :id rdf:reifies|rdf:states|rdf:quotes <<( :s :p :o )>> .

- rdf:reifies
    safely within RDF standard semantics
    follows the current baseline proposal
    syntactic sugar for RDF standard reification, but also
    can support many use cases if additional detail is provided
    many-to-many

    << :s :p :o ~ :id >> a :annotation .
    
- rdf:states
    meets the needs of LPG interoperability
    meets the popular intuition of qualifying a statement
    makes semantic assumptions that only a higher-level entailment regime can enforce
    many-to-one

    :s :p :o ~:id {| a :annotation |} .

- rdf:quotes
    referentially opaque
    a safe extension of RDF standard semantics, RDF-star CG report
    meets the needs of versioning, explainable AI
    safely separated from standard use cases
    many-to-many
    either no syntactic sugar
        :id rdf:quotes <<( :s :p :o )>> .
    or something like
        <" :s :p :o ~:id "> a :annotation .
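
The abstract-syntax restriction in the first point can be sketched as a simple check. This is a minimal Python illustration, assuming terms are strings and triple terms are nested 3-tuples, and assuming that triple terms may nest recursively in object position; `is_well_formed` is a hypothetical helper.

```python
def is_triple_term(term):
    """In this encoding a triple term is a nested 3-tuple, anything else a string."""
    return isinstance(term, tuple)


def is_well_formed(triple):
    """Accept triple terms in object position only (recursively)."""
    s, p, o = triple
    if is_triple_term(s) or is_triple_term(p):
        return False
    return not is_triple_term(o) or is_well_formed(o)


ok = is_well_formed((":id", "rdf:reifies", (":s", ":p", ":o")))
bad = is_well_formed(((":s", ":p", ":o"), ":src", ":A"))  # triple term as subject
```

Under this restriction the second form is only available as syntactic sugar that expands to a triple with an identifier in subject position.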


Maybe a way to make everybody happy without causing too much trouble or confusion.


Best,
Thomas


> 
>    Andy
> 

[0] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Aug/0068.html
[1] Olaf Hartig: Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF, June 2017,
http://olafhartig.de/files/Hartig_AMW2017_RDFStar.pdf

Received on Thursday, 15 August 2024 13:21:51 UTC