Occurrences as Named Triples from Niklas Lindström on 2024-01-06 (public-rdf-star-wg@w3.org from January 2024)

From: Niklas Lindström <lindstream@gmail.com>
Date: Sat, 6 Jan 2024 20:03:38 +0100
To: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-ID: <CADjV5jfH=r0E1KefW14k5+8yxft_f0dER4G-p40Qd6=pMPFc2A@mail.gmail.com>
Given recent discussions (in emails and during the last Semantic TF
meeting), I've adjusted my position, and come to some tentative
conclusions.

1. We haven't fully clarified what we (need to) mean by occurrence. I
think most(?) expected it to be "an occurrence of one specific
triple".

2. The notion of truth-makers [1] is a valuable input, at least from a
philosophical point of view. It also appears too advanced, or too
ambitious, for our purposes; while it appeals to my "intuitions" about
what we're talking about, I think we need something simpler for
triples. To me it does, however, further affirm that occurrences, and
not "types", are key.

3. The RDFn proposal is in a clearer state now [2]. It has always [3]
been about *naming triples*, and from my point of view, its current
design, which requires each name to map to exactly one triple (a
functional mapping), solves the problem of mapping the name to the
triple.

4. This is different from triple terms in the abstract syntax, and
avoids special properties. The named, unasserted triple is a new
concept, so it does add to RDF concepts and abstract syntax, but
differently (I think it is simpler). Whether N-Triples can now contain
"quads" and N-Quads become quins, or whether we keep more of the
"shape" of triple terms, as in the triple occurrence consolidation,
needs to be further thought through. (It does feel more like a typographical
concern than a conceptual one though. The concepts and abstract syntax
would now have named triples added, and that's it.)

5. RDF-star syntax can remain largely as is, with the recent addition
of triple occurrence names (implicit or explicit).

6. These are named triples (unasserted but described in a graph), and
not named graphs. There is a likeness in that just as you cannot use
the same name for two different graphs in a dataset, you cannot use
the same name for two different triples in a graph. (But presumably
you can in different graphs.) The naming will presumably be fully semantically
defined though, and named triples are unequivocally unasserted, which
should further clarify the distinctions.

7. Named triples are reasonably *just* better reification. (Both
unstarring and "starring" would be very simple.)

Example (with only implicit names):

    <book> :datePublished "2023" {| ex:impliedBy <#bp23> |} ;
      :publisher <X> {| ex:impliedBy <#bp23> |} .

    <#bp23> a :PublicationEvent ;
        :location <London> .

    << <book> :datePublished "2023" >>
      ex:source <bookedition#titlepage> ;
      ex:comment "The edition statement says New York."@en .

In N-Triples with named triples (here, the name comes *first* for
unasserted triples, followed by a `|`):

    <book> <sdo/datePublished> "2023" .
    _:b1 | <book> <sdo/datePublished> "2023" .
    _:b1 <ex#impliedBy> <#bp23> .
    <book> <sdo/publisher> <X> .
    _:b2 | <book> <sdo/publisher> <X> .
    _:b2 <ex#impliedBy> <#bp23> .

    <#bp23> <rdf#type> <sdo/PublicationEvent> .
    <#bp23> <sdo/location> <London> .

    _:b3 | <book> <sdo/datePublished> "2023" .
    _:b3 <ex#source> <bookedition#titlepage> .
    _:b3 <ex#comment> "The edition statement says New York."@en .

(Note: Whatever the form, we need to allow it in Turtle too, in order
to keep N-Triples as a strict subset thereof. Another naming operator
like `:=` might be clearer.)
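
To make the "just better reification" point concrete, here is a minimal
Python sketch of unstarring and starring. It assumes that unstarring
targets the classic RDF reification vocabulary (rdf:Statement,
rdf:subject, rdf:predicate, rdf:object); that choice, the function names
`unstar` and `star`, and the representation of terms as plain strings
are all illustrative assumptions, not part of any proposal.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE = f"<{RDF}type>"
STATEMENT = f"<{RDF}Statement>"
SUBJECT = f"<{RDF}subject>"
PREDICATE = f"<{RDF}predicate>"
OBJECT = f"<{RDF}object>"
ROLES = (SUBJECT, PREDICATE, OBJECT)


def unstar(claims):
    """Turn {name: (s, p, o)} into classic reification triples."""
    out = set()
    for name, (s, p, o) in claims.items():
        out.add((name, TYPE, STATEMENT))
        out.add((name, SUBJECT, s))
        out.add((name, PREDICATE, p))
        out.add((name, OBJECT, o))
    return out


def star(triples):
    """Inverse of unstar: return (claims, remaining triples).

    Assumes well-formed input: a second rdf:subject (etc.) for the
    same name would silently overwrite the first in this sketch.
    """
    parts = {}
    rest = set()
    for s, p, o in triples:
        if p in ROLES:
            parts.setdefault(s, {})[p] = o
        elif (p, o) == (TYPE, STATEMENT):
            parts.setdefault(s, {})
        else:
            rest.add((s, p, o))
    claims = {}
    for name, found in parts.items():
        if len(found) != 3:
            raise ValueError(f"incomplete reification for {name}")
        claims[name] = (found[SUBJECT], found[PREDICATE], found[OBJECT])
    return claims, rest
```

Each named triple becomes exactly four reification triples, and
annotations on the name (such as ex:source) pass through untouched in
both directions.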

In the same RDF source, this MUST be an error (_:b1 is only allowed to
name one triple):

    _:b1 | <book> <sdo/datePublished> "2023" .
    _:b1 | <book> <sdo/publisher> <X> .

(Should this stop processing, or warn and ignore "renames"? I propose
full stop upon interpretation.)
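
As a rough illustration of the "full stop" option, the following Python
sketch reads lines in the illustrative `name | s p o .` form used above
and raises as soon as a name is reused for a different triple. The
parser is a toy (it splits on whitespace, so literals containing spaces
are not handled), the names `parse_line`, `load` and `RenameError` are
hypothetical, and treating an exact repeat of the same named triple as
harmless is my own assumption.

```python
class RenameError(ValueError):
    """Raised when one name is used for two different triples."""


def parse_line(line):
    """Split one line into (name, triple); name is None if asserted.

    Toy parser: terms are split on whitespace, so it only handles
    terms that contain no spaces.
    """
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"missing final '.': {line!r}")
    body = body[:-1].strip()
    name = None
    if " | " in body:
        name, body = (part.strip() for part in body.split(" | ", 1))
    terms = body.split()
    if len(terms) != 3:
        raise ValueError(f"expected three terms: {line!r}")
    return name, tuple(terms)


def load(lines):
    """Return (asserted triples, {name: unasserted triple})."""
    asserted = set()
    claims = {}
    for line in lines:
        if not line.strip():
            continue
        name, triple = parse_line(line)
        if name is None:
            asserted.add(triple)
        elif claims.setdefault(name, triple) != triple:
            # Assumption: an exact repeat is harmless, a rename is fatal.
            raise RenameError(f"{name} already names {claims[name]}")
    return asserted, claims
```

Feeding it the two `_:b1` lines above raises `RenameError`, while the
earlier N-Triples example (minus the literal with spaces) loads cleanly.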

Suggestion: Call the unasserted, named triples *claims*. Calling them
edges (or arcs) could be confusing, since they are, from an RDF point
of view, "hypothetical", and not partaking in "forming" the graph, as
asserted triples are. The rdf:type could be rdf:Claim (rdfs:subClassOf
rdf:Statement?). Optionally, rdfs:Fact could be defined as a *user
hint* that this claim is also asserted (possibly added by the
annotation syntax), or that any so typed claim in the graph is
informally "accepted". It would not have any formal semantic meaning.

Regarding opacity: Since claims do not "partake in the graph", they
are obviously not reasoned over (further setting sets of claims apart
from named graphs, I think). If they were opaque (in that an OWL
regime would not take their terms into account), we've foreseen
problems in some use cases (e.g. not being able to find annotations on
statements with the same meaning; and more tricky things in the formal
semantics). If they are transparent, I assume that this:

    <X> owl:sameAs <Y> .
    _:b1 | <book> <sdo/publisher> <X> .
    _:b1 <ex#impliedBy> <#bp23> .

might (under some regime) entail:

    _:b4 | <book> <sdo/publisher> <Y> .
    _:b4 owl:sameAs _:b1 .  # ?
    _:b4 <ex#impliedBy> <#bp23> .

If so, it appears that the two names denote the same resource; but is
that resource different from the occurrence "itself"? Needs more thought. Note
that each named triple occurrence must still have a unique name. So
this would NOT be entailed (and it would be an error):

    _:b1 | <book> <sdo/publisher> <Y> .
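
A small Python sketch of what such a transparent regime might do,
purely to make the naming constraint tangible: each substituted variant
gets a fresh name rather than reusing the original one, and the
annotations are copied over. The function `expand_same_as`, the `fresh`
name supplier, the one-directional single-step substitution and the
copying of annotations are all assumptions for illustration; whether
the two names should also be linked (the `owl:sameAs` marked "?" above)
is left out on purpose.

```python
def expand_same_as(same_as, claims, annotations, fresh):
    """Derive renamed variants of claims under owl:sameAs pairs.

    same_as:     set of (a, b) pairs, read as "a owl:sameAs b"
    claims:      {name: (s, p, o)} of unasserted named triples
    annotations: set of (name, p, o) triples about the names
    fresh:       callable returning a so far unused name
    """
    new_claims = dict(claims)
    new_annotations = set(annotations)
    for name, triple in claims.items():
        for a, b in same_as:
            if a not in triple:
                continue
            # Substitute b for a; the original name is never reused.
            variant = tuple(b if term == a else term for term in triple)
            new_name = fresh()
            new_claims[new_name] = variant
            new_annotations |= {
                (new_name, p, o) for s, p, o in annotations if s == name
            }
    return new_claims, new_annotations
```

With the data above and a `fresh` that hands out `_:b4`, this yields
the `_:b4` claim about `<Y>` with its own copy of the `ex#impliedBy`
annotation, and `_:b1` keeps naming only the `<X>` triple.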

I believe that, while depending upon implementation particulars, you
can still query over the inferred graph using SPARQL with entailment
"off", and pick out the "raw data facts". I'm not sure if that's
enough of a "keeping the cake and eating it" for precise data
provenance though.

Thoughts?

Best regards,
Niklas

[1]: <https://plato.stanford.edu/entries/truthmakers/>
[2]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2024Jan/att-0008/RDFn_Abstract_Syntax_and_Concepts.pdf>
[3]: <https://blogs.oracle.com/oraclespatial/post/rdfn-extending-rdf-to-support-named-triples>
Received on Saturday, 6 January 2024 19:04:12 UTC