Staying in scope, talking about occurrences from Niklas Lindström on 2023-10-24 (public-rdf-star-wg@w3.org from October 2023)

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 24 Oct 2023 23:47:38 +0200
To: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-ID: <CADjV5jeK6ZCt0BmuL6EcDOq+3ia_52ANSUkBU=jJE2BzEuX16Q@mail.gmail.com>

Dear all,

We must certainly consider our charter and scope. The charter is about:

> the ability to concisely represent and query statements about statements.

This is not a new concept, having been with us since RDF 1.0 in 1999
(when Ora and Ralph were editors). See section 4, "Statements about
Statements" [1]. Notably, the since-deprecated (and removed) use of
rdf:bagID/rdf:aboutEach appears to me to be a clear precursor to what
became named graphs.

Also, in scope:

> The scope of this Working Group is to extend the recommendations defining RDF 1.1 and SPARQL 1.1 with the features introduced by RDF-star. More precisely, RDF-star introduces the notion of quoted triple to express statements about statements. The abstract and concrete syntaxes of RDF and SPARQL are extended to support this new feature, as well as their respective semantics.

And the RDF-star syntax, especially for annotations, appears useful
for a concise representation of our use cases. But "the notion of a
quoted triple" does not necessitate the introduction of a new term,
does it? (If it does, then a rechartering may indeed be asked for...)

I wonder if this:

> The Working Group may however reconsider this and proceed differently from the Community Group's proposal.

and:

> The Working Group will also consider allowing new features in these recommendations, according to Section 6.3.11.4 of the W3C process, in order to render future evolutions easier.

do permit what we're exploring. We must not add unwarranted
complexity, of course. But I'm trying to not even add the complexity
that a new, separate kind of term introduces.

I must stress that I do not think graph terms are better than triple
terms, in one crucial way. They both express the same problem of
talking about something abstract in the data model ("stepping out of
the picture", if you will). With named graphs, where the name is the
"token" of an occurrence, you can talk about that. (As you can with
old-style reification, of course.)

Note: *talking* about an occurrence requires "reifying" it. This is
why saying "occurrence" isn't precise enough, formally. In fact, even
RDF 1.2 Concepts states:

> When an RDF triple is used as the subject or object of a triple, this occurrence of the triple is called a quoted triple.

Whereas RDF 1.1 Semantics is clear that a reification is a token:

> Rather, the reification describes the relationship between a token of a triple and the resources that the triple refers to.

(For those academic definitions Thomas alluded to, see e.g. [2].)

And thus I think *talking about* occurrences of triples and graphs
(making statements about statements) requires reifying them. (We *use*
them all the time, that's just RDF.) And reifying graphs is what named
graphs have been doing in practice all along. (More or less
indirectly, as these "tokens" of mathematical graphs *are* indirect.
Peirce may have called them "indices", or perhaps any one of the
"indexical signs"; I'm not sure.)

Thus I'm not convinced that *any* new terms are needed. That is, I
wonder if any of our collected use cases actually require triple
terms? I see none that speak of the abstract triple itself, but of
some occurrence (and that is a token, since it reifies it into a
resource).

This is further supported by RDF 1.1 Semantics:

> The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object. This supports use cases where properties such as dates of composition or provenance information are applied to the reified triple, which are meaningful only when thought of as referring to a particular instance or token of a triple.
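
To make the distinction concrete, here is a minimal Python sketch. It is
only an illustration under my own assumptions: the ex: and dct: names, the
dates and the dictionary layout are all hypothetical, and plain Python
containers stand in for RDF. It shows why provenance attached to the
abstract triple blurs together, while provenance attached to tokens stays
apart.

```python
# One abstract triple, as a plain (subject, predicate, object) tuple.
triple = ("ex:alice", "ex:worksFor", "ex:acme")

# Keyed by the abstract triple: annotations from both sources pile up
# under the same key, so nothing says which date goes with which source.
by_abstract_triple = {}
for annotation in [
    ("dct:source", "ex:registryA"),
    ("dct:date", "2020-01-01"),
    ("dct:source", "ex:registryB"),
    ("dct:date", "2023-10-24"),
]:
    by_abstract_triple.setdefault(triple, []).append(annotation)

# Keyed by token (hypothetical names ex:stmt1, ex:stmt2): each token is a
# separate resource that describes the same triple with its own provenance.
tokens = {
    "ex:stmt1": {"triple": triple, "dct:source": "ex:registryA",
                 "dct:date": "2020-01-01"},
    "ex:stmt2": {"triple": triple, "dct:source": "ex:registryB",
                 "dct:date": "2023-10-24"},
}

print(len(by_abstract_triple), len(tokens))
```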

I am following what Pat Hayes has said before, probably many times, e.g. in [3]:

> It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph. That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)?

This is almost what named graphs became, but spelled out more
explicitly, clearly specified in that the <name, graph> pair is here
declared as a *token* of its mathematical graph. (Again, and this may
be contentious, I think this token, which is denoted by this name, can
be many kinds of resources: words on paper, a chunk of claims gleaned
from a web page, an observed phenomenon, the beliefs of Lois Lane, a
statement... Those are *indirect* tokens of the graph, paired with the
graph to make it representable and queryable as RDF.)

The above quote continued with, and this is crucial:

> And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.
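
A small Python sketch of that point, again with hypothetical ex: names and
with a frozenset standing in for an abstract RDF graph (an assumption made
for illustration, not a definition from any spec): two names are paired
with equal abstract graphs, and everything we say is said about the names.

```python
# An abstract RDF graph modelled as an immutable set of triples.
graph_a = frozenset({("ex:superman", "ex:canFly", "true")})
graph_b = frozenset({("ex:superman", "ex:canFly", "true")})

# Each <name, graph> pair is treated as a token of its abstract graph.
dataset = {
    "ex:loisBeliefs": graph_a,
    "ex:newspaperArticle": graph_b,
}

# Statements about the tokens use the names as subjects; the abstract
# graph itself never appears as a subject.
annotations = {
    ("ex:loisBeliefs", "ex:heldBy", "ex:loisLane"),
    ("ex:newspaperArticle", "ex:publishedIn", "ex:dailyPlanet"),
}

same_abstract_graph = dataset["ex:loisBeliefs"] == dataset["ex:newspaperArticle"]
distinct_tokens = len(dataset) == 2
print(same_abstract_graph, distinct_tokens)
```

Naming gives a handle on each token, and the equality check shows that
both tokens are of the same abstract graph.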

RDF graphs are sets of triples, yet quoted triples in RDF-star are
something separate from named graphs, and defined as abstract terms,
adding to the abstract syntax. There is a correlation in RDF 1.2
Concepts, but it does set a quoted triple (soon to be triple token)
apart from a triple in a graph, and does not explain the similar
relation to a named graph. This could be further defined in 1.3 of
course, but will the notions of unasserted (and opaque) converge or
stay separate? We're trying, I think, to figure that out (as we
should).

Of course, if we are prepared to *redefine* RDF graphs (which are
immutable abstract sets) to be built up from triple terms, and
explicitly link named graphs to these, that's a possible path. I'm not
sure of it, but I can see it. But even so, I think abstract terms,
triple terms or sets of those, reasonably denoting *themselves* just
as literals do, must be handled with care. That could mean only
allowing them in the object position. (And yes, that also means I
don't think allowing literals as subjects, to avoid repeating long
literals, would be worth the misuse it might lead to.)

And even if available for use, triple terms (being abstract
mathematical "types") are also different from (and unrelated to)
reification tokens, the latter of which do provide some fundamentals for
what Wikidata and LPGs do. It could lead to endless debate about what is
correct where and when. (I think "all we are ever needing to identify
are graph tokens, not abstract graphs", goes for triples as well, as
is echoed by the quote from RDF 1.1 Semantics above: "use cases where
properties such as dates of composition or provenance information
[...] which are meaningful only when thought of as referring to a
particular instance or token of a triple.")

What I think is doable is to correlate named graph occurrences with
reification of triples, and we may even elegantly do what was done in
the Named Graphs, 2005 paper [4], and declare that a singleton edge (a
graph as a singleton set with one triple) entails a reified triple
token (an rdf:Statement). Parts of those semantics combined with
RDF-star *syntax* (or parts of, or variants thereof) can provide a
unified foundation.
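
As a rough sketch of what such an entailment could look like (this is one
possible reading, not the formal semantics of [4] or of any RDF spec; the
function name reify_singletons and the ex: names are hypothetical), in
Python:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify_singletons(dataset):
    """For each named graph holding exactly one triple, return the
    reification triples describing the graph name as an rdf:Statement."""
    entailed = set()
    for name, graph in dataset.items():
        if len(graph) != 1:
            continue  # only singleton edges are read as triple tokens
        s, p, o = next(iter(graph))
        entailed |= {
            (name, RDF + "type", RDF + "Statement"),
            (name, RDF + "subject", s),
            (name, RDF + "predicate", p),
            (name, RDF + "object", o),
        }
    return entailed


dataset = {
    # A singleton edge: one triple, so the name is read as a triple token.
    "ex:claim1": frozenset({("ex:alice", "ex:worksFor", "ex:acme")}),
    # An ordinary named graph with two triples: nothing is entailed for it.
    "ex:page7": frozenset({
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:alice"),
    }),
}

entailed = reify_singletons(dataset)
for t in sorted(entailed):
    print(t)
```

Only ex:claim1 gets the four reification triples; ex:page7 stays a plain
named graph.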

This can handle unasserted triples and eventual needs for opacity in a
uniform way, using conditionally accepted graphs. And some such graph
occurrences, namely singleton edges, identify with their reifications,
being one or more identical triple tokens (from different sources,
dates, real events, or beliefs). That caters for the LPG and Wikidata
cases, along with the granular UniProt attribution (which may almost
transparently upgrade from reification), CIDOC-CRM events (including
interrelated statement tokens), and with detailed provenance and
miscellaneous marginalia in libraries.

Since a triple token is not in a graph (a set), but described in one,
this also doesn't contradict [5]:

> Within the framework of Zermelo–Fraenkel set theory, the axiom of regularity guarantees that no set is an element of itself. This implies that a singleton is necessarily distinct from the element it contains, thus 1 and {1} are not the same thing.

Contradicting it is what I still fear equating a singleton graph term
with its triple term would do.
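
The same distinction shows up in a tiny Python analogy (an analogy only: a
frozenset is not Zermelo-Fraenkel set theory, and the ex: names are
made up):

```python
triple = ("ex:s", "ex:p", "ex:o")
singleton_graph = frozenset({triple})  # a graph containing just that triple

is_member = triple in singleton_graph   # the triple is an element of the graph
is_same = singleton_graph == triple     # but the graph is not the triple

print(is_member, is_same)
```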

In summary, this way of staying on the reified occurrence side of
things doesn't risk users tripping up on the logical foundations.

All the best,
Niklas

[1]: https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#higherorder
[2]: https://iep.utm.edu/demonstratives-and-indexicals/#SH1c
[3]: https://lists.w3.org/Archives/Public/public-rdf-wg/2011Feb/0060.html
[4]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3199260
[5]: https://en.wikipedia.org/wiki/Singleton_(mathematics)

Received on Tuesday, 24 October 2023 21:48:12 UTC