Re: Staying in scope, talking about occurrences from Thomas Lörtsch on 2023-10-25 (public-rdf-star-wg@w3.org from October 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Wed, 25 Oct 2023 20:20:50 +0200
To: Niklas Lindström <lindstream@gmail.com>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <C1FE3558-1DEC-4C88-A23F-114FAC35E552@rat.io>
> On 24. Oct 2023, at 23:47, Niklas Lindström <lindstream@gmail.com> wrote:
> 
> Dear all,
> 
> We must certainly consider our charter and scope. The charter is about:
> 
>> the ability to concisely represent and query statements about statements.
> 
> This is not a new concept, having been with us since RDF 1.0 in 1999
> (when Ora and Ralph were editors). See section 4, "Statements about
> Statements" [1]. Notably, that deprecated (and removed) use of
> rdf:bagID/rdf:aboutEach appears to me as a clear precursor to what
> became named graphs.
> 
> Also, in scope:
> 
>> The scope of this Working Group is to extend the recommendations defining RDF 1.1 and SPARQL 1.1 with the features introduced by RDF-star. More precisely, RDF-star introduces the notion of quoted triple to express statements about statements. The abstract and concrete syntaxes of RDF and SPARQL are extended to support this new feature, as well as their respective semantics.
> 
> And the RDF-star syntax, especially for annotations, appears useful
> for a concise representation of our use cases. But "the notion of a
> quoted triple" does not necessitate the introduction of a new term,
> does it? (If it does, then a rechartering may indeed be asked for...)
> 
> I wonder if this:
> 
>> The Working Group may however reconsider this and proceed differently from the Community Group's proposal.
> 
> and:
> 
>> The Working Group will also consider allowing new features in these recommendations, according to Section 6.3.11.4 of the W3C process, in order to render future evolutions easier.
> 
> does permit for what we're exploring. We must not add unwarranted
> complexity, of course. But I'm trying to not even add the complexity
> that a new, separate kind of term introduces.

Some kind of newness will be required to facilitate referentially opaque and unasserted terms/graphs/types. That enables not only slightly exotic Explainable AI applications but also the quite popular "unasserted assertions" use case (although the latter is better served with referential transparency IMO). If we ignore those, however, I agree with you. And that IMO nicely illustrates where the WG already diverges from the charter: not by considering graphs, which should make authoring a lot easier and querying a lot more predictable, but by pursuing some very specific semantics.

> I must stress that I do not think graph terms are better than triple
> terms, in one crucial way. They both express the same problem of
> talking about something abstract in the data model ("stepping out of
> the picture", if you will). With named graphs, where the name is the
> "token" of an occurrence, you can talk about that. (As can you with
> old style reification, of course.)

Those mechanisms are different, see below.

> Note: *talking* about an occurrence requires "reifying" it. This is
> why saying "occurrence" isn't precise enough, formally. In fact, even
> RDF 1.2 Concepts state:
> 
>> When an RDF triple is used as the subject or object of a triple, this occurrence of the triple is called a quoted triple.
> 
> Whereas RDF 1.1 Semantics is clear that a reification is a token:
> 
>> Rather, the reification describes the relationship between a token of a triple and the resources that the triple refers to.

It is clear about that, but the choice is deliberate. It could also have chosen to define it as referring to the type, but the use for that was not seen (and still isn’t by most people, as the use cases illustrate).

> (For those academic definitions Thomas alluded to, see e.g. [2].)
> 
> And thus I think *talking about* occurrences of triples and graphs
> (making statements about statements) requires reifying them. (We *use*
> them all the time, that's just RDF.) And reifying graphs is what named
> graphs have been doing in practice all along. (More or less
> indirectly, as these "tokens" of mathematical graphs *are* indirect.
> Peirce may have called them "indices", or perhaps any one of the
> "indexical signs"; I'm not sure.)

I tend to disagree. IIUC a reification really creates an entity of its own, one that refers to a statement (or graph) but neither asserts nor endorses it; it just describes it. One then does something with that reification, e.g. annotates it. So the relation is indirect, and this is what bothers me about reification:
A) it cuts the relation between a statement and "its" annotations
B) this cut makes it not only hard to provide e.g. provenance information for a statement token but even harder to add qualifying detail, as is often done in Property Graphs

This cut in RDF reification is deliberate, intended to prevent non-monotonicity, but I think that is over-cautious and is costing us much too much in terms of expressivity. IMO any annotation can be considered as adding detail, and fair game, as long as it does not claim the annotated statement to be outright false.
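The cut between a statement and its reification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modelled as plain tuples of strings, the `http://example.org/` namespace and the names `alice`, `worksFor`, `acme`, `stmt1`, `source` and `hr-database` are made up, and only the standard reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) is taken from RDF itself.

```python
# Triples are modelled as plain (subject, predicate, object) tuples of strings.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def reify(triple, node):
    """Return the four standard reification triples that describe `triple`."""
    s, p, o = triple
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }


fact = (EX + "alice", EX + "worksFor", EX + "acme")

# The graph holds only the description of the statement plus one annotation
# attached to the reification node, not to the statement itself.
graph = reify(fact, EX + "stmt1")
graph.add((EX + "stmt1", EX + "source", EX + "hr-database"))

# The described triple is not part of the graph: it is neither asserted
# nor directly connected to the annotation.
fact_is_asserted = fact in graph
```

In this sketch the annotation hangs off `stmt1`, so a consumer has to follow the four-triple description to find out which statement the provenance is about, and the statement itself still has to be asserted separately if it is meant to hold.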

IMO the level of "aboutness" boils down to very application dependent intuitions. E.g. most often provenance is understood as administrative detail, external to the annotated fact itself. In court however provenance - the source of an assertion - may be of utmost importance. One thing I like very much about the 1999 RDF specification is that it so explicitly calls the difference between data and metadata an arbitrary one, application dependent and not suitable to guide the design of RDF. In that view every annotation is just more detail. The question to answer is then: which annotation applies to the statement as a whole, which to the relation, which to individual nodes. That’s why the Nested Graphs proposal has those fragment identifiers.
 
> Thus I'm not convinced that *any* new terms are needed. That is, I
> wonder if any of our collected use cases actually require triple
> terms? I see none that speak of the abstract triple itself, but of
> some occurrence (and that is a token, since it reifies it into a
> resource).
> 
> This is further supported by RDF 1.1 Semantics:
> 
>> The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object. This supports use cases where properties such as dates of composition or provenance information are applied to the reified triple, which are meaningful only when thought of as referring to a particular instance or token of a triple.
> 
> I am following what Pat Hayes has said before, probably many times, e.g. in [3]:
> 
>> It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph. That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)?

Thanks a lot for that citation!
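The token/type distinction in the quoted passage can be made concrete with a small Python sketch. It is an illustration only: the `NamedGraph` class, the `ex:` names and the `annotations` dictionary are hypothetical, and an abstract RDF graph is approximated as a `frozenset` of tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedGraph:
    """A <name, graph> pair, read here as a token of an abstract graph."""
    name: str
    graph: frozenset


# One abstract graph: an immutable set of triples.
abstract = frozenset({("ex:lois", "ex:believes", "ex:superman-can-fly")})

# Two tokens with different names that carry the same abstract graph.
token_a = NamedGraph("ex:g1", abstract)
token_b = NamedGraph("ex:g2", frozenset(abstract))

same_abstract_graph = token_a.graph == token_b.graph
same_token = token_a == token_b

# Annotating by name reaches one token only, not the abstraction
# and not the other token.
annotations = {token_a.name: {"ex:source": "ex:daily-planet"}}
```

Here `same_abstract_graph` holds while `same_token` does not, which is the sense in which naming a graph "only gives you power over the token".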

> This is almost what named graphs became, but spelled out more
> explicitly, clearly specified in that the <name, graph> pair is here
> declared as a *token* of its mathematical graph. (Again, and this may
> be contentious, I think this token, which is denoted by this name, can
> be many kinds of resources: words on paper, a chunk of claims gleaned
> from a web page, an observed phenomenon, the beliefs of Lois Lane, a
> statement... Those are *indirect* tokens of the graph, paired with the
> graph to make it representable and queryable as RDF.)
> 
> The above quote continued with, and this is crucial:
> 
>> And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.
> 
> RDF graphs are sets of triples, yet quoted triples in RDF-star are
> something separate from named graphs, and defined as abstract terms,
> adding to the abstract syntax. There is a correlation in RDF 1.2
> concepts, but it does set a quoted triple (soon to be triple token)
> apart from a triple in a graph, and does not explain the similar
> relation to a named graph. This could be further defined in 1.3 of
> course, but will the notions of unasserted (and opaque) converge or
> stay separate? We're trying, I think, to figure that out (as we
> should).
> 
> Of course, if we are prepared to *redefine* RDF graphs (which are
> immutable abstract sets) to be built up from triple terms, and
> explicitly link named graphs to these, that's a possible path. I'm not
> sure of it, but I can see it. But even so, I think abstract terms,
> triple terms or sets of those, reasonably denoting *themselves* just
> as literals do, must be handled with care. That could mean only
> allowing them in the object position. (And yes, that also means I
> don't think allowing literals as subjects, to avoid repeating long
> literals, would be worth the misuse it might lead to.)

I don’t know what makes you treat graph types like raw eggs. Encoded as literals they are the most robust and unambiguous entity I can think of.

> And even if available for use, triple terms (being abstract
> mathematical "types") are also different from (and unrelated to)
> reification tokens, the latter which do provide some fundamentals for
> what Wikidata and LPGs do. It could lead to endless debate of what is
> correct where and when. (I think "all we are ever needing to identify
> are graph tokens, not abstract graphs", goes for triples as well, as
> is echoed by the quote from RDF 1.1 Semantics above: "use cases where
> properties such as dates of composition or provenance information
> [...] which are meaningful only when thought of as referring to a
> particular instance or token of a triple.")
> 
> What I think is doable is to correlate named graph occurrences with
> reification of triples, and we may even elegantly do what was done in
> the Named Graphs, 2005 paper [4], and declare that a singleton edge (a
> graph as a singleton set with one triple) entails a reified triple
> token (an rdf:Statement). Parts of those semantics combined with
> RDF-star *syntax* (or parts of, or variants thereof) can provide a
> unified foundation.

IIUC Named Graphs per the 2005 paper are referentially opaque. That I don’t find useful. The way the denotation of their name is defined, however, I find very useful.

> This can handle unasserted triples and eventual needs for opacity in a
> uniform way, using conditionally accepted graphs. And some such graph
> occurrences, namely singleton edges, identify with their reifications,
> being one or more identical triple tokens (from different sources,
> dates, real events, beliefs of). That caters for the LPG and Wikidata
> cases, along with the granular UniProt attribution (which may almost
> transparently upgrade from reification), CIDOC-CRM events (including
> interrelated statement tokens), and with detailed provenance and
> miscellaneous marginalia in libraries.
> 
> Since a triple token is not in a graph (a set), but described in one,
> this also doesn't contradict [5]:
> 
>> Within the framework of Zermelo–Fraenkel set theory, the axiom of regularity guarantees that no set is an element of itself. This implies that a singleton is necessarily distinct from the element it contains, thus 1 and {1} are not the same thing.
> 
> Which I still fear equating a singleton graph term with its triple
> term would do.

Yep.

> In summary, this way of staying on the reified occurrence side of
> things doesn't risk users tripping up on the logical foundations.

As I said above, reification IMO is not the right way to go. Qualification can only be done meaningfully on the triple itself. It has been done before with Singleton Properties, and IMO it is semantically sound.
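For comparison with reification, the Singleton Properties idea can be sketched roughly as follows. This is a loose illustration, not the published definition: the helper `qualify`, the linking property `ex:singletonPropertyOf`, the `ex:worksFor#1` naming scheme and all other `ex:` names are assumptions made for this example.

```python
def qualify(triple, unique_predicate, annotations):
    """Restate `triple` with a predicate unique to this one statement and
    attach the annotations to that predicate."""
    s, p, o = triple
    out = {
        # The statement itself, asserted with its own unique predicate.
        (s, unique_predicate, o),
        # Link from the unique predicate back to the generic one
        # (hypothetical property name).
        (unique_predicate, "ex:singletonPropertyOf", p),
    }
    # Qualifying detail is attached to the unique predicate.
    out |= {(unique_predicate, key, value) for key, value in annotations.items()}
    return out


fact = ("ex:alice", "ex:worksFor", "ex:acme")
graph = qualify(fact, "ex:worksFor#1", {"ex:since": "2020"})
```

In this sketch the qualified statement is asserted directly and its detail hangs off the predicate it uses, rather than off a separate node that merely describes it. The triple with the generic `ex:worksFor` predicate is not literally in the set; recovering it would depend on the semantics given to the linking property.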

Best,
Thomas

> All the best,
> Niklas
> 
> [1]: https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#higherorder
> [2]: https://iep.utm.edu/demonstratives-and-indexicals/#SH1c
> [3]: https://lists.w3.org/Archives/Public/public-rdf-wg/2011Feb/0060.html
> [4]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3199260
> [5]: https://en.wikipedia.org/wiki/Singleton_(mathematics)
>
Received on Wednesday, 25 October 2023 18:21:07 UTC