- From: Niklas Lindström <lindstream@gmail.com>
- Date: Tue, 24 Oct 2023 23:47:38 +0200
- To: RDF-star Working Group <public-rdf-star-wg@w3.org>
Dear all,

We must certainly consider our charter and scope. The charter is about:

> the ability to concisely represent and query statements about statements.

This is not a new concept. It has been with us since RDF 1.0 in 1999 (when Ora and Ralph were editors); see section 4, "Statements about Statements" [1]. Notably, the deprecated (and later removed) use of rdf:bagID/rdf:aboutEach described there appears to me to be a clear precursor of what became named graphs.

Also, in scope:

> The scope of this Working Group is to extend the recommendations defining RDF 1.1 and SPARQL 1.1 with the features introduced by RDF-star. More precisely, RDF-star introduces the notion of quoted triple to express statements about statements. The abstract and concrete syntaxes of RDF and SPARQL are extended to support this new feature, as well as their respective semantics.

And the RDF-star syntax, especially for annotations, appears useful for a concise representation of our use cases. But "the notion of a quoted triple" does not necessitate the introduction of a new term, does it? (If it does, then a rechartering may indeed be called for...) I wonder whether this:

> The Working Group may however reconsider this and proceed differently from the Community Group's proposal.

and:

> The Working Group will also consider allowing new features in these recommendations, according to Section 6.3.11.4 of the W3C process, in order to render future evolutions easier.

permit what we're exploring.

We must not add unwarranted complexity, of course. But I'm trying not to add even the complexity that a new, separate kind of term introduces.

I must stress that, in one crucial way, I do not think graph terms are better than triple terms: they both express the same problem of talking about something abstract in the data model ("stepping out of the picture", if you will). With named graphs, where the name is the "token" of an occurrence, you can talk about that. (As you can with old-style reification, of course.)
Note: *talking* about an occurrence requires "reifying" it. This is why saying "occurrence" isn't precise enough, formally. In fact, even RDF 1.2 Concepts states:

> When an RDF triple is used as the subject or object of a triple, this occurrence of the triple is called a quoted triple.

Whereas RDF 1.1 Semantics is clear that a reification is a token:

> Rather, the reification describes the relationship between a token of a triple and the resources that the triple refers to.

(For those academic definitions Thomas alluded to, see e.g. [2].)

And thus I think *talking about* occurrences of triples and graphs (making statements about statements) requires reifying them. (We *use* them all the time; that's just RDF.) And reifying graphs is what named graphs have been doing in practice all along. (More or less indirectly, as these "tokens" of mathematical graphs *are* indirect. Peirce may have called them "indices", or perhaps any one of the "indexical signs"; I'm not sure.)

Thus I'm not convinced that *any* new terms are needed. That is, I wonder whether any of our collected use cases actually require triple terms. I see none that speak of the abstract triple itself; they all speak of some occurrence (and that is a token, since it reifies the triple into a resource). This is further supported by RDF 1.1 Semantics:

> The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object. This supports use cases where properties such as dates of composition or provenance information are applied to the reified triple, which are meaningful only when thought of as referring to a particular instance or token of a triple.

I am following what Pat Hayes has said before, probably many times, e.g. in [3]:

> It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph.
> That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)?

This is almost what named graphs became, but spelled out more explicitly, in that the <name, graph> pair is here clearly declared to be a *token* of its mathematical graph. (Again, and this may be contentious, I think this token, which is denoted by this name, can be many kinds of resources: words on paper, a chunk of claims gleaned from a web page, an observed phenomenon, the beliefs of Lois Lane, a statement... Those are *indirect* tokens of the graph, paired with the graph to make it representable and queryable as RDF.)

The above quote continued with, and this is crucial:

> And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.

RDF graphs are sets of triples, yet quoted triples in RDF-star are something separate from named graphs, and are defined as abstract terms, adding to the abstract syntax. There is a correlation in RDF 1.2 Concepts, but it sets a quoted triple (soon to be triple token) apart from a triple in a graph, and it does not explain the similar relation to a named graph. This could of course be further defined in 1.3, but will the notions of "unasserted" (and "opaque") converge or stay separate? We're trying, I think, to figure that out (as we should).

Of course, if we are prepared to *redefine* RDF graphs (which are immutable abstract sets) to be built up from triple terms, and explicitly link named graphs to these, that's a possible path. I'm not sure of it, but I can see it.
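Hayes's point about graph tokens versus abstract graphs can be sketched in Python under illustrative assumptions not taken from the thread: an abstract graph is modelled as a `frozenset` of (subject, predicate, object) tuples, which compares by value, and a hypothetical `GraphToken` pairs a name with such a set. The `ex:` names are placeholders.

```python
from dataclasses import dataclass

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class GraphToken:
    """Hypothetical <name, graph> pair: the name denotes this token, not the abstract graph."""
    name: str
    graph: frozenset[Triple]  # the abstract graph: an immutable set of triples


# Two tokens with different names, built independently from equal sets of triples.
g1 = GraphToken("ex:g1", frozenset({("ex:alice", "ex:knows", "ex:bob")}))
g2 = GraphToken("ex:g2", frozenset({("ex:alice", "ex:knows", "ex:bob")}))

print(g1.graph == g2.graph)  # True: the same RDF (abstract) graph
print(g1 == g2)              # False: two distinct graph tokens
```

Because the abstract graph compares by value, any two equal sets simply are the same graph, so in this model only the token can carry a name.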
But even so, I think abstract terms (triple terms or sets of those), which reasonably denote *themselves* just as literals do, must be handled with care. That could mean only allowing them in the object position. (And yes, that also means I don't think allowing literals as subjects, to avoid repeating long literals, would be worth the misuse it might lead to.) And even if available for use, triple terms (being abstract mathematical "types") are also different from (and unrelated to) reification tokens, the latter of which do provide some fundamentals for what Wikidata and LPGs do. It could lead to endless debate about what is correct where and when. (I think "all we are ever needing to identify are graph tokens, not abstract graphs" goes for triples as well, as is echoed by the quote from RDF 1.1 Semantics above: "use cases where properties such as dates of composition or provenance information [...] which are meaningful only when thought of as referring to a particular instance or token of a triple.")

What I think is doable is to correlate named graph occurrences with reification of triples. We may even elegantly do what was done in the 2005 Named Graphs paper [4], and declare that a singleton edge (a graph as a singleton set with one triple) entails a reified triple token (an rdf:Statement). Parts of those semantics, combined with RDF-star *syntax* (or parts or variants thereof), can provide a unified foundation. This can handle unasserted triples and possible needs for opacity in a uniform way, using conditionally accepted graphs. And some such graph occurrences, namely singleton edges, identify with their reifications, being one or more identical triple tokens (from different sources, dates, real events, or beliefs).
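The singleton-edge idea can be sketched in Python under stated assumptions; this is an illustration, not the rule from [4] or from any specification. Graphs are frozensets of tuples, the hypothetical function `reify_singleton_edge` maps a named one-triple graph onto the RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), and the graph name is assumed to denote the resulting token. The `ex:` names are placeholders.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: a triple is a (subject, predicate, object) tuple; a graph is a frozenset of triples.
Triple = tuple[str, str, str]


def reify_singleton_edge(name: str, graph: frozenset[Triple]) -> set[Triple]:
    """Hypothetical entailment: a named singleton graph yields an rdf:Statement token.

    The graph name is read as denoting the reified triple token itself.
    Graphs with zero or several triples yield nothing in this sketch.
    """
    if len(graph) != 1:
        return set()
    ((s, p, o),) = graph  # unpack the single triple
    return {
        (name, RDF + "type", RDF + "Statement"),
        (name, RDF + "subject", s),
        (name, RDF + "predicate", p),
        (name, RDF + "object", o),
    }


triple = ("ex:alice", "ex:knows", "ex:bob")
edge = frozenset({triple})  # a singleton edge: a graph with exactly one triple

# The triple is an element of the singleton graph, not the graph itself (1 vs {1}).
assert triple in edge and triple != edge

# Two hypothetical sources: two distinct token names describing one and the same triple.
for token_name in ("ex:claim1", "ex:claim2"):
    for t in sorted(reify_singleton_edge(token_name, edge)):
        print(t)
```

Reading the name as the token means that two sources of the same triple get two distinct rdf:Statement resources sharing subject, predicate and object, while the triple itself only ever appears *described* in the output, never as a member equal to its graph.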
This approach caters for the LPG and Wikidata cases, along with the granular UniProt attribution (which may almost transparently upgrade from reification), CIDOC-CRM events (including interrelated statement tokens), and detailed provenance and miscellaneous marginalia in libraries.

Since a triple token is not in a graph (a set), but described in one, this also doesn't contradict [5]:

> Within the framework of Zermelo–Fraenkel set theory, the axiom of regularity guarantees that no set is an element of itself. This implies that a singleton is necessarily distinct from the element it contains, thus 1 and {1} are not the same thing.

I still fear that equating a singleton graph term with its triple term would contradict it.

In summary, staying on the reified-occurrence side of things in this way doesn't risk users tripping up on the logical foundations.

All the best,
Niklas

[1]: https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#higherorder
[2]: https://iep.utm.edu/demonstratives-and-indexicals/#SH1c
[3]: https://lists.w3.org/Archives/Public/public-rdf-wg/2011Feb/0060.html
[4]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3199260
[5]: https://en.wikipedia.org/wiki/Singleton_(mathematics)
Received on Tuesday, 24 October 2023 21:48:12 UTC