- From: Niklas Lindström <lindstream@gmail.com>
- Date: Wed, 1 Nov 2023 01:24:42 +0100
- To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
On Fri, Oct 27, 2023 at 12:07 PM Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:
>
> Niklas,
>

Hi Antoine,

Thank you for your feedback!

> A few comments about your slides (not necessarily about your proposal):
>
> First, the good things: RDF reification is indeed under-used, but it is
> used. Especially, it has been used in significant datasets like uniprot
> when the default syntax for RDF was RDF/XML. RDF/XML has syntactic sugar
> for reification, which makes it super easy to write. One reason people
> don't like reification is because it is verbose and cumbersome. But RDF
> lists are also verbose and cumbersome if written as triples. Yet, with
> the right syntax, good practices, and dedicated primitives in
> programming, they are well accepted and well supported. The same could
> be true with reification. So, yes, "quoted triples" as a way to simplify
> the use of standard RDF reification is an option that should be on the
> table. But the big problem is that the semantics is not constraining at
> all, and people may have completely different practices in the way they
> use reification. However, as opposed to named graphs, RDF reification
> has a normative semantics, although it is very weak.

Yes. I do think there is some value in that kind of weak semantics though, e.g. for "informal, messy, qualification", as I mention in slide 29. It merely correlates a statement (a reified triple token) to a triple type, and the nature (interpretation) of that token can be anything.

IMO, the triple substrate of RDF has proven to be sound by staying so simple. Talking about particular occurrences of statements (denoted by IRIs or bnodes) maintains this simplicity. Some can certainly even *mean* the triple itself, though most rather mean the statement in some more real world, most likely as a claim, observation or other "fine-grained" event (with dates, source, etc.).

To add triples themselves as components of triples changes this radically.
These new terms are not "atoms" like the others, but recursive, abstract types. A bit like graph terms in N3, but different, more like disjoint fragments thereof. So instead of triples being the simple substrate, these new terms give it a recursive structure, which, while perhaps mathematically sound, allows for new kinds of creative structures I cannot really see motivations for (e.g., as shown, representing an entire graph as one single term).

It might be extremely powerful in ways I have not grasped; but adding this to the core of RDF, rather than as an experimental add-on, worries me a lot. Not least because their restrictive type nature (like that of the self-denoting literals) makes most examples conflating, so that they must be altered once new information is encountered. That fundamentally complicates the simplicity of RDF, which I fear can harm uptake and harmonization.

And it's not that I don't see challenges going the named graph route; I just see the possible result of that as becoming a *clarification* of what today are implementation-dependent, divergent practices for managing those. Had I seen no other route for triple annotations, I would possibly accept this new complexity. But I see an opening with named graphs, and am exploring how far that path goes.

> Second, the criticism, in details:
>
> Slide 6 has the title "RDF 1.1 Concepts" and subtitle "on reification",
> but the text you put on the right is from RDF Schema. "Concepts" don't
> say anything about reification. Moreover, this text is in a section that
> is not normative. Formally, the semantics of reification does not imply
> that a reified triple is a token or anything. According to the standard,
> one could interpret a reified triple as the triple itself and it would
> not violate anything normative.

I am terribly sorry, that was a misquote; I meant "RDF 1.1 Semantics", appendix "D.1 Reification". Most definitely non-normative; I could have specified that too.
And certainly one could do that, but without more explicit information, it would be a very restrictive interpretation. (Which anyone is allowed to use of course. I've done so myself in certain contexts.)

> Then in Slide 7, what is written is Pat Hayes's idea of a named graph.
> But as far as the standards are concerned (SPARQL 1.0, SPARQL 1.1, and
> RDF 1.1), named graphs are *only* pairs (n,g) and that's all. You may
> interpret this as a token of a graph with a name if you want, but again,
> this is not normative and there are other ways to interpret it.

That is true. And I am not claiming that they are tokens *of graphs*. I am (sort of) siding with the notion that the *pairs* are tokens, and we just don't know of what: either the graph and its name, or the graph and the name of something else. But I think this is something preferably defined differently. See below.

> In Slide 24, it is written "A triple is identified with the singleton
> set containing it", and a subtitle says "RDF 1.1 Semantics". Clearly, an
> element and the singleton that contains it are never the same, but they
> may be identified in certain contexts. I do not understand to which
> context you refer here. The mention of RDF 1.1 Semantics is misleading
> because RDF 1.1 Semantics does not have this identification. In fact,
> quite the opposite: if they were identified, then:

I mean contexts where a token of a triple and a token of a singleton set of that triple are indistinguishable. (I did mention when I presented that the word "identified" in that section of the semantics *does not* mean "denoted by", but in running through the presentation quickly, I apologize for not making that point clearly, neither in the slides nor the presentation.)

I suspect one example would be a depiction of a mathematical graph, where an instance of a singleton edge is such a token?
I also assume that section 3.3 on RDF Reification in pages 6-8 of Named Graphs, 2005 [1] (which I partially repeat in slides 25-28), forms such a context? I would be very grateful for some formal verification of that.

>    { <me> <wears> _:b . _:b a <Hat> }
>
> would be identified with
>
>    {{<me> <wears> _:b}, {_:b a <Hat>}}
>
> But these two sets mean different things. The second one does not imply
> the first one. First one says "I wear a hat", while second one says "I
> wear something. There exists a hat."

Of course; presuming that this is not data from the same RDF source, but a dataset of two *opaque* graphs. If they were transparent, that would mean the same hat. (RDF 1.1 Semantics, 10. RDF Datasets: "The graphs in a single dataset may share blank nodes.")

> Slide 30: Again, "The <name, graph> pair is a token of its mathematical
> graph." is one way of interpreting the pair. Imagine I have a pair (iri,
> n), where iri is an IRI and n is a natural number. Would you interpret
> this as a token for the mathematical number n? For instance, instead, if
> iri is a DOI, n could be the number of times the document was printed.

Yes, this was also based on Named Graphs, 2005: "I(name(ng)) = ng. Note that the named graph itself, ..., is the denotation of the name." Which is not at all normative of course; and I don't think all of it works. But parts thereof could be considered for adoption. And of course I do not mean that <name, ?x> pairs in general have any such meaning.

> Also, "This token, which is denoted by this name" is your
> interpretation. "Denote" is formally defined in RDF 1.1 Semantics:
> https://www.w3.org/TR/rdf11-mt/#dfn-denote, so when you use this term in
> the context of RDF, it suggests that you talk about what RDF Semantics
> says. But RDF Semantics does not say that the graph name denotes
> anything in particular.

Indeed it doesn't. The Named Graphs paper defined it so.
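
The hat example quoted above can be made concrete with a small sketch. It is only an illustration, under the assumption that triples are modelled as plain tuples of strings and graphs as Python sets; the helper names (`union`, `merge`, `wears_a_hat`) are hypothetical and not taken from any RDF library. It contrasts two singleton graphs that share the blank node `_:b` with the same graphs after their blank nodes are renamed apart, as in an RDF merge.

```python
def is_blank(term):
    """Blank nodes are written with the usual '_:' label prefix."""
    return term.startswith("_:")


def union(graphs):
    """Plain set union: blank node labels are shared across the graphs."""
    return set().union(*graphs)


def merge(graphs):
    """RDF-style merge: rename blank nodes apart per graph, then take the union."""
    merged = set()
    for i, graph in enumerate(graphs):
        for triple in graph:
            merged.add(tuple(f"{t}_g{i}" if is_blank(t) else t for t in triple))
    return merged


def wears_a_hat(graph):
    """True if some x satisfies both '<me> <wears> x' and 'x a <Hat>'."""
    worn = {o for (s, p, o) in graph if s == "<me>" and p == "<wears>"}
    hats = {s for (s, p, o) in graph if p == "a" and o == "<Hat>"}
    return bool(worn & hats)


# The two singleton sets from the quoted example.
singletons = [{("<me>", "<wears>", "_:b")}, {("_:b", "a", "<Hat>")}]

shared = union(singletons)  # blank node shared: one thing, worn and a hat
apart = merge(singletons)   # blank nodes renamed apart: two unrelated things

print(wears_a_hat(shared))
print(wears_a_hat(apart))
```

With a shared blank node the two singletons together still say "I wear a hat"; once the blank nodes are standardized apart they only say "I wear something" and "there exists a hat", which is the difference both correspondents describe.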
My purpose here was to show that there have been attempts at bridging the difference between reification and named graphs. I think there is something to it, but I know it is too restrictive for how named graphs are used in production today. Also, there's something odd with a pair being a token of one of its members. A simple relation is sufficient (such has been proposed, e.g. sem:quotedGraph in Notation 3). More on that below.

> Slide 34: I don't understand or I simply disagree with some of the
> statements: "...nested graphs? (...) Requires “graph literals” (...)" ->
> I don't see how this follows from that.

Not the concept in general, but the explicit proposal, which proposes using graph literals [2] for opacity (referencing e.g. your proposal for that).

> "… graph terms? Same problem as for triple terms -
> these are abstract mathematical objects denoting
> themselves." -> graph terms are just a syntactic structure. This does
> not imply anything about what they denote or not.

In the recent discussions, type terms have been proposed as a new name for quoted triples, which represent types, not tokens. In the same fashion as graph terms in Notation 3, which are occurrences of the graph itself, as a type. Of course that is an interpretation, and I agree that a term doesn't necessarily mean "as a type", so I should have qualified with "graph terms as types". (Perhaps it's better to say *graph blocks* when referring to the syntactically similar structures in TriG and Notation 3, which have different meanings, e.g. regarding opacity and opened/closed...)

> Additionally, there are parts where it is hard to understand what you
> mean. Your spoken words yesterday explained some things but sometimes
> even with your verbal presentation, I had trouble figuring out what your
> proposal(s) consist(s) in exactly.
My proposal is to not add terms as types to the core of RDF, and to focus on either reification, which I don't think can cover all relevant use cases, or to improve named graphs with some (backwards-compatible) semantics and syntactic sugar for making small graphs (mainly but not exclusively singleton sets) simple and efficient to use for annotation purposes. And that the latter option can be made compatible with the first through entailment.

Most of this can be drawn from the mentioned Named Graphs paper, but adjusted to the realities of RDF 1.1 as deployed. That appears more productive than adding an entirely new, complex term to the core of RDF. Especially since it comes with its own issues.

The aim of the presentation (which I threw together in haste) was to motivate that proposal, arriving at slide 32 with the work to be done. Especially:

We need a way to say that a named graph (occurrence) is from or of a graph occurrence (or the default graph occurrence). An appendix of the graph. That’s the missing piece.

I've worked some more on details, outlined here:

(The `rdfx:` and `rdfsx:` prefixes used below are for proposed additions to `rdf:` and `rdfs:` respectively.)

1. For named graphs, instead of just the <name,graph> pair, let that pairing imply a relationship (a new `rdfx:graph` property) between any kind of resource (which the name denotes), and an RDF graph. I failed to arrive at that in slide 30, which got stuck on the token. This relationship is abstract; TriG and SPARQL already have syntax and keywords for implying it.

2. As already shown [3], the annotation short forms, or a variant thereof, can be represented as named graphs. So no adjustments to the abstract syntax at all. I've been convinced by Andy that relying on blank graph names for anything won't work.

3. Upon this, management of named graphs in datasets can be more formalized, by more explicit rules for *accepting* them in datasets (see that notion in the Named Graphs paper), particularly meaning that the common practise of making all named graphs visible in a union default graph [4] can be conditioned, controlled.

4. Define that condition on a notion of *graph name ownership*, where if a named graph (again, any kind of resource paired with an RDF graph in a dataset) relates to another with `rdfx:namedBy` (also a new property), then the dataset doesn't own the name, and the owned named graph isn't accepted -- it's "neutral", queryable if named but not in the union default graph. (This replaces, or augments, my previously suggested "blank graphs as appendices"; again no longer relying on syntax, only a relationship. For which syntax can be sugar; specifically if named graphs contain annotations. If a default graph contains them, reading that into a graph store can associate "appendices" with the name it is read "into" (meaning that RDF sources containing datasets can be read as being a graph and its appendices; of course only if so desired by the dataset manager).)

5. (We're really done here for RDF 1.2, but this is for compatibility with reification and as a recipe for legacy systems without "union graph conditional acceptance".) Define an entailment like the one in the Blank Graphs paper, so that named graphs, either just singleton sets, or all of them, entail them as reified statements. This can be done by defining an `rdfsx:implies` relation from the named graph resource to an informal set of reified statements. So the nature of the resources "naming" the graph is open as before.
Except only for those "naming" singleton sets, which would "identify with its set", I guess, practically meaning that their `rdfsx:implies` would be self-referential (reflexive), and they would be of type `rdf:Statement` with `rdf:subject`, `rdf:predicate` and `rdf:object` arcs to the corresponding parts of the singleton set it is an `rdfx:graph` of.

If this is too tall an order, we *could* leave that work to the next working group (which I'd gladly join), and *just* define a form of annotation syntax to Turtle-star as sugar for assertion-plus-reification (equivalent to @rdf:ID on arcs in RDF/XML). That would be a rather small first step though; the real power would be bridging this and named graphs. But if step 5 is feasible, it's not a step in another direction.

Now to my controversial proposed *removal* of triple terms (motivated above, and before).

Either A) skip the quoted triple syntax altogether, and add the IMO more convenient quotation dash form (which "comments out" objects, making the arc unasserted). That gives full syntactic sugar for all forms of old-school reification, all based on the annotation form (which needs some revision to support repetition, but that's easy if we can decide on syntax (which may be less easy)). That'd also leave that syntax for specialized applications to continue to use (for triple term types), instead of being co-opted for reification.

Or B), remake it as sugar for reification. I don't think it's worth it, as it isn't lightyears ahead of explicit reification (e.g. using some prefix tricks), and I see more unity in the quotation dash (as use cases call mainly for asserted+annotated, and the dash turns off the assertion for the odd ones out). Also, even if adding the syntax is considered more important than its semantics (e.g. because some implementations have already deployed it, *despite* it not being standard), one annoyance would be that *if* graph terms would be added in a future RDF, these two forms might be confusing together. And in terms of readability, the *form* of `<< s p o >>` *looks* kind of like an IRI, which is a type term, and not like repeated `[]`, which represent unique tokens per occurrence. That could be even more confusing; but possibly an inconsistency that just takes getting used to.

Of course, I think the numbered suggestions I made above are important to consider before going the reification only route.

(As asked for by Olaf in last week's telecon, I'm working on a full proposal. That outline is just a quick summary.)

Best regards,
Niklas

[1]: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3199260>
[2]: <https://gist.github.com/rat10/eaa109ab56b4d77d29e3a826291f8e72#graph-literals>
[3]: <https://gist.github.com/niklasl/c22994e664663b6730613ecc1321c418>
[4]: <https://www.w3.org/TR/sparql11-service-description/#sd-uniondefaultgraph>

>
> --AZ
>
> On 26/10/2023 at 20:37, Niklas Lindström wrote:
> > Dear all,
> >
> > Here are the slides I presented at today's teleconference.
> >
> > Best regards,
> > Niklas
> >
> > (PS. Escher's Dragon is pixelated to avoid copyright issues.)
>
> --
> Antoine Zimmermann
> École des Mines de Saint-Étienne
> 158 cours Fauriel
> CS 62362
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 49 97 02
> http://www.emse.fr/~zimmermann/
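
Steps 4 and 5 of the numbered outline in the message above can be sketched roughly as follows. This is a minimal illustration under several assumptions that the message does not settle: triples are plain tuples, a dataset is a default graph plus a dict of named graphs, an `rdfx:namedBy` arc may be stated anywhere in the dataset with the non-owned graph name as its subject, and only singleton graphs are reified. `rdfx:namedBy` is the property proposed in the message, not a standard term, and the function names and sample data are hypothetical.

```python
RDFX_NAMED_BY = "rdfx:namedBy"  # proposed in the message; not a standard RDF term


def neutral_names(default_graph, named_graphs):
    """Graph names that are the subject of an rdfx:namedBy arc (assumed placement)."""
    triples = set(default_graph).union(*named_graphs.values())
    return {s for (s, p, o) in triples if p == RDFX_NAMED_BY and s in named_graphs}


def union_default_graph(default_graph, named_graphs):
    """Union default graph that only includes *accepted* (owned) named graphs."""
    neutral = neutral_names(default_graph, named_graphs)
    result = set(default_graph)
    for name, graph in named_graphs.items():
        if name not in neutral:
            result |= graph
    return result


def reify_singletons(named_graphs):
    """Step 5 sketch: a singleton named graph entails a reified statement."""
    statements = set()
    for name, graph in named_graphs.items():
        if len(graph) == 1:
            s, p, o = next(iter(graph))
            statements |= {
                (name, "rdf:type", "rdf:Statement"),
                (name, "rdf:subject", s),
                (name, "rdf:predicate", p),
                (name, "rdf:object", o),
            }
    return statements


# Hypothetical dataset: :claim1 is an "appendix" of :g2, so it stays neutral.
default_graph = {(":alice", ":knows", ":bob"), (":claim1", RDFX_NAMED_BY, ":g2")}
named_graphs = {
    ":claim1": {(":bob", ":knows", ":carol")},
    ":g2": {(":carol", ":age", "42"), (":carol", ":name", "Carol")},
}

visible = union_default_graph(default_graph, named_graphs)
reified = reify_singletons(named_graphs)
```

In this sketch the triple in `:claim1` remains queryable through `named_graphs[":claim1"]` but is absent from `visible`, while `reified` describes it as an `rdf:Statement`. A legacy system without conditional acceptance could use the latter instead.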
Received on Wednesday, 1 November 2023 00:25:15 UTC