Re: Slides: Talking About Occurrences from Niklas Lindström on 2023-11-01 (public-rdf-star-wg@w3.org from November 2023)

From: Niklas Lindström <lindstream@gmail.com>
Date: Wed, 1 Nov 2023 01:24:42 +0100
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-ID: <CADjV5jetnLZOtwjKFmc+EAdUd7sbs9tE0y3GrN+Q1w+hKMo5xQ@mail.gmail.com>
On Fri, Oct 27, 2023 at 12:07 PM Antoine Zimmermann
<antoine.zimmermann@emse.fr> wrote:
>
> Niklas,
>

Hi Antoine,

Thank you for your feedback!

> A few comments about your slides (not necessarily about your proposal):
>
> First, the good things: RDF reification is indeed under-used, but it is
> used. Especially, it has been used in significant datasets like uniprot
> when the default syntax for RDF was RDF/XML. RDF/XML has syntactic sugar
> for reification, which makes it super easy to write. One reason people
> don't like reification is because it is verbose and cumbersome. But RDF
> lists are also verbose and cumbersome if written as triples. Yet, with
> the right syntax, good practices, and dedicated primitives in
> programming, they are well accepted and well supported. The same could
> be true with reification. So, yes, "quoted triples" as a way to simplify
> the use of standard RDF reification is an option that should be on the
> table. But the big problem is that the semantics is not constraining at
> all, and people may have completely different practices in the way they
> use reification. However, as opposed to named graphs, RDF reification
> has a normative semantics, although it is very weak.

Yes. I do think there is some value in that kind of weak semantics
though, e.g. for "informal, messy, qualification", as I mention in
slide 29. It merely correlates a statement (a reified triple token) to
a triple type, and the nature (interpretation) of that token can be
anything.
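
To make the token/type correlation concrete, here is a minimal Python sketch of standard reification. The tuple representation of triples, the `reify` helper and the `_:stmtN` labels are illustrative assumptions; only the four `rdf:` terms come from the RDF vocabulary.

```python
from itertools import count

_ids = count(1)  # source of hypothetical blank node labels


def reify(triple, token=None):
    """Describe one occurrence (token) of `triple` with the four reification arcs."""
    s, p, o = triple
    if token is None:
        token = f"_:stmt{next(_ids)}"  # assumed label scheme, for illustration only
    description = {
        (token, "rdf:type", "rdf:Statement"),
        (token, "rdf:subject", s),
        (token, "rdf:predicate", p),
        (token, "rdf:object", o),
    }
    return token, description


triple = ("<me>", "<wears>", "<hat1>")
t1, d1 = reify(triple)
t2, d2 = reify(triple)

# Two tokens, one triple type: each token can carry its own date, source, etc.
print(t1 != t2)
print({o for _, p, o in d1 if p == "rdf:object"} == {"<hat1>"})
```

Nothing in the four arcs says what kind of thing the token is; that is the weakness, and the flexibility, referred to above.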

IMO, the triple substrate of RDF has proven to be sound by staying so
simple. Talking about particular occurrences of statements (denoted by
IRIs or bnodes) maintains this simplicity. Some can certainly even
*mean* the triple itself, though most rather mean the statement in
some more real world, most likely as a claim, observation or other
"fine-grained" event (with dates, source, etc.).

To add triples themselves as components of triples changes this
radically. These new terms are not "atoms" like the others, but
recursive, abstract types. A bit like graph terms in N3, but
different, more like disjoint fragments thereof. So instead of triples
being the simple substrate, these new terms give it a recursive
structure, which, while perhaps mathematically sound, allows for new
kinds of creative structures I cannot really see motivations for
(e.g., as shown, representing an entire graph as one single term). It
might be extremely powerful in ways I have not grasped; but adding
this to the core of RDF, rather than as an experimental add-on,
worries me a lot.

Not least because their restrictive type nature (like that of the
self-denoting literals) makes most examples conflating, so that they
must be altered once new information is encountered.

That fundamentally complicates the simplicity of RDF, which I fear can
harm uptake and harmonization. And it's not that I don't see
challenges going the named graph route, I just see the possible result
of that as becoming a *clarification* of what today is
implementation-dependent, divergent practices for managing those. Had
I seen no other route for triple annotations, I would possibly accept
this new complexity. But I see an opening with named graphs, and am
exploring how far that path goes.

> Second, the criticism, in details:
>
> Slide 6 has the title "RDF 1.1 Concepts" and subtitle "on reification",
> but the text you put on the right is from RDF Schema. "Concepts" don't
> say anything about reification. Moreover, this text is in a section that
> is not normative. Formally, the semantics of reification does not imply
> that a reified triple is a token or anything. According to the standard,
> one could interpret a reified triple as the triple itself and it would
> not violate anything normative.

I am terribly sorry, that was a misquote; I meant "RDF 1.1 Semantics",
appendix "D.1 Reification". Most definitely non-normative; I could
have specified that too. And certainly one could do that, but without
more explicit information, it would be a very restrictive
interpretation. (Which anyone is allowed to use of course. I've done
so myself in certain contexts.)

> Then in Slide 7, what is written is Pat Hayes's idea of a named graph.
> But as far as the standards are concerned (SPARQL 1.0, SPARQL 1.1, and
> RDF 1.1), named graphs are *only* pairs (n,g) and that's all. You may
> interpret this as a token of a graph with a name if you want, but again,
> this is not normative and there are other ways to interpret it.

That is true. And I am not claiming that they are tokens *of graphs*.
I am (sort of) siding with the notion that the *pairs* are tokens, and
we just don't know of what: either the graph and its name, or the
graph and the name of something else. But I think this is something
preferably defined differently. See below.

> In Slide 24, it is written "A triple is identified with the singleton
> set containing it", and a subtitle says "RDF 1.1 Semantics". Clearly, an
> element and the singleton that contains it are never the same, but they
> may be identified in certain contexts. I do not understand to which
> context you refer here. The mention of RDF 1.1 Semantics is misleading
> because RDF 1.1 Semantics does not have this identification. In fact,
> quite the opposite: if they were identified, then:

I mean contexts where a token of a triple and a token of a singleton
set of that triple are indistinguishable.

(I did mention when I presented that the word "identified" in that
section of the semantics *does not* mean "denoted by", but in running
through the presentation quickly, I apologize for not making that
point clearly, neither in the slides nor the presentation.)

I suspect one example would be a depiction of a mathematical graph,
where an instance of a singleton edge is such a token? I also assume
that section 3.3 on RDF Reification in pages 6-8 of Named Graphs, 2005
[1] (which I partially repeat in slides 25-28), forms such a context?
I would be very grateful for some formal verification of that.

> { <me> <wears> _:b . _:b a <Hat> }
>
> would be identified with
>
> {{<me> <wears> _:b}, {_:b a <Hat>}}
>
> But these two sets mean different things. The second one does not imply
> the first one. First one says "I wear a hat", while second one says "I
> wear something. There exists a hat."

Of course; presuming that this is not data from the same RDF source,
but a dataset of two *opaque* graphs. If they were transparent, that
would mean the same hat. (RDF 1.1 Semantics, 10. RDF Datasets: "The
graphs in a single dataset may share blank nodes.")
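
The difference can be sketched in Python by contrasting a plain union (blank node labels shared) with an RDF merge (blank nodes renamed apart per graph), both as described in RDF 1.1 Semantics. Reading "transparent" as union and "opaque" as merge is an assumption made for this sketch; it models only blank node scoping, not opacity in general. The helper names and the `_gN` suffix are hypothetical.

```python
def is_bnode(term):
    return term.startswith("_:")


def union(*graphs):
    """Plain set union: a blank node label means the same node in every graph."""
    return set().union(*graphs)


def merge(*graphs):
    """RDF merge: blank nodes are renamed apart per graph before the union."""
    out = set()
    for i, graph in enumerate(graphs):
        for triple in graph:
            out.add(tuple(f"{t}_g{i}" if is_bnode(t) else t for t in triple))
    return out


def wears_a_hat(graph):
    """True if some node is both worn by <me> and typed as <Hat>."""
    worn = {o for s, p, o in graph if (s, p) == ("<me>", "<wears>")}
    hats = {s for s, p, o in graph if (p, o) == ("a", "<Hat>")}
    return bool(worn & hats)


g1 = {("<me>", "<wears>", "_:b")}
g2 = {("_:b", "a", "<Hat>")}

print(wears_a_hat(union(g1, g2)))  # shared blank node: "I wear a hat"
print(wears_a_hat(merge(g1, g2)))  # renamed apart: "I wear something. There exists a hat."
```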

> Slide 30: Again, "The <name, graph> pair is a token of its mathematical
> graph." is one way of interpreting the pair. Imagine I have a pair (iri,
> n), where iri is an IRI and n is a natural number. Would you interpret
> this as a token for the mathematical number n? For instance, instead, if
> iri is a DOI, n could be the number of times the document was printed.

Yes, this was also based on Named Graphs, 2005: "I(name(ng)) = ng.
Note that the named graph itself, ..., is the denotation of the name."
Which is not at all normative of course; and I don't think all of it
works. But parts thereof could be considered for adoption.

And of course I do not mean that <name, ?x> pairs in general have any
such meaning.

> Also, "This token, which is denoted by this name" is your
> interpretation. "Denote" is formally defined in RDF 1.1 Semantics:
> https://www.w3.org/TR/rdf11-mt/#dfn-denote, so when you use this term in
> the context of RDF, it suggests that you talk about what RDF Semantics
> says. But RDF Semantics does not say that the graph name denotes
> anything in particular.

Indeed it doesn't. The Named Graphs paper defined it so. My purpose
here was to show that there have been attempts at bridging the
difference between reification and named graphs.

I think there is something to it, but I know it is too restrictive for
how named graphs are used in production today. Also, there's something
odd with a pair being a token of one of its members. A simple relation
is sufficient (such has been proposed, e.g. sem:quotedGraph in
Notation 3). More on that below.

> Slide 34: I don't understand or I simply disagree with some of the
> statements: "...nested graphs? (...) Requires “graph literals” (...)" ->
> I don't see how this follows from that.

Not the concept in general, but the explicit proposal, which proposes
using graph literals [2] for opacity (referencing e.g. your proposal
for that).

> "… graph terms? Same problem as for triple terms -
> these are abstract mathematical objects denoting
> themselves." -> graph terms are just a syntactic structure. This does
> not imply anything about what they denote or not.

In the recent discussions, type terms have been proposed as a new name
for quoted triples, which represent types, not tokens. In the same
fashion as graph terms in Notation 3, which are occurrences of the
graph itself, as a type. Of course that is an interpretation, and I
agree that a term doesn't necessarily mean "as a type", so I should
have qualified with "graph terms as types".

(Perhaps it's better to say *graph blocks* when referring to the
syntactically similar structures in TriG and Notation 3, which have
different meanings, e.g. regarding opacity and opened/closed.)

> Additionally, there are parts where it is hard to understand what you
> mean. Your spoken words yesterday explained some things but sometimes
> even with your verbal presentation, I had trouble figuring out what your
> proposal(s) consist(s) in exactly.

My proposal is to not add terms as types to the core of RDF, and to
focus on either reification, which I don't think can cover all
relevant use cases, or to improve named graphs with some
(backwards-compatible) semantics and syntactic sugar for making small
graphs (mainly but not exclusively singleton sets) simple and
efficient to use for annotation purposes. And that the latter option
can be made compatible with the first through entailment. Most of this
can be drawn from the mentioned Named Graphs paper, but adjusted to
the realities of RDF 1.1 as deployed.

That appears more productive than adding an entirely new, complex term
to the core of RDF. Especially since it comes with its own issues.

The aim of the presentation (which I threw together in haste) was to
motivate that proposal, arriving at slide 32 with the work to be done.
Especially:

    We need a way to say that a named graph (occurrence) is from or of
    a graph occurrence (or the default graph occurrence). An appendix of
    the graph. That’s the missing piece.

I've worked some more on details, outlined here:

(The `rdfx:` and `rdfsx:` prefixes below stand for proposed additions
to `rdf:` and `rdfs:` respectively.)

1. For named graphs, instead of just the <name,graph> pair, let that
pairing imply a relationship (a new `rdfx:graph` property) between any
kind of resource (which the name denotes), and an RDF graph. I failed
to arrive at that in slide 30, which got stuck on the token. This
relationship is abstract; TriG and SPARQL already have syntax and
keywords for implying it.

2. As already shown [3], the annotation short forms, or a variant
thereof, can be represented as named graphs. So no adjustments to the
abstract syntax at all. I've been convinced by Andy that relying on
blank graph names for anything won't work.

3. Upon this, management of named graphs in datasets can be more
formalized, by more explicit rules for *accepting* them in datasets
(see that notion in the Named Graphs paper), particularly meaning that
the common practice of making all named graphs visible in a union
default graph [4] can be conditioned, controlled.

4. Define that condition on a notion of *graph name ownership*, where
if a named graph (again, any kind of resource paired with an RDF graph
in a dataset) relates to another with `rdfx:namedBy` (also a new
property), then the dataset doesn't own the name, and the owned named
graph isn't accepted -- it's "neutral", queryable if named but not in
the union default graph. (This replaces, or augments, my previously
suggested "blank graphs as appendices"; again no longer relying on
syntax, only a relationship. For which syntax can be sugar;
specifically if named graphs contain annotations. If a default graph
contains them, reading that into a graph store can associate
"appendices" with the name it is read "into" (meaning that RDF sources
containing datasets can be read as being a graph and its appendices;
of course only if so desired by the dataset manager).)

5. (We're really done here for RDF 1.2, but this is for compatibility
with reification and as a recipe for legacy systems without "union
graph conditional acceptance".) Define an entailment like the one in
the Blank Graphs paper, so that named graphs, either just singleton
sets, or all of them, entail them as reified statements. This can be
done by defining an `rdfsx:implies` relation from the named graph
resource to an informal set of reified statements. So the nature of
the resources "naming" the graph is open as before. Except only for
those "naming" singleton sets, which would "identify with its set", I
guess, practically meaning that their `rdfsx:implies` would be
self-referential (reflexive), and they would be of type
`rdf:Statement` with `rdf:subject`, `rdf:predicate` and `rdf:object`
arcs to the corresponding parts of the singleton set it is an
`rdfx:graph` of.
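
Steps 3 and 4 can be sketched in Python as a conditional union default graph. This is only one possible reading of the outline: `rdfx:namedBy` is the proposed property (not a standard term), the sketch assumes the `rdfx:namedBy` arc may appear anywhere in the dataset with the graph name as its subject, and the dataset layout (a default graph plus a dict of named graphs) and function names are hypothetical.

```python
NAMED_BY = "rdfx:namedBy"  # proposed property from the outline, not standardized


def neutral_names(default, named):
    """Names the dataset does not own: those with an rdfx:namedBy arc."""
    all_triples = set(default).union(*named.values())
    return {s for s, p, o in all_triples if p == NAMED_BY and s in named}


def union_default_graph(default, named):
    """Default graph plus every *accepted* named graph; neutral ones are left out."""
    skip = neutral_names(default, named)
    out = set(default)
    for name, graph in named.items():
        if name not in skip:
            out |= graph
    return out


default = {
    ("<alice>", "<knows>", "<bob>"),
    ("<claim1>", NAMED_BY, "<doc>"),       # the dataset does not own <claim1>
    ("<claim1>", "<source>", "<paper>"),   # annotation on the neutral graph
}
named = {
    "<claim1>": {("<bob>", "<age>", '"42"')},
    "<g2>": {("<carol>", "<knows>", "<dave>")},
}

visible = union_default_graph(default, named)
print(("<carol>", "<knows>", "<dave>") in visible)  # accepted graph is visible
print(("<bob>", "<age>", '"42"') in visible)        # neutral graph is not
```

The neutral graph stays queryable by name (`named["<claim1>"]`); it is only kept out of the union.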

If this is too tall an order, we *could* leave that work to the next
working group (which I'd gladly join), and *just* define a form of
annotation syntax to Turtle-star as sugar for
assertion-plus-reification (equivalent to @rdf:ID on arcs in RDF/XML).
That would be a rather small first step though; the real power would
be bridging this and named graphs. But if step 5 is feasible, it's not
a step in another direction.

Now to my controversial proposed *removal* of triple terms (motivated
above, and before).

Either A) skip the quoted triple syntax altogether, and add the IMO
more convenient quotation dash form (which "comments out" objects,
making the arc unasserted). That gives full syntactic sugar for all
forms of old-school reification, all based on the annotation form
(which needs some revision to support repetition, but that's easy if
we can decide on syntax (which may be less easy)). That'd also leave
that syntax for specialized applications to continue to use (for
triple term types), instead of being co-opted for reification.

Or B), remake it as sugar for reification. I don't think it's worth
it, as it isn't lightyears ahead of explicit reification (e.g. using
some prefix tricks), and I see more unity in the quotation dash (as
use cases call mainly for asserted+annotated, and the dash turns off
the assertion for the odd ones out). Also, even if adding the syntax
is considered more important than its semantics (e.g. because some
implementations have already deployed it, *despite* it not being
standard), one annoyance would be that *if* graph terms were added
in a future RDF, these two forms might be confusing together. And in
terms of readability, the *form* of `<< s p o >>` *looks* kind of like
an IRI, which is a type term, and not like repeated `[]`, which
represent unique tokens per occurrence. That could be even more
confusing; but possibly an inconsistency that just takes getting used
to.

Of course, I think the numbered suggestions I made above are important
to consider before going the reification only route.

(As asked for by Olaf in last week's telecon, I'm working on a full
proposal. That outline is just a quick summary.)

Best regards,
Niklas

[1]: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3199260>
[2]: <https://gist.github.com/rat10/eaa109ab56b4d77d29e3a826291f8e72#graph-literals>
[3]: <https://gist.github.com/niklasl/c22994e664663b6730613ecc1321c418>
[4]: <https://www.w3.org/TR/sparql11-service-description/#sd-uniondefaultgraph>

>
> --AZ
>
> Le 26/10/2023 à 20:37, Niklas Lindström a écrit :
> > Dear all,
> >
> > Here are the slides I presented at today's teleconference.
> >
> > Best regards,
> > Niklas
> >
> > (PS. Escher's Dragon is pixelated to avoid copyright issues.)
>
> --
> Antoine Zimmermann
> École des Mines de Saint-Étienne
> 158 cours Fauriel
> CS 62362
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 49 97 02
> http://www.emse.fr/~zimmermann/
Received on Wednesday, 1 November 2023 00:25:15 UTC