Re: AW: Next weeks discussions and decision-making for RDF Star WG from Niklas Lindström on 2023-11-12 (public-rdf-star-wg@w3.org from November 2023)

From: Niklas Lindström <lindstream@gmail.com>
Date: Sun, 12 Nov 2023 16:00:57 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: public-rdf-star-wg@w3.org
Message-ID: <CADjV5jeLtj0WoWmxozD2GjmHq8ePw=os3bq6hZOBs=thZkkZhA@mail.gmail.com>
On Sun, Nov 12, 2023 at 1:27 PM Peter F. Patel-Schneider
<pfpschneider@gmail.com> wrote:
>
> As far as I can tell, although it is well disguised, option 2 is the way the
> working group was progressing.  This option would probably be very close or
> identical to existing RDF-star implementations.
>
> In my opinion for the working group to take up any other option requires
> either finishing option 2 or making a determination that quoted triples as in
> https://www.w3.org/TR/2023/WD-rdf12-concepts-20231013/#section-triples are
> fundamentally flawed.

It is certainly a fundamental change to the 24-year-old substrate and
abstract syntax of RDF.

It has been presented as "better reification", while at the same time,
by design, breaking from what reification is explicitly defined to
cater for [1]: The subject of a reification is intended to refer to a
concrete realization of an RDF triple, such as a document in a surface
syntax, rather than a triple considered as an abstract object. This
supports use cases where properties such as dates of composition or
provenance information are applied to the reified triple, which are
meaningful only when thought of as referring to a particular instance
or token of a triple.

Significantly, this break from that design has also made examples and
use cases suffer ("the seminal example" problem [2]), in that simple
usage works as is, but has to introduce some kind of relation to a
token occurrence (commonly using a custom property and a blank node).
Yet there is no explanation of why reification is a worse design, nor
of how the two are supposed to be used in conjunction.

The proposed design also uses opacity by default, but does not in any
way relate that to the various uses of named graphs, which can --
albeit not normatively so (since they have no defined semantics) -- be
used for this same purpose of opaque quotation. There are many forms
of quotation [3], which in being a paradigmatic opaque context is
reasonably more related to sets of triples, i.e. graphs, and combined
usage thereof. Unless graphs are now supposed to be explicitly
declared sets of triple terms? That is certainly not part of any
proposal (probably for a good reason: we already have graphs).

I would like a clarification of what "victory" means here. Victory for
the CG report? For the implementers of that? For RDF library and tool
maintainers in general? For the users of RDF past, present and future?

We have these two existing options, reification and named graphs,
which all use cases seem to be able to utilize, but which could do with
better syntax and more clarification in the specs. We have an
obligation (at least to the wider RDF community) to see if this is a
more reasonable path than adding something complex to the core of RDF.
And if it is not, we must clarify *why* not.

I've spoken to several people, both new and seasoned users of RDF, in
academia and in various production environments (many who also work
with training people in using RDF). I have only heard concerns that
adding a new triple term, being a recursively defined structure, makes
RDF more complex. Harder to understand, harder to develop best
practices for. Conversely, and crucially, I haven't heard anyone
say that triple terms make their use cases simpler. In certain cases (e.g.
those only supporting the "PG form") the *syntax* of RDF-star
annotations (as asserted plus quoted or reified) has looked like a
promising way to do granular provenance, striking a balance between
cumbersome reification (unless in the otherwise cumbersome RDF/XML
form) and coarse-grained named graphs. We already know that this
syntax too can easily be mapped to any (or both) of those options.
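
As a rough illustration of that last point, here is a minimal sketch in
Python (not taken from any spec or implementation) that expands one
annotated triple either into the RDF 1.1 reification vocabulary or into
a graph named by a blank node. The function names, the plain-string
encoding of terms and the ex: terms are all hypothetical, and the named
graph reading is only one possible convention, since named graphs have
no defined semantics.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_fresh = itertools.count(1)  # source of fresh blank node labels (assumption)


def as_reification(triple, annotations):
    """Assert the triple and describe it via an rdf:Statement blank node."""
    s, p, o = triple
    stmt = f"_:stmt{next(_fresh)}"
    out = [
        (s, p, o),
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]
    # Annotations hang off the statement node, not off the triple itself.
    out.extend((stmt, ap, ao) for ap, ao in annotations)
    return out


def as_named_graph(triple, annotations):
    """Put the triple in a blank-node-named graph and annotate the graph name.

    Returns quads; None in the fourth position stands for the default graph.
    """
    s, p, o = triple
    graph = f"_:g{next(_fresh)}"
    out = [(s, p, o, graph)]
    out.extend((graph, ap, ao, None) for ap, ao in annotations)
    return out


if __name__ == "__main__":
    triple = ("ex:bob", "ex:knows", "ex:jane")
    notes = [("ex:statedIn", "ex:doc1")]
    for row in as_reification(triple, notes) + as_named_graph(triple, notes):
        print(row)
```

Both functions take the same input, which is the point: the annotation
syntax does not by itself commit us to a new kind of term.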

That something has been implemented is not a strong argument (and if
it was hard to implement, that is a potential case against it). Of the
implementations of RDF-star some appear partial, e.g. only for the PG
form (which at least in the AllegroGraph case also implied using quad
multisets [4]). And there are many more cases where it has not been
implemented at all (the complete set of libraries, tools and
installations of the past 20+ years). It is a lot easier to add new
things than to build and improve upon what is there. It is an entirely
different thing to work with that for decades.

Of course I am aware that my own ingrained habits, expectations and
assumptions fundamentally affect my beliefs, and thus my
comprehension. I am open to the epiphany that "triples all the way
down" is the mathematically most pure, simple and effective design for
all known use cases, and that it somehow actually makes it simpler to
understand RDF than if we e.g. keep clinging to named graphs, or draw
circles around "the triple itself" with reification, if it is the
triple itself that is needed (and not, I must stress, the triple in
*some named graph*, for that quality it does not have without dragging
it down into token space). I have not yet had that epiphany, and I'm
still looking for guidance towards that. For instance, are there any
use cases that triple terms enable that are entirely novel? Please
clarify which ones, and submit them to the collected use cases of the
working group [5]. It is the collected cases that we must measure
against. (I haven't seen any belief system cases such as "<Mary>
:believes << <Jane> :said << <Bob> :knows <Jane> >> >>" there, so if
anyone is doing substantive work with such data in RDF, please add
that there!)

Regards,
Niklas

[1]: https://www.w3.org/TR/rdf11-mt/#reification
[2]: https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#the-seminal-example
[3]: https://plato.stanford.edu/entries/quotation/
[4]: https://lists.w3.org/Archives/Public/public-rdf-star/2020Aug/0021.html
[5]: https://github.com/w3c/rdf-ucr/

>
> peter
>
>
> On 11/12/23 03:41, Sasaki, Felix wrote:
> > Hi Adrian, Gregg and all,
> >
> > Adrian, thanks a lot for the summary. As somebody relatively new to the
> > working group and not attending the last meeting, I am struggling to
> > understand the impact of the options.
> >
> > How is this topic related to RDF star? How would it influence the role of
> > existing RDF star implementations?
> >
> > https://w3c.github.io/rdf-star/implementations.html
> >
> > Best,
> >
> >
> > Felix
> >
> > *From: *Gregg Kellogg <gregg@greggkellogg.net>
> > *Date: *Saturday, 11 November 2023 at 21:23
> > *To: *Adrian Gschwend <adrian.gschwend@zazuko.com>
> > *Cc: *public-rdf-star-wg@w3.org <public-rdf-star-wg@w3.org>
> > *Subject: *Re: Next weeks discussions and decision-making for RDF Star WG
> >
> >
> > I think it’s great to focus on resolving this fundamental issue. Below are the
> > outlined suggestions with some additional thoughts:
> >
> > 1) Do nothing beyond RDF 1.1, there’s already a reification vocabulary with
> > native support in RDF/XML
> >
> > 1.1) Same as above, but add syntactic sugar to Turtle/TriG/SPARQL for
> > expressing reified statements. This would most naturally involve using a blank
> > node subject, rather than a fragment identifier. Something based on the
> > current quotedTriple syntax ‘<<’ qtSubject predicate qtObject ‘>>’ could be
> > syntactic sugar for [ a rdf:Statement; rdf:subject qtSubject; rdf:predicate
> > predicate; rdf:object qtObject ]. (Stable identification can be addressed via
> > indirection such as <#frag> rdfx:instanceOf << :a :b :c >>).
> >
> > 2) Declare victory using the current tripleTerm resource. Triples are types,
> > and something like rdfx:instanceOf can be used to derive tokens.
> >
> > 3) Leverage RDF 1.1 named graphs with the provision that a blank node graph
> > name used elsewhere as a subject or object “identifies” that graph so named
> > (with some work on what “identifies” means in this context). This is
> > effectively how JSON-LD is used in Verifiable Credentials and elsewhere. A
> > graph inclusion hierarchy, if required, can be derived by following the path
> > from subject/object to graph name/graph.
> >
> > 4) Create a graphTerm resource where graphTerms are first-class terms and are
> > distinct from named graphs. This is probably closest to how Notation3 uses
> > graphs. This is arguably the purest from a logic point of view, but may be
> > more difficult to express in abstract and concrete syntaxes.
> >
> > Frankly, I think any of the single triple use cases can be expressed using
> > any of these paradigms; collections of triples require one of the
> > graph-based solutions. The main issue that gets in the way of settling on this
> > is the Type/Token debate. I think this can be resolved in other ways. This
> > doesn’t attempt to consider transparency/opacity; for blank nodes, I think
> > opacity can be solved at the syntax level, by using identifiers that don’t
> > overlap.
> >
> > Consider the URL http://xmlns.com/foaf/0.1/Person. It could be considered to
> > denote an RDF document containing a vocabulary definition for foaf:Person. It
> > can be considered to be both a type and a token, depending on how it is used.
> > In the context of the vocabulary definition, it is a token against which
> > other properties can be defined:
> >
> >     foaf:Person a rdfs:Class; rdfs:label "Person"; …
> >
> > In another context, it is a type:
> >
> >     <http://rdfweb.org/people/danbri> a foaf:Person;
> >     foaf:name "Dan Brickley" ...
> >
> > Similarly, a graphTerm could be considered to be a type or a token, depending
> > on the context in which it is used. Something like {:a :b :c} a rdf:Graph has
> > the characteristic of a type, while {:a :b :c} ex:containedIn
> > <http://example.com/foo> has the characteristic of a
> > token. We can leverage rdfs:range/domain and explicit type declarations to
> > clarify the intended meaning.
> >
> > Perhaps we can use other explicit or implicit typing to clarify the use cases
> > about when we are identifying a specific statement within a graphTerm or the
> > graph itself as a collection of statements.
> >
> > Regarding the different possibilities outlined above: RDF is a system for
> > describing graphs/datasets composed of triples/statements. IMHO, the
> > fundamental building block should be a graph, so I favor either leveraging
> > named graphs or adding a top-level graphTerm (options 3 and 4 above). I think
> > the impact on implementations, such as quad stores, favors reusing and
> > refining the RDF 1.1 concept of named graphs, but with nuance given to graphs
> > named by blank nodes. This also works as is with N-Quads. Representing
> > graphTerms natively requires some form of syntactic extension (either embedded
> > graphs, or a new space for graph identifiers) as well as defining a graphTerm
> > similar to how we’ve already defined a tripleTerm in the abstract algebra.
> >
> > If the WG is not able to take on the work for describing such use of named
> > graphs, then I would favor doing something more like 1.1: reuse the existing
> > reification vocabulary with syntactic sugar from the quotedTriple production
> > of Turtle/TriG and SPARQL rather than adding a new tripleTerm which could
> > interfere with future groups taking on the work of better describing the use
> > of graphs as resources. But, I’m happy to go along with the consensus of the
> > group whatever we decide.
> >
> > Gregg Kellogg
> > gregg@greggkellogg.net
> >
> >
> >
> >     On Nov 10, 2023, at 8:42 PM, Adrian Gschwend <adrian.gschwend@zazuko.com>
> >     wrote:
> >
> >     Dear all,
> >
> >     Following our last meeting and discussions on the various proposals, it
> >     has become clear that we need to focus our efforts on choosing a specific
> >     direction for our next steps. To facilitate our decision-making process,
> >     we are asking all members to review the proposals in detail and consider
> >     which one they currently favor. This preliminary decision will help to
> >     make the discussion at our next meetings more structured and productive.
> >
> >     The next meeting is on November 16. As discussed, this once we will
> >     start at the normal time but stay one hour longer. Please see the
> >     calendar for details; the event has been updated.
> >
> >     regards
> >
> >     Ora & Adrian
> >     --
> >     Adrian Gschwend
> >     CEO Zazuko GmbH, Biel, Switzerland
> >
> >     Phone +41 32 510 60 31
> >     Email adrian.gschwend@zazuko.com
> >
>
Received on Sunday, 12 November 2023 15:01:58 UTC