Re: My issues with graph-based RDF Star from Thomas Lörtsch on 2023-12-14 (public-rdf-star-wg@w3.org from December 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Thu, 14 Dec 2023 17:39:54 +0100
To: Adrian Gschwend <adrian.gschwend@zazuko.com>
Cc: "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>
Message-Id: <B22C81F0-6146-4773-BA1D-EEA0061E01D1@rat.io>

Hi Adrian,

thank you for the detailed description!

> On 8. Dec 2023, at 11:57, Adrian Gschwend <adrian.gschwend@zazuko.com> wrote:
> 
> Hi group,
> 
> Thomas asked me to clarify how I use graphs. I employ them in various ways, but let me illustrate with two examples:
> 
> ## Open Data Endpoint with Multiple Stakeholders & Named Graphs
> 
> For the Swiss government, we operate a large Stardog instance. Multiple ETL pipelines from diverse stakeholders update graphs in this single endpoint, running at different frequencies. Access is managed via ACLs on the named graphs [1], utilizing Stardog's security layer based on Apache Shiro and role-based access control [2]. Pipelines have the flexibility to choose their methods of writing into the triplestore, but most utilize the SPARQL Graph Store Protocol, typically writing to specific graphs through PUT operations, as the primary data source is often external.
> 
> Each stakeholder is allocated one or more named graphs with write permissions. While most graphs are public and accessible via a generic read-user, either as named graphs or through the union default graph, some are restricted for staging purposes and are not visible in the public default graph.
> 
> Graph names follow a defined scheme, and user, role, and permission management is automated.

Without further detail I would assume that the graph naming scheme provides the hook to bind nested graphs to the named graphs they are nested in.
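
To make that assumption concrete, here is a minimal sketch of how a naming scheme could act as the hook. Everything in it is hypothetical: the example.org graph IRIs, the `WRITE_ACL` table, and the convention that a nested graph's IRI extends the IRI of the named graph it is nested in. It does not describe Stardog's actual security layer.

```python
from typing import Optional

# Hypothetical ACL table: named graph IRI -> users allowed to write to it.
WRITE_ACL = {
    "https://example.org/graph/agency-a": {"pipeline-a"},
    "https://example.org/graph/agency-b": {"pipeline-b"},
}


def parent_graph(graph_iri: str) -> Optional[str]:
    """Return the ACL-managed graph whose IRI is the longest prefix of graph_iri."""
    candidates = [
        g for g in WRITE_ACL
        if graph_iri == g or graph_iri.startswith(g + "/")
    ]
    return max(candidates, key=len) if candidates else None


def may_write(user: str, graph_iri: str) -> bool:
    """A nested graph inherits the write permissions of its enclosing graph."""
    parent = parent_graph(graph_iri)
    return parent is not None and user in WRITE_ACL[parent]
```

With such a convention, a nested graph like `https://example.org/graph/agency-a/nested/1` would be writable by `pipeline-a` only, without any extra ACL entry.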

> I don't see how I could use a graph based RDF-Star model here. We never write quads, 

I never write quads either, I write triples and group them in graphs, and then make assertions about those graphs via further triples.

> if I would want to do an RDF Star statement, I would expect it to be part of a triple-representation. If this would not be the case, it would IMO break the design of this use-case.

It’s hard to distinguish fundamental problems from implementation details here. On a very abstract level, triple/graph terms are just terms, so they integrate well with RDF’s triple model. But as soon as you look a little closer, it becomes evident that those terms are stored as regular triples. They have to be; otherwise the implementation would not be performant enough. And in a quad store they will most probably be stored as quads, with the fourth element being a reference from the term to its (unasserted, but indexed) triple representation. So there again is the problem of how to connect those system-internal term graphs to the named graphs in which those terms are used, without breaking the ACLs and production pipelines. I’m not saying that it is easy; I’m just saying that I don’t see how RDF-star triple terms avoid that problem under the hood. And syntactically, the nested graph serialization is IMHO at least on par with <<…>> triple terms.
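
As a rough illustration of the storage pattern I have in mind, here is a toy in-memory quad store. It is a sketch under stated assumptions, not a description of any real implementation: the `urn:internal:term:` namespace, the `term_owner` map, and the rule that a term belongs to exactly one named graph are all made up for the example.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str


# Hypothetical namespace for system-internal term graphs.
TERM_GRAPH_PREFIX = "urn:internal:term:"


class QuadStore:
    def __init__(self):
        self.quads = set()
        # Term graph id -> named graph in which the term is used.
        self.term_owner = {}
        self._next_id = 0

    def add(self, s, p, o, g):
        """Assert a triple in named graph g."""
        self.quads.add(Quad(s, p, o, g))

    def add_triple_term(self, s, p, o, used_in):
        """Index an unasserted triple and return an id usable as a term."""
        term_id = f"{TERM_GRAPH_PREFIX}{self._next_id}"
        self._next_id += 1
        # The fourth element points from the term to its triple representation.
        self.quads.add(Quad(s, p, o, term_id))
        self.term_owner[term_id] = used_in
        return term_id

    def visible(self, readable_graphs):
        """Quads a reader may see: readable graphs plus the term graphs they own."""
        return {
            q for q in self.quads
            if q.g in readable_graphs
            or self.term_owner.get(q.g) in readable_graphs
        }
```

The `term_owner` map is exactly the connection that has to be made somewhere: without it, the ACL on the named graph says nothing about the internal graph that holds the term's triple.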

> In that regard, would a quad based model not by definition break SPARQL Graph Store protocol?

I was told that the SPARQL Graph Store protocol wouldn’t break, but that’s all I can say about that.
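
For reference, the kind of request at stake looks roughly like the sketch below: a Graph Store Protocol PUT that replaces one named graph, identified through the `graph` query parameter, with a Turtle payload. The endpoint URL and graph IRI in the usage note are placeholders, and the function only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_graph_put(store_url: str, graph_iri: str, turtle: str) -> Request:
    """Build (but do not send) a PUT that replaces the named graph graph_iri."""
    # Indirect graph identification: the target graph goes into ?graph=<iri>.
    query = urlencode({"graph": graph_iri})
    return Request(
        f"{store_url}?{query}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )
```

A pipeline would call something like `build_graph_put("https://example.org/rdf-graph-store", "https://example.org/graph/agency-a", payload)` and pass the result to `urllib.request.urlopen`. The payload is a set of triples, and the graph is named only in the URL, which is why the question of where nested graphs or term representations end up matters.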

> 
> ## Named Graphs as "Documents"
> 
> Another scenario, not directly mine but observed in two companies we collaborated with, involves treating RDF data as "documents". These companies do not use SPARQL directly. Instead, they load data into an Elastic/Opensearch index for efficient reads, as the data is relatively static. Occasional writes are handled by middleware that updates the triplestore.
> 
> In this model, the triplestore essentially functions as an RDF document store, with each document represented as a graph. These graphs group "key-values," which are then indexed as documents in Elastic, transforming each graph into an elastic document.
> 
> In both cases, I was restricted from creating additional graphs, as each graph was treated as a separate document.
> 
> One could argue that using RDF Star in such a scenario might not make sense in the first place. But the same challenges as mentioned above would still apply IMO.

Why not, but in such a document-focused scenario I see no problem for nested graphs, nor for triple terms. They just have to use the right serialization.

We provide a mapping from nested graphs to a strictly named-graph-based approach in which each nesting is expressed in triples, and the nested graph is encoded as a string with the new datatype rdf:ttl. That is perfectly triple-based. We could even replace the datatyped literal with an RDF-star-ish graph term, if we absolutely insisted on introducing a new term type in RDF. And we’d be done.
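
To give an idea of the shape of such a mapping, here is a small sketch that emits one N-Quads line per nesting. Only the rdf:ttl datatype comes from the proposal as described above; the `http://example.org/nests` predicate, the choice of the nested graph's name as subject, and the helper names are assumptions made for illustration and not the actual vocabulary.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Quads string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def nest_as_literal(outer_graph: str, nested_name: str, nested_turtle: str,
                    predicate: str = "http://example.org/nests") -> str:
    """Express one nesting as a quad whose object is an rdf:ttl-typed literal."""
    # The predicate IRI is a placeholder, not part of any specification.
    literal = f'"{escape_literal(nested_turtle)}"^^<{RDF_NS}ttl>'
    return f"<{nested_name}> <{predicate}> {literal} <{outer_graph}> ."
```

The result is an ordinary statement inside the outer named graph, so a pipeline that only ever writes triples into "its" graph never has to create an additional graph for the nested content.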

Again, under the hood it’s all named graphs (in our implementation, and I assume also in any halfway performant RDF-star implementation), so there will be questions about how to implement this. As I said, in your first example the graph names might be the key, but Stardog may provide other mechanisms as well.


Best,
Thomas


> 
> [1]: https://docs.stardog.com/operating-stardog/security/named-graph-security
> [2]: https://docs.stardog.com/operating-stardog/security/security-model
> 
> regards
> 
> Adrian
> 
> -- 
> Adrian Gschwend
> CEO Zazuko GmbH, Biel, Switzerland
> 
> Phone +41 32 510 60 31
> Email adrian.gschwend@zazuko.com
>
Received on Thursday, 14 December 2023 16:40:08 UTC