- From: Niklas Lindström <lindstream@gmail.com>
- Date: Fri, 8 Dec 2023 15:47:29 +0100
- To: Adrian Gschwend <adrian.gschwend@zazuko.com>
- Cc: "public-rdf-star-wg@w3.org" <public-rdf-star-wg@w3.org>, Thomas Lörtsch <tl@rat.io>
Hi Adrian,

These are two great cases, and very important to keep just as viable and effective going forward. Thomas Tanon also mentioned RDF Sources in LDP [1] as another crucially important example of graph management in practice. And describing detailed triple provenance or background qualification in such scenarios should certainly be doable.

The latter case, using Named Graphs as "Documents", is basically also how our national library system [2] is implemented. It "cheats" in that regard, as it is not a quad store but a document store. We just store JSON-LD documents as is (in Postgres, but that is an under-the-hood implementation detail), normalized in such a way that each document describes two IRI-named things: itself, as a library Record, and its main entity, which it describes; along with, sometimes, plenty of bnodes, which are either "structured values", qualified relationships (such as contributions), or simply not-yet-disambiguated -- i.e. not yet linked -- related entities (agents, concepts, unknown works, etc.).

One might suspect that my perspective is skewed because of this, since I don't "care that much" about these as pure RDF triples, and work too much in terms of JSON-LD documents. Because as just documents, as Andy has pointed out, having small, "nested" named graphs (often but not always "blankly named") is easy; but that is not the same as storing triples under names in quads.

However, we certainly rely on this as proper RDF data in many ways. One is the linked data aspect, which we use to create an "embellished" view, following outgoing and some incoming links to generate a "not so concise and not so bounded" description ("providing useful data"). These denormalized views are JSON-LD-framed and indexed in Elasticsearch, and can also be accessed as TriG (as the default graph plus "snippets" of relevant embellished details; e.g. in [3], visualized in [4]). Another is indexing each record as a named graph in a graph store (we use Virtuoso for that, but that *should* be only an implementation detail) and exposing a SPARQL endpoint [5]. And because of the undefined relation between an RDF Source containing multiple graphs and the formal means for managing graphs in graph stores, I have steered clear of utilizing that other than in materialized views.

The concept and method I've been trying to convey for making these things work is to utilize the fact that an RDF Source can be one or more graphs, i.e. it can always be a dataset. And if we have means for not just reading the default graph from this source under this-or-that graph name, as part of the union default graph, but also saying "and place all other named graphs described in this source into 'appendices', owned by this named graph and not part of the asserted union graph", we can leverage quads for that under the hood.

Thus I think not all quads need to represent the same kind of resource (documents, or records). Some can be unasserted "citation graphs", "appendices", or any other of the unconstrained kinds of resources that the "graph name" can denote (including singleton sets, but also e.g. old versions, commit deltas, or opaque quotes isolated from entailment processing). And such resources can be linked to the "records" naming the graphs, as in "bound" by them, so that they fall under the same ACL rules, for instance. We "just" don't have any formal means of declaring that, yet.
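To make that a bit more concrete, here is a minimal TriG sketch of the idea. The :appendix property and :CitationGraph class, like all the IRIs, are made up purely for illustration; nothing here is a formalized vocabulary:

```trig
@prefix : <https://example.org/ns#> .
@prefix ex: <https://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .

# The record graph: asserted, and part of the union default graph.
ex:record1 {
    ex:record1 a :Record ;
        :mainEntity ex:entity1 ;
        # Hypothetical property binding an appendix graph to its record:
        :appendix ex:record1-cites .

    ex:entity1 a :Work ;
        :title "An Example Work" .

    # The appendix graph is described here, in its "binding" graph:
    ex:record1-cites a :CitationGraph ;
        dct:source <https://other.example/catalogue> .
}

# The appendix graph itself: owned by ex:record1 and *unasserted*, i.e.
# not part of the union default graph; here a singleton-set "citation"
# of a variant statement from another source.
ex:record1-cites {
    ex:entity1 :title "An Example Work (variant form)" .
}
```

The intent is that only ex:record1 contributes to the asserted union graph, while ex:record1-cites travels with the record for management purposes (ACLs, deletion, and so on) without being asserted.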
With formal means for those practices, we can describe these other kinds of "appendix graphs" in their binding graph, for detailed provenance and qualification, as in the sketch above. (And while I claim that this is not *adding* complexity -- since these practices, and others, already exist in the wild, due to the unconstrained ways we can use named graphs -- I readily admit that some of these practices are advanced. Just as detailed provenance and triple qualification are fairly advanced. (Especially since qualification should only be done once you've exhausted the option of using a more granular model and, e.g., deriving simple edges using owl:propertyChainAxiom entailments.) And I think that we may be able to formalize some of it and leverage that for the RDF-star use cases, in the process ideally paving the way for more approaches, such as annotating multiple triples from various contexts.)

But as I've said (e.g. in [6]), I'm not excluding other means, even reification, if that "order is too tall". I do think something quad-based can be made less obtrusive than adding to the core of RDF (the triples themselves), though, and a major part of why I attempt that is because triples, and Turtle, are *simple*. I don't think multi-edges belong in the core of RDF, at the simple triple level (that would be an even more radical change to the fundament). Nor meta-provenance, for that matter, since I think that is more on the level of how to "think in named graphs".

However, I think that multiple *contexts* related to simple, asserted graphs (as partial "overlays" if you will, or "circles with post-its on records", as I think of them in the library context) make for a workable, intuitive model of detailed fact provenance and ad-hoc/background qualification. (I've tinkered some with illustrating RDF-star data like that [7], [8].) And multi-edges then "emerge" from these added, small (often singleton-set) graphs derived from other contexts (sources, observations, underlying complex states of affairs). There is only one simple triple asserted; but these added, described, isolated "external" assertions ("citations" or "quotations") have the same effect, as they identify the same triple (its "type") -- without touching the simple RDF triple fundament.
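For instance, sticking with made-up IRIs and the same kind of hypothetical unasserted, bound graphs as in the sketch above:

```trig
@prefix : <https://example.org/ns#> .
@prefix ex: <https://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# One simple triple, asserted in the record graph, along with
# descriptions of two unasserted "overlay" graphs bound to it:
ex:record2 {
    ex:alice :knows ex:bob .

    ex:record2-src1 dct:source <https://one.example/dump> ;
        dct:date "2021-04-01"^^xsd:date .

    ex:record2-src2 dct:source <https://two.example/observations> ;
        dct:date "2023-11-30"^^xsd:date .
}

# Each overlay "cites" the same triple (the same triple "type"); a
# multi-edge emerges from these annotations, while the simple
# ex:alice :knows ex:bob edge itself is asserted only once, above.
ex:record2-src1 {
    ex:alice :knows ex:bob .
}

ex:record2-src2 {
    ex:alice :knows ex:bob .
}
```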
All the best,
Niklas

[1]: <https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source>
[2]: <https://github.com/libris/librisxl/>
[3]: <https://libris.kb.se/fxqnzf2r3t063cf/data.trig>
[4]: <https://niklasl.github.io/ldtr/demo/?url=https%3A//libris.kb.se/fxqnzf2r3t063cf/data.trig>
[5]: <https://libris.kb.se/sparql>
[6]: <https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0061.html>
[7]: <https://niklasl.github.io/ldtr/demo/?url=../test/data/prov-qualification-annotation.trig&edit=true>
[8]: <https://niklasl.github.io/ldtr/demo/?url=../test/data/lotr-annotated.trig&edit=true>

On Fri, Dec 8, 2023 at 11:57 AM Adrian Gschwend <adrian.gschwend@zazuko.com> wrote:

> Hi group,
>
> Thomas asked me to clarify how I use graphs. I employ them in various
> ways, but let me illustrate with two examples:
>
> ## Open Data Endpoint with Multiple Stakeholders & Named Graphs
>
> For the Swiss government, we operate a large Stardog instance. Multiple
> ETL pipelines from diverse stakeholders update graphs in this single
> endpoint, running at different frequencies. Access is managed via ACLs
> on the named graphs [1], utilizing Stardog's security layer based on
> Apache Shiro and role-based access control [2].
>
> Pipelines have the flexibility to choose their methods of writing into
> the triplestore, but most utilize the SPARQL Graph Store Protocol,
> typically writing to specific graphs through PUT operations, as the
> primary data source is often external.
>
> Each stakeholder is allocated one or more named graphs with write
> permissions. While most graphs are public and accessible via a generic
> read user, either as named graphs or through the union default graph,
> some are restricted for staging purposes and are not visible in the
> public default graph.
>
> Graph names follow a defined scheme, and user, role, and permission
> management is automated.
>
> I don't see how I could use a graph-based RDF-star model here. We never
> write quads; if I wanted to make an RDF-star statement, I would expect
> it to be part of a triple representation. If that were not the case, it
> would IMO break the design of this use case.
>
> In that regard, would a quad-based model not by definition break the
> SPARQL Graph Store Protocol?
>
> ## Named Graphs as "Documents"
>
> Another scenario, not directly mine but observed in two companies we
> collaborated with, involves treating RDF data as "documents". These
> companies do not use SPARQL directly. Instead, they load data into an
> Elastic/OpenSearch index for efficient reads, as the data is relatively
> static. Occasional writes are handled by middleware that updates the
> triplestore.
>
> In this model, the triplestore essentially functions as an RDF document
> store, with each document represented as a graph. These graphs group
> "key-values", which are then indexed as documents in Elastic,
> transforming each graph into an Elastic document.
>
> In both cases, I was restricted from creating additional graphs, as
> each graph was treated as a separate document.
>
> One could argue that using RDF-star in such a scenario might not make
> sense in the first place. But the same challenges as mentioned above
> would still apply, IMO.
>
> [1]: https://docs.stardog.com/operating-stardog/security/named-graph-security
> [2]: https://docs.stardog.com/operating-stardog/security/security-model
>
> regards
>
> Adrian
>
> --
> Adrian Gschwend
> CEO Zazuko GmbH, Biel, Switzerland
>
> Phone +41 32 510 60 31
> Email adrian.gschwend@zazuko.com
Received on Friday, 8 December 2023 14:48:02 UTC