Graph Naming Semantics

Graph Naming Semantics — Formal Objection concerning Issue-142

I would like to use RDF to describe graphs in a dataset, e.g. to say who was the author of a particular graph.

This formal objection is made in a personal capacity, although I am addressing one of the presenting use cases as part of my job responsibilities at Syapse, Inc., which is not (currently) a W3C member.

The RDF WG has not met one of its charter objectives: to provide a semantics for named graphs: while this does not detract from the value of the work they have achieved, it is disappointing. I request that either they are asked, in a different way, to have another go; or that in the next round of RDF standardization the same objective should be given, but in a structure more likely to result in success than a repeat of the current lack of consensus.

Goals
Semantics of Named Graphs
Use Cases for Metadata About Graphs
Work done by the working group in this area
Proposed Technical Changes
Possible perceived tension between SPARQL datasets and this proposal
Why is this important?
Analysis as to some of the Causes of this Failure
Ways Forward

Goals

As per W3C process, a formal objection is a

request that the Director consider[s this] as part of evaluating the [...] request to advance a technical report

In asking for this consideration, I am hoping that such an evaluation may result in:

either: returning the work to the working group with adequate time, and direction to address an issue that they have struggled with
or: ensuring that the next RDF WG is chartered to address this issue in a way that will prove more effective than the current WG charter

I wish to explicitly call out that I am rejecting the advice of the team contact, who warns:

Hopefully at this point the commenter realizes they're in a Solomon and the Baby situation and will only formally object if they think the world would be better if the spec died.

I believe the current specification is good and can go forward unchanged, but I am disappointed that it is not better, and in particular that it fails to address a basic and important use case: using RDF to provide metadata for graphs in a dataset. If it is not possible for the consortium to find the resources to address this use case in this round of the standardization process, then that is a decision for the consortium's management, and I can live with the result: an RDF 1.1 without this feature is better than an RDF 1.0 without any of the additional features in RDF 1.1.

Semantics of Named Graphs

A charter requirement is to Standardize a model and semantics for multiple graphs and graphs stores. (My emphasis). Given that even very basic use cases to talk about graphs within collections of multiple graphs are not addressed, it is hard to agree that this requirement has been met.

Use Cases for Metadata About Graphs

The Syapse Use Case: Multi-tenant SPARQL Endpoint

I described my actual presenting business use case in my original comment towards the end of the message.

To recap, avoiding the RDF details, we have an N-tier architecture in the cloud, with the bottom layer exposing a SPARQL endpoint. Different end customers provide data to our application, and this data is stored in the RDF data store. A key requirement is to keep data from one customer separate from data of another customer. For our higher end customers we achieve this by using distinct physical boxes. For other customers we use named graphs: we put data from a particular customer in two or three named graphs dedicated to that customer. All queries then use the FROM and FROM NAMED constructions in SPARQL to restrict a customer to reading their own data, and not other customers' data.
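A minimal sketch of the kind of restriction described above, not the actual Syapse implementation: the helper name `tenant_query` and the example.org graph IRIs are hypothetical, and the only thing taken from the text is the use of FROM and FROM NAMED clauses to limit the dataset of a query to one customer's graphs.

```python
def tenant_query(projection, where_pattern, default_graphs, named_graphs):
    """Build a SPARQL SELECT query whose dataset is limited to the given graphs.

    default_graphs and named_graphs are assumed to be the graph IRIs
    dedicated to a single customer (hypothetical layout).
    """
    for iri in list(default_graphs) + list(named_graphs):
        # Reject characters that could break out of the <...> IRI delimiters.
        if any(ch in iri for ch in '<>" \n'):
            raise ValueError(f"unsafe graph IRI: {iri!r}")
    lines = [f"SELECT {projection}"]
    lines += [f"FROM <{iri}>" for iri in default_graphs]
    lines += [f"FROM NAMED <{iri}>" for iri in named_graphs]
    lines.append(f"WHERE {{ {where_pattern} }}")
    return "\n".join(lines)


# Hypothetical graphs belonging to one customer.
query = tenant_query(
    "?s ?p ?o",
    "?s ?p ?o",
    default_graphs=["http://example.org/graphs/customer-a/data"],
    named_graphs=["http://example.org/graphs/customer-a/audit"],
)
print(query)
```

Because the dataset clauses are generated rather than written by hand, the isolation between customers depends entirely on passing in the right graph IRIs for each customer.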

Obviously, to make this work, we need to remember the mapping between named graphs in the RDF data store and the customers. This mapping is metadata about the named graphs, and since RDF is good at metadata, storing that metadata within the RDF data store as RDF seems, on the surface, a good way to go: and had the RDF Working Group addressed the charter objectives to provide a semantics for named graphs, then this simple case would have been a good verification that their solution worked, prior to advancing to Recommendation.
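The mapping described above can be illustrated with a toy model, under stated assumptions: the dataset is modelled as a plain Python dictionary rather than a real RDF store, the property IRI `OWNER` and all example.org names are hypothetical, and the lookup assumes that a graph name used as the subject of a triple refers to the graph of that name in the dataset. That last assumption is the reading that RDF 1.1 leaves unstandardized.

```python
OWNER = "http://example.org/vocab#owner"  # hypothetical property IRI

# Toy dataset: graph name -> set of (subject, predicate, object) triples.
# The key None stands for the default graph, which here holds the metadata.
dataset = {
    None: {
        ("http://example.org/graphs/a-data", OWNER, "http://example.org/customers/a"),
        ("http://example.org/graphs/a-audit", OWNER, "http://example.org/customers/a"),
        ("http://example.org/graphs/b-data", OWNER, "http://example.org/customers/b"),
    },
    "http://example.org/graphs/a-data": set(),
    "http://example.org/graphs/a-audit": set(),
    "http://example.org/graphs/b-data": set(),
}


def graphs_for_customer(dataset, customer):
    """Return the names of graphs whose metadata records `customer` as owner.

    Assumes the subject of an OWNER triple denotes the named graph with
    that name in the same dataset.
    """
    return sorted(
        s
        for (s, p, o) in dataset[None]
        if p == OWNER and o == customer and s in dataset
    )


print(graphs_for_customer(dataset, "http://example.org/customers/a"))
```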

Business Impact of RDF 1.1 not Including a Semantics for this Feature

Frankly, the business impact of the lack of standardization in this area is not high. In practice, we use the named graph semantics from my 2004 paper (cited below), and life goes on: each customer gets to see their data and not other customers' data.

However, should our customers, or their customers, ask to audit our approach to preserving their data privacy, we have to explain a proprietary approach, rather than being able to point to a W3C Recommendation that we follow when interpreting the metadata on which our approach hinges.

Further, we have made the business decision to migrate away from a proprietary knowledge store solution to one based on W3C Semantic Web technologies in order to reduce costs. An example cost is that of on-boarding new engineers into our back-end team. The lack of standardization in this area means that we will need to explain aspects of our approach to new engineers who otherwise could simply refer to standards, and that we are unlikely to find engineers already versed in our approach. In addition, every missing standard feature that is required for serious semantic web deployment means that we have to add in a proprietary dependency, which increases the extent of our lock-in with any particular supplier (including our own team).

In summary, we envisage a small impact on our future costs arising from the lack of standardization of this feature.

Use Cases from Named Graphs, Provenance and Trust 2004

This influential paper by Carroll, Bizer, Hayes and Stickler lists many use cases including:

Data syndication
Restricting information usage
Access control
Signing RDF graphs
Semantic Web publishing

Use Cases from the June 2010 RDF/NextStepWorkshop workshop

The minutes cite "numerous use cases" without making any explicit, but do give a sizable list of references.

Use Cases from Other Last Call Comments

owl:imports when used within an RDF dataset refers to graphs by name, potentially within the dataset. The lack of resolution concerning the semantics of graph naming from the RDF WG leaves a significant disconnect here.
Nanopublications require some semantics for graph naming: the WG response, that "applications dealing with nanopublications would simply need to be written with the understanding that graph names (IRIs, BNodes) used in graph in statements should refer to that graph in the dataset", seems to beg the question as to why this is not recommended by the WG, and is instead suggested as an out-of-band, application-specific understanding.

Work done by the working group in this area

As dissatisfied commentators go, one complaint I definitely cannot make is that the working group has not taken my issue sufficiently seriously. They have tried hard to address this issue over the lifetime of the group; for instance, datasets (and to some extent their semantics) have been considered at:

F2F1, particularly: this report, and Issue-15
F2F2
F2F3

Proposed Technical Changes

I have indicated various proposed solutions. My preferred solution was one I developed with other members of the Semantic Web Interest Group in 2004, and was submitted by reference to the workshop that initiated this working group, and is summarized in this editor's draft as: The notion of RDF interpretation is extended to named graphs by saying that the graph IRI in the pair must denote the pair itself. I have reiterated this proposal recently on the comments list.
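The condition quoted above can be made concrete with a deliberately small sketch. It models only the naming condition, not the rest of RDF model theory, and every name in it (`satisfies_naming_condition`, the dictionary standing in for an interpretation, the example.org IRIs) is an assumption made for illustration.

```python
def satisfies_naming_condition(interpretation, named_graphs):
    """Check that every graph IRI denotes its own (name, graph) pair.

    interpretation: dict mapping an IRI to the thing it denotes (toy model).
    named_graphs: iterable of (name, graph) pairs, with graph a frozenset
    of (subject, predicate, object) triples.
    """
    return all(
        interpretation.get(name) == (name, graph)
        for name, graph in named_graphs
    )


graph = frozenset({
    ("http://example.org/s", "http://example.org/p", "http://example.org/o"),
})
named_graph = ("http://example.org/g1", graph)

# An interpretation in which the graph IRI denotes the pair itself.
conforming = {"http://example.org/g1": named_graph}
# An interpretation in which the same IRI denotes something else.
non_conforming = {"http://example.org/g1": "some unrelated resource"}

print(satisfies_naming_condition(conforming, [named_graph]))
print(satisfies_naming_condition(non_conforming, [named_graph]))
```

Under this reading, a triple whose subject is the graph IRI, such as an authorship statement, is a statement about the named graph itself.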

I have also expressed myself as open to informal text in RDF Concepts that, in practice, would amount to the same thing.

I have also proposed adding at least one new term to the RDF Vocabulary (rdfs:Graph or rdf:Graph) as an alternative approach to normatively defining such a semantics in a way that does not impose it on everyone (although personally I think it is a better design to make such an imposition).

Possible perceived tension between SPARQL datasets and this proposal

An aspect of my own failure is that I have never fully understood why there is any opposition whatsoever to what I, in my blinkered way, see as the obviously correct semantics! (As suggested above, and repeatedly in the past, starting in 2004.)

One possible explanation is as follows:

RDF 1.0 and 1.1 Semantics takes a model theoretic view of semantics
SPARQL (1.0 and 1.1) did not take a model theoretic view but provides an operational semantics for query over datasets including named graphs.
SPARQL implementors may have had concern that a new model theoretic semantics for datasets and named graphs may invalidate implementations based on the operational semantics already recommended.

Therefore SPARQL implementors may have seen it both as in their business interest to oppose any model theoretic semantics that addresses my simple use cases, and as a reasonable expectation that a new W3C Recommendation should not invalidate an older one.

If this is one of the causes, it never became explicit, which prevented the concern being addressed.

Relatively briefly: operational semantics and model theoretic semantics are different beasts. One may critique the other, but an out-and-out conflict is generally avoidable. More pertinently, the semantic web succeeds inasmuch as we manage to have semantic interoperability, not simply a shared operational understanding (which many alternative and better established technologies also have). An appropriate level of shared model theoretic semantics across the web is hence crucial to our shared success. At this level everyone's interests align.

Why is this important?

That the actual business impact, in a specific use case, is not high raises the question as to why this is an important issue.

A key value of the semantic web lies not in having standardized on syntactic structures (in general XML is easier to deal with than RDF graphs and datasets), but in having standardized enough of a minimal semantics and enough vocabulary to be able to have interoperability at more than a syntactic level. To the extent that the WG proposes private channels for applications to share their understanding [... of] graph names, this undermines the core mission of the Semantic Web.

Analysis as to some of the Causes of this Failure

As with many Working Groups, the RDF WG has met charter objectives that were about evolving existing pieces of work at various stages of maturity. The charter did not take such an approach with the semantics of named graphs, but used a neutral, i.e. new, term to articulate the work item: "semantics for multiple graphs and graph stores". This, in effect, directed the working group to start from a fairly blank piece of paper, and they have essentially failed to get consensus on even beginning to fill it in.

A more effective way for the working group to achieve this goal is a fairly well-trodden path: start from a respected piece of work that has already been developed to a lower level of maturity, and use the method of writing a use cases and requirements document as a way of driving critique and review of some pre-existing technology that is being refined by the W3C process.

The named graphs semantics developed by myself, Pat Hayes, Chris Bizer and Patrick Stickler in 2004 as part of our named graphs work, still remains an obvious starting point. This work was done at W3C, as an informal task force within the Semantic Web Interest Group, and has been widely published and cited.

Specific things that went wrong with the RDF Working Group's work in this area are:

Research

The working group spent significant effort in developing their own terminology and perspective on named graphs and semantics. See graph terminology: g-boxes and g-snaps etc.

Not excluding known, very difficult problems

In 2002, the RDF Core WG made significant advances in the area of RDF Semantics, despite many similar obstacles (lack of clear starting point; lack of consensus about use cases and requirements; semantics being a somewhat esoteric topic). One way this was achieved was by being very clear that some aspects are just too hard. Temporal semantics and modalities are two key such areas. The current WG did not exclude them: for example, the WG's term g-snap makes reference to time even in its definition.

Not publishing early and often

Text addressing the charter requirement to provide a semantics for named graphs has only been published in working drafts in 2013, and not before. The text itself is very scant, and all about the lack of semantics rather than defining a semantics:

a brief note in RDF Concepts, saying the graph name does not formally denote the graph.
the same note repeated with editorial changes
new text in the LC semantics document that fails to specify a semantics for datasets

Delaying explicitly addressing semantics

There were some relevant early decisions, such as closing Issue-15, but these were negative (this semantics is rejected) rather than positive (here is the proposed semantics). Since then, work on the semantics of datasets appears to have floundered. In fact, at no point has the WG arrived at even a provisional semantics for datasets with sufficient consensus to publish, even simply as a request for feedback from the community.

My own failures and inattentiveness

Key poor decisions made by the working group seem to be the resolutions concerning issue 15 and issue 14. Both were made while I was a participant of the WG, even if largely inactive, and unable to attend the key meetings. With hindsight, I should have objected to these decisions at the time.

Ways Forward

I feel that the heart of my formal objection is not technical but to do with direction. The charter inadvertently set the WG up for a failure in this area, and repeated attempts within the current framework to address even the simplest use cases for the semantics of graph naming have failed, including use cases that have been solved for almost a decade. The underlying cause of that failure may have been leaving the WG too much room to maneuver and not giving a clear enough direction. While my comment could be met fairly straightforwardly by adopting the technical proposals above, achieving consensus is harder.

Extending Current WG's Charter

It is unlikely that simply returning the documents to the WG for further work on this matter will be effective.

My suggestion is:

To extend the charter to give the WG more time, e.g. six months
To suggest that the WG publish ASAP:
- a working draft that actually articulates a (non-consensus) semantics for graph names, ideally based on Carroll et al. 2004
- a use cases and requirements for named graphs WD that articulates some of the use cases above
To explicitly rule out temporal and modal considerations from the WG's deliberations on this issue
To encourage the WG to work with community feedback over the next few months to take the work forward

Or A Postponed Issue for Next WG

It would be very understandable if there was no desire to extend the current WG's charter: whether through having exhausted the commitment of either the consortium or the participants.

In such a case, this does leave unfinished work that should be picked up again by the next RDF WG at some point in the future.

When writing the next charter, I hope that the advice immediately above will be heeded, and in addition the charter should include a requirement to extend the RDF Vocabulary to include concepts from RDF datasets and named graphs.