I would like to use RDF to describe graphs in a dataset, e.g. to say who was the author of a particular graph.
This formal objection is made in a personal capacity, although I am addressing one of the presenting use cases as part of my job responsibilities at Syapse, Inc., which is not (currently) a W3C member.
The RDF WG has not met one of its charter objectives: to provide a semantics for named graphs. While this does not detract from the value of the work they have achieved, it is disappointing. I request that either they are asked, in a different way, to have another go, or that in the next round of RDF standardization the same objective should be given, but in a structure more likely to result in success than a repeat of the current lack of consensus.
As per W3C process, a formal objection is a "request that the Director consider[s this] as part of evaluating the [...] request to advance a technical report".
In asking for this consideration, I am hoping that such an evaluation may result in:
I wish to explicitly call out that I am rejecting the advice of the team contact, who warns:
Hopefully at this point the commenter realizes they're in a Solomon and the Baby situation and will only formally object if they think the world would be better if the spec died.
I believe the current specification is good and can go forward unchanged, but I am disappointed that it is not better, and in particular that it fails to address a basic and important use case: using RDF to provide metadata for graphs in a dataset. If it is not possible for the consortium to find the resources to address this use case in this round of the standardization process, then that is a decision for the consortium's management, and I can live with the result: an RDF 1.1 without this feature is better than an RDF 1.0 without any of the additional features in RDF 1.1.
A charter requirement is to "Standardize a model and semantics for multiple graphs and graphs stores". (My emphasis.) Given that even very basic use cases to talk about graphs within collections of multiple graphs are not addressed, it is hard to agree that this requirement has been met.
I described my actual presenting business use case in my original comment towards the end of the message.
To recap, avoiding the RDF details, we have an N-tier architecture in the cloud,
with the bottom layer exposing a SPARQL endpoint.
Different end customers provide data to our application, and this data is
stored in the RDF data store.
A key requirement is to keep data from one customer separate from data of another customer.
For our higher end customers we achieve this by using distinct physical boxes.
For other customers we use named graphs: we put data from a particular customer
in two or three named graphs dedicated to that customer. All queries then
use the FROM and FROM NAMED constructions in SPARQL
to restrict a customer to reading their own data, and not other customers' data.
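The query scoping described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the function name `build_scoped_query`, the helper `_iri_ref`, and the `example.org` graph IRIs are all hypothetical. It assumes the application already knows which graph IRIs belong to the customer, and it only builds the query text.

```python
import re
from typing import Iterable

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def _iri_ref(iri: str) -> str:
    """Wrap an IRI in angle brackets, refusing anything that could break out of them."""
    if not iri or _FORBIDDEN.search(iri):
        raise ValueError(f"not a usable IRI: {iri!r}")
    return f"<{iri}>"


def build_scoped_query(pattern: str,
                       default_graphs: Iterable[str],
                       named_graphs: Iterable[str] = ()) -> str:
    """Build a SELECT query whose dataset is limited to the given graphs.

    default_graphs become FROM clauses, named_graphs become FROM NAMED clauses.
    """
    lines = ["SELECT *"]
    lines += [f"FROM {_iri_ref(g)}" for g in default_graphs]
    lines += [f"FROM NAMED {_iri_ref(g)}" for g in named_graphs]
    lines.append(f"WHERE {{ {pattern} }}")
    return "\n".join(lines)


# Hypothetical graphs dedicated to one customer.
query = build_scoped_query(
    "?s ?p ?o",
    default_graphs=["http://example.org/graphs/acme-data"],
    named_graphs=["http://example.org/graphs/acme-audit"],
)
print(query)
```

Because the dataset clauses are generated from the customer's own graph list, a query built this way cannot name another customer's graphs unless that list itself is wrong, which is why the graph-to-customer mapping matters so much.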
Obviously, to make this work, we need to remember the mapping between named graphs in the RDF data store and the customers. This mapping is metadata about the named graphs, and since RDF is good at metadata, storing that metadata within the RDF data store as RDF seems, on the surface, a good way to go: and had the RDF Working Group addressed the charter objectives to provide a semantics for named graphs, then this simple case would have been a good verification that their solution worked, prior to advancing to Recommendation.
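As a rough sketch of what such a mapping might look like, the following models a dataset with plain Python tuples: named graphs hold customer data, and the default graph holds metadata triples whose subjects are graph IRIs. The `ownedBy` property, the customer IRIs, and the function `graphs_for_customer` are hypothetical names chosen for illustration. Note that reading a graph IRI in a metadata triple as referring to the named graph with that IRI is exactly the assumption that the specification leaves undefined.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary and identifiers, for illustration only.
OWNED_BY = "http://example.org/vocab#ownedBy"
ACME = "http://example.org/customers/acme"
INITECH = "http://example.org/customers/initech"

# Named graphs: graph IRI -> set of triples holding that customer's data.
named_graphs: Dict[str, Set[Triple]] = {
    "http://example.org/graphs/acme-data": {
        ("http://example.org/acme/sample1",
         "http://example.org/vocab#status",
         "received"),
    },
    "http://example.org/graphs/acme-audit": set(),
    "http://example.org/graphs/initech-data": set(),
}

# Default graph: metadata triples whose subjects are graph IRIs.
# Treating each subject as "the named graph with this IRI" is an
# application-level assumption, not something RDF 1.1 defines.
default_graph: Set[Triple] = {
    ("http://example.org/graphs/acme-data", OWNED_BY, ACME),
    ("http://example.org/graphs/acme-audit", OWNED_BY, ACME),
    ("http://example.org/graphs/initech-data", OWNED_BY, INITECH),
}


def graphs_for_customer(metadata: Set[Triple], customer: str) -> List[str]:
    """Return the graph IRIs that the metadata assigns to a customer."""
    return sorted(s for (s, p, o) in metadata
                  if p == OWNED_BY and o == customer)


print(graphs_for_customer(default_graph, ACME))
```

The list returned for a customer is what would feed the FROM and FROM NAMED clauses of that customer's queries.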
Frankly, the business impact of the lack of standardization in this area is not high. In practice, we use the named graph semantics from my 2004 paper (cited below), and life goes on: each customer gets to see their data and not other customers' data.
However, should our customers, or their customers, ask to audit our approach to preserving their data privacy, then rather than being able to point to a W3C Recommendation which we are following for interpreting the metadata on which our approach hinges, we have to explain a proprietary approach.
Further, we have made the business decision to migrate away from a proprietary knowledge store solution to one based on W3C Semantic Web technologies in order to reduce costs. An example cost is that of on-boarding new engineers into our back-end team. The lack of standardization in this area means that we will need to explain aspects of our approach to new engineers who otherwise could simply refer to standards, and we are unlikely to find engineers already versed in our approach. In addition, every missing standard feature that is required for serious semantic web deployment means that we have to add in a proprietary dependency, which increases the extent of our lock-in with any particular supplier (including our own team).
In summary, we envisage a small impact on our future costs arising from the lack of standardization of this feature.
This influential paper by Carroll, Bizer, Hayes and Stickler, lists many use cases including:
The minutes cite "numerous use cases" without making any explicit, but do give a sizable list of references.
The statement that "applications dealing with nanopublications would simply need to be written with the understanding that graph names (IRIs, BNodes) used in graph in statements should refer to that graph in the dataset" seems to beg the question as to why this is not recommended by the WG, and is instead suggested as an out-of-band application specific understanding.
As dissatisfied commentators go, one complaint I definitely cannot make is that the working group has not taken my issue sufficiently seriously. They have tried hard to address this issue over the lifetime of the group, for instance, datasets (and to some extent their semantics) have been considered at:
I have indicated various proposed solutions.
My preferred solution was one I developed with other members of the Semantic Web Interest Group in 2004, and was submitted
by reference to the workshop that initiated this working group,
and is summarized in this editor's draft as:
The notion of RDF interpretation is extended to named graphs by saying that the graph IRI in the pair must denote the pair itself.
I have reiterated this proposal recently on the comments list.
I have also expressed myself as open to informal text in RDF Concepts that, in practice, would amount to the same thing.
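To make the proposed condition concrete, here is a toy sketch of the check it implies: for every named graph in a dataset, the graph IRI must denote the (name, graph) pair itself. This is only an illustration under simplifying assumptions, not a model-theoretic semantics: an interpretation is reduced to a Python dictionary from IRIs to denotations, and the name `respects_named_graphs` and the `example.org` IRIs are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def respects_named_graphs(denotation: Dict[str, Any],
                          named_graphs: Dict[str, Graph]) -> bool:
    """True if every graph IRI denotes the (IRI, graph) pair it names."""
    return all(denotation.get(name) == (name, graph)
               for name, graph in named_graphs.items())


# A dataset with a single named graph (hypothetical IRIs).
G1 = "http://example.org/graphs/g1"
graph: Graph = frozenset({
    ("http://example.org/a", "http://example.org/p", "http://example.org/b"),
})
dataset = {G1: graph}

# One interpretation maps the graph IRI to the pair; the other maps it
# to some unrelated thing, which the proposed condition rules out.
good = {G1: (G1, graph)}
bad = {G1: "some unrelated resource"}

print(respects_named_graphs(good, dataset))
print(respects_named_graphs(bad, dataset))
```

Under this condition, a metadata triple with the graph IRI as its subject is a statement about the named graph, which is what the customer-mapping use case needs.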
I have also proposed adding at least one new term to the RDF Vocabulary (rdfs:Graph or rdf:Graph)
as an alternative approach to normatively defining such a semantics in a way that does not impose it on everyone (although personally
I think it is a better design to make such an imposition).
An aspect of my own failure is that I have never fully understood why there is any opposition whatsoever to what I, in my blinkered way, see as the obviously correct semantics! (As suggested above, and repeatedly in the past, starting in 2004.)
One possible explanation is as follows:
Therefore SPARQL implementors may have seen it as both in their business interest to oppose any model theoretic semantics which addresses my simple use cases, and as a reasonable expectation that a new W3C recommendation should not invalidate an older one.
If this is one of the causes, it never became explicit, which prevented the concern being addressed.
Relatively briefly: operational semantics and model theoretic semantics are different beasts. One may critique the other, but an out-and-out conflict is generally avoidable. More pertinently, the semantic web succeeds inasmuch as we manage to have semantic interoperability, not simply a shared operational understanding (which many alternative and better established technologies also have). An appropriate level of shared model theoretic semantics across the web is hence crucial to our shared success. At this level everyone's interests align.
That the actual business impact, in a specific use case, is not high raises the question as to why this is an important issue.
A key value of the semantic web lies not in having standardized on syntactic structures (in general XML is easier to deal with
than RDF graphs and datasets), but in having standardized enough of a minimal semantics and enough vocabulary to be able
to have interoperability at more than a syntactic level.
To the extent that the WG proposes private channels for applications to share their understanding [... of] graph names,
this undermines the core mission of the Semantic Web.
As with many Working Groups, the RDF WG has met charter objectives that
were about evolving existing pieces of work at various stages of maturity.
The charter did not take such an approach with the semantics
of named graphs, but used a neutral, i.e. new, term to articulate the work item:
"semantics for multiple graphs and graph stores". This, in effect, directed the working group
to start from a fairly blank piece of paper: and they have essentially failed to get consensus
on even beginning to fill it in.
A more effective way for the working group to achieve this goal is a fairly well-trodden path: start from a respected piece of work that has already been developed to a lower level of maturity, and use the method of writing a use cases and requirements document as a way of driving critique and review of some pre-existing technology that is being refined by the W3C process.
The named graphs semantics developed by myself, Pat Hayes, Chris Bizer and Patrick Stickler in 2004 as part of our named graphs work, still remains an obvious starting point. This work was done at W3C, as an informal task force within the Semantic Web Interest Group, and has been widely published and cited.
Specific things that went wrong with the RDF Working Group's work in this area are:
the graph name does not formally denote the graph.
I feel that the heart of my formal objection is not technical but to do with direction. The charter inadvertently set the WG up for a failure in this area, and repeated attempts within the current framework to address even the simplest use cases for the semantics of graph naming have failed, including use cases that have been solved for almost a decade. The underlying cause of that failure may have been leaving the WG too much room to maneuver and not giving a clear enough direction. While my comment could be met fairly straightforwardly by adopting the technical proposals above, achieving consensus is harder.
It is unlikely that simply returning the documents to the WG for further work on this matter will be effective.
My suggestion is:
It would be very understandable if there was no desire to extend the current WG's charter: whether through having exhausted the commitment of either the consortium or the participants.
In such a case, this does leave unfinished work that should be picked up again by the next RDF WG at some point in the future.
When writing the next charter, I hope that the advice above will be heeded, and in addition the charter should include a requirement to extend the RDF Vocabulary to include concepts from RDF datasets and named graphs.