Comments on SPARQL 1.1 Uniform HTTP Protocol Working Draft 14 October 2010 from Ian Davis on 2010-12-15 (public-rdf-dawg-comments@w3.org from December 2010)

From: Ian Davis <ian.davis@talis.com>
Date: Wed, 15 Dec 2010 00:16:04 +0000
To: public-rdf-dawg-comments@w3.org
Message-ID: <AANLkTingE9NaTWQviQvtY36iKP0pMZAnYe7seZd4vNP_@mail.gmail.com>

I reviewed the document at
http://www.w3.org/TR/2010/WD-sparql11-http-rdf-update-20101014/ and enclose
my initial comments below. Note that I stopped my review after section 4.2.

Note also that in my comments I use the words "represent" and
"representation" only in the sense defined by RFC 2616.

Section 2

Graph Store is defined to be mutable. I don't see why it needs that
requirement. The read-only aspects of this document could apply to a
non-mutable Graph Store.

Section 4.1

I don't at all understand the need for the distinction in this document
between a graph and RDF knowledge. I find the supplied explanation
particularly confusing:

"we are not directly identifying an RDF graph but rather the RDF knowledge
that is represented by an RDF document, which serializes that graph"

I have seen serialization and representation used interchangeably in many
REST discussions but have never seen them used as distinct operations, so I
don't really know what to make of it.

If my understanding of the terminology is correct then I think the
relationships are that RDF Knowledge is the result of interpreting an RDF
graph which may be represented by an RDF document. In this case the
identified resource that is emitting representations is the graph itself.
The RDF Knowledge is not explicitly named here, but could be somehow.

The immediately following sentence "Intuitively, the interpetations that
satisfy [RDF-MT] the RDF graph serialized by the RDF document can be thought
of as this RDF knowledge" implies that the Graph IRI identifies multiple
things, i.e. multiple interpretations. It's axiomatic on the web that a URI
(IRI) identifies only one resource so I see this as a conflict.

I assume the introduction of the term "RDF Knowledge" is motivated by an
attempt to unify the concept of distinct document-like resources that you
encounter on the web and an aggregation of the data in those documents as
you might find in a database. I think this document would benefit from the
removal of that term entirely and the addition of a section describing how a
Graph Store might aggregate and interpret the graphs to form one or more
datasets that may be accessed with zero or more SPARQL or other services.
What form of entailment is used by the Graph Store is out of scope of the
document, but it will certainly affect the behaviour of the SPARQL services
it provides.

Section 4.2

The diagram implies that the encoded URI (e.g.
http://www.example.org/other/graph) and the indirect URI
http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph
identify the same RDF Knowledge. Does this imply the following triple?

<http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph>
<http://www.w3.org/2002/07/owl#sameAs>
<http://www.example.org/other/graph> .

I think the whole notion of indirect identification is problematic. What the
document is saying, in essence, is that if you have the URI of a graph you
need to discover some other URI by an unspecified mechanism with which to
manipulate it. If you discover multiple such URIs would you be justified in
assuming that they all manipulate the same underlying graph?

I am not convinced that it is intuitive that the following identify
different graphs that have the same URI:

http://foo.com/graphs?graph=http%3A//www.example.org/other/graph
http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph

Furthermore, should a conformant server that supports multiple independent
collections of graphs (e.g. Talis Platform) be required to enforce that
graph URIs identify the same knowledge across all the collections? In other
words are the following required to manipulate the same "RDF Knowledge":

http://api.talis.com/dataset1/graphs?graph=http%3A//www.example.org/other/graph
http://api.talis.com/dataset2/graphs?graph=http%3A//www.example.org/other/graph

The following sentence implies that this is the case: "Any server that
implements this protocol and receives a request URI in this form SHOULD
invoke the indicated operation on the RDF knowledge identified by the URI
embedded in the query component where the URI is the result of
percent-decoding the value associated with the graph key."
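
To make the concern concrete, here is a minimal sketch of the percent-decoding
step described in the quoted sentence, applied to the two example URIs given
earlier. It uses only the Python standard library; the helper name
embedded_graph_uri is hypothetical and is not taken from the Working Draft.

```python
from urllib.parse import urlsplit, parse_qs


def embedded_graph_uri(request_uri):
    """Return the percent-decoded value of the 'graph' key, or None if absent."""
    query = urlsplit(request_uri).query
    # parse_qs percent-decodes values and maps each key to a list of values.
    values = parse_qs(query).get("graph")
    return values[0] if values else None


# The two indirect URIs from the example above, on unrelated hosts.
indirect_uris = [
    "http://foo.com/graphs?graph=http%3A//www.example.org/other/graph",
    "http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph",
]

decoded = [embedded_graph_uri(u) for u in indirect_uris]
for request_uri, graph_uri in zip(indirect_uris, decoded):
    print(request_uri, "->", graph_uri)
```

Both request URIs decode to the same embedded graph URI, so a literal reading
of the quoted sentence would have two unrelated servers operating on the same
"RDF Knowledge".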

At this point I stopped my review. The two areas I explored are
complicated excessively by the introduction of the RDF Knowledge concept
into what I feel should be a very simple and straightforward document. I
believe the removal of that concept and the introduction of a non-normative
section describing the expected behaviour of Graph Stores would be the best
route forward.

It is also unclear what this document has to say about a central concept of
SPARQL: the dataset. I see in the change summary that the term Graph Store
was introduced to replace Dataset but I don't know the background to that
decision.

I would prefer to recast this whole document in the following way:

1. Introduce a Graph Store as a service that manages a collection of
datasets and a collection of graphs. Many Graph Stores will have a single
dataset, multi-tenant ones will have many.

2. Describe operations on a Graph Store: GET to obtain a document describing
the graph store including a link to the collection of datasets, a link to
the collection of graphs, links to provided services and links to other
configuration information

3. Describe operations on the Collection of Datasets: GET to obtain the list
of datasets, POST to append a new one

4. Describe operations on a Dataset: GET to obtain a list of graphs included
in the dataset, POST to include an existing graph in the dataset, PUT to
replace the definition, DELETE to remove a dataset

5. Describe operations on the collection of graphs: GET to obtain the list
of graphs, POST to append a new one

6. Describe operations on a Graph: GET to obtain a representation of the
graph, POST to append new data, PUT to replace data, DELETE to remove the
graph

7. Describe how graph stores may interpret graphs in particular ways,
treating datasets as more than a collection of individual graphs, i.e. RDF
Knowledge
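
As a rough illustration only, the operations in items 2 to 6 above can be
written down as a small lookup table. The resource-kind names used as keys
are hypothetical labels chosen for this sketch; the methods and their
descriptions are the ones listed above.

```python
# Hypothetical resource kinds mapped to the HTTP methods proposed for each.
OPERATIONS = {
    "graph-store": {
        "GET": "describe the store and link to its collections and services",
    },
    "dataset-collection": {
        "GET": "list the datasets",
        "POST": "append a new dataset",
    },
    "dataset": {
        "GET": "list the graphs included in the dataset",
        "POST": "include an existing graph in the dataset",
        "PUT": "replace the definition",
        "DELETE": "remove the dataset",
    },
    "graph-collection": {
        "GET": "list the graphs",
        "POST": "append a new graph",
    },
    "graph": {
        "GET": "obtain a representation of the graph",
        "POST": "append new data",
        "PUT": "replace data",
        "DELETE": "remove the graph",
    },
}


def allowed_methods(resource_kind):
    """Return the methods proposed for a resource kind, sorted alphabetically."""
    return sorted(OPERATIONS[resource_kind])


for kind in OPERATIONS:
    print(kind, allowed_methods(kind))
```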

I hope the group finds this feedback useful.

Best regards,

Ian
--
Ian Davis, Chief Technology Officer, Talis Group Ltd.
http://www.talis.com/ | Registered in UK and Wales as 5382297

I'm trialling Google Apps using a temporary email address. Email sent to my
usual address (ian.davis@talis.com) will still reach me.

Received on Wednesday, 15 December 2010 00:16:39 UTC