Re: Comments on SPARQL 1.1 Uniform HTTP Protocol Working Draft 14 October 2010 from Chimezie Ogbuji on 2011-01-25 (public-rdf-dawg-comments@w3.org from January 2011)

From: Chimezie Ogbuji <chimezie@gmail.com>
Date: Tue, 25 Jan 2011 11:49:24 -0500
To: Ian Davis <ian.davis@talis.com>, public-rdf-dawg-comments@w3.org
Message-ID: <AANLkTikT5jqAm=2yJfEggJY_u8H-xkcs-_4WU4oNXbmF@mail.gmail.com>
Hello, Ian. Thanks for your comments; see the responses below (in
context). Please indicate whether this response addresses your concerns
and, if it does, feel free to give further feedback beyond the point
where you stopped.

On Tue, Dec 15, 2010 at 12:16 AM, Ian Davis <ian.davis@talis.com> wrote:
> ..snip ..
> Note also that in my comments I use the word "represent" and
> "representation" only in the sense as defined by rfc2616.

> Section 2
> Graph Store is defined to be mutable. I don't see why it needs that
> requirement. The read only aspects of this document could apply to a
> non-mutable Graph Store

This definition is the same as the one in the SPARQL 1.1 Update document.
In both cases, mutability is necessary so that the graphs can be subject
to all operations (not only the read-only ones).

> Section 4.1

> I don't at all understand the need for the distinction in this document
> between a graph and RDF knowledge. I find the supplied explanation
> particularly confusing:
> "we are not directly identifying an RDF graph but rather the RDF knowledge
> that is represented by an RDF document, which serializes that graph"

Note that the phrase "that is represented by an RDF document, which
serializes that graph" is from the SPARQL 1.0 specification, section 8.2.2
Specifying Named Graphs (last paragraph) [1]. For clarification, the
following entry has been added to the terminology section of the editor's
draft (for incorporation into the next publication) [2]:

Serialize (verb.) - When used in a sentence where the subject is an
RDF document and the object is an RDF graph, this is understood to
mean that the result of parsing the document is the graph.

Also, as a result of internal discussion and comments regarding this
term, the current editor's draft replaces 'RDF knowledge' with 'RDF
graph content', and I will use the latter term in the rest of this
email.

> I have seen serialization and representation used interchangeably in many
> REST discussions but never seen them used as distinct operations so I don't
> know what to make of it really.

The word serialization is meant in the sense defined above, i.e. in
terms of parsing: a document serializes the graph that results from
parsing it. Hopefully, this terminology clarification addresses your
concern.

> If my understanding of the terminology is correct then I think the
> relationships are that RDF Knowledge is the result of interpreting an RDF
> graph which may be represented by an RDF document. In this case the
> identified resource that is emitting representations is the graph itself.

This is not the case, and the section mentioned above in the original
SPARQL 1.0 specification makes this clear (and is the primary reason
why this distinction is emphasized): "[...] the relationship between
an IRI and a graph in an RDF dataset is indirect. The IRI identifies a
resource, and the resource is represented by a graph (or, more
precisely: by a document that [...]"

The graph IRI identifies a resource that "emits" representations
(serializations of a graph as RDF documents). The relationship between
a named graph and its IRI is part of the definition of a dataset;
however, the relationship between what the graph IRI identifies and the
graph is only briefly described in the passage quoted above. This
specification uses the term RDF graph content to build on that passage
and give an intuitive account of the relationship between that resource
and the graph, as the framework behind a RESTful abstraction of an RDF
dataset. The (informal) intuition is that the graph IRIs identify the
meaning of the graph, and RDF-MT provides the relationship between an RDF
graph and its meaning (interpretation).

> The RDF Knowledge is not explicitly named here, but could be somehow.

> The immediately following sentence "Intuitively, the interpretations that
> satisfy [RDF-MT] the RDF graph serialized by the RDF document can be thought
> of as this RDF knowledge" implies that the Graph IRI identifies multiple
> things, i.e. multiple interpretations. It's axiomatic on the web that a URI
> (IRI) identifies only one resource so I see this as a conflict.

In the editor's draft this has been changed [3] to: "Intuitively, the
set of interpretations that [...]". This is meant to be an informal
characterization and the idea is that all interpretations that satisfy
the graph comprise the (machine-understandable) meaning that the graph
IRI identifies, since they all have in common the fact that they
adhere to the logical constraints in the vocabulary and the structure
of the graph.

> I assume the introduction of the term "RDF Knowledge" is motivated by an
> attempt to unify the concept of distinct document-like resources that you
> encounter on the web and an aggregation of the data in those documents as
> you might find in a database. I think this document would benefit from the
> removal of that term entirely

As mentioned above, the term has been replaced.

> and the addition of a section describing how a
> Graph Store might aggregate and interpret the graphs to form one or more
> datasets that may be accessed with zero or more SPARQL or other services.

Can you elaborate on how this description of a Graph Store is
different from the common one used by this protocol and the SPARQL 1.1
Update language and why this difference is important?

> Section 4.2
> The diagram implies that the encoded URI (e.g.
> http://www.example.org/other/graph) and the indirect URI
> http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph identify
> the same RDF Knowledge. Does this imply this triple:
>  <http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph>
> <http://www.w3.org/2002/07/owl#sameAs>
> <http://www.example.org/other/graph> .

Yes. This allows (simultaneously):

* Clarity regarding the REST principle of identification ("REST uses a
resource identifier to identify the particular resource involved in an
interaction between components.")
* Clarity regarding the notion of the scopes of the various parts of a
URI (the fragment, the query component, the path, etc.) as defined in
RFC 3986, which states that data in the query component further
distinguishes which resource (within the scope of the naming authority
and path) is being identified
* The ability to use HTTP to manipulate graphs in a graph store that
are not accessible for various reasons (the most probable being that
their IRIs are not resolvable)

> I think the whole notion of indirect identification is problematic. What the
> document is saying, in essence, is that if you have the URI of a graph you
> need to discover some other URI by an unspecified mechanism with which to
> manipulate it.

Recent changes in the editor's draft (see the end of section 4.2)
clarify that (in the case of indirect identification), the part of the
URI prior to the query component is the URL of the service itself and
so it is reasonable to assume that the client knows this URL a priori.
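
For illustration only, a minimal Python sketch of how a client might form
such a request URI, assuming it already knows the service URL (here the
example URL from section 4.2) and encodes the graph IRI so that the colon
becomes %3A while slashes stay literal, matching the form of the examples
in the draft. The helper name indirect_graph_uri is hypothetical and is
not defined by the protocol.

```python
from urllib.parse import quote


def indirect_graph_uri(service_url: str, graph_iri: str) -> str:
    """Build an indirect-identification request URI (hypothetical helper).

    The service URL is assumed to be known a priori; the graph IRI is
    percent-encoded and placed in the query component under the 'graph' key.
    """
    # safe="/" keeps slashes literal (as in http%3A//...), while reserved
    # characters such as ':', '?', '&', '#' and '+' are percent-encoded.
    return service_url + "?graph=" + quote(graph_iri, safe="/")


uri = indirect_graph_uri("http://example.com/rdf-graphs/employees",
                         "http://www.example.org/other/graph")
print(uri)
```

Encoding '&' and '#' matters here: left unencoded, they would end the
value of the graph key (or the whole query component) early.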

> If you discover multiple such URIs would you be justified in
> assuming that they all manipulate the same underlying graph?

The graphs manipulated via their IRIs are scoped to a graph store
which is scoped to the service, so there will only be one such
(service) URL to discover in order to (indirectly) manipulate the
named graphs within via the use of the ?graph= query component.

> I am not convinced that it is intuitive that the following identify
> different graphs that have the same URI

> http://foo.com/graphs?graph=http%3A//www.example.org/other/graph
> http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph

The service URLs are different, so HTTP requests that use these
as their request URIs would be manipulating the meaning of graphs that
exist in separate stores. However, the embedded URIs are the same in
both cases, so (since URI identification is a functional relationship)
they would be accessing the same RDF graph content in different stores.
Whether those stores, and the services they are part of, actually
treat them as the same (by mirroring, for example) is outside
the scope of this protocol. Can you further elaborate on how this is not
intuitive?
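
To make the split concrete, a minimal Python sketch (illustrative only,
with a hypothetical helper name) that treats everything before the query
component as the service URL and percent-decodes the value of the graph
key, applied to the two URIs from your example:

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_indirect_uri(request_uri: str) -> tuple[str, str]:
    """Split an indirect request URI into (service URL, embedded graph IRI)."""
    parts = urlsplit(request_uri)
    # Everything prior to the query component is the service URL.
    service_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qs percent-decodes the value associated with the 'graph' key.
    graph_iri = parse_qs(parts.query)["graph"][0]
    return service_url, graph_iri


a = split_indirect_uri("http://foo.com/graphs?graph=http%3A//www.example.org/other/graph")
b = split_indirect_uri("http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph")
print(a[0] == b[0])  # different services, hence different graph stores
print(a[1] == b[1])  # the same embedded graph IRI
```

The first comparison is false and the second is true: two stores, one
graph IRI.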

> Furthermore, should a conformant server that supports multiple independent
> collections of graphs (e.g. Talis Platform) be required to enforce that
> graph URIs identify the same knowledge across all the collections?

This is beyond the scope of this protocol, which only specifies
operations on a *single* graph store. However, I would think that
such a server would need to do so, or it would be at odds with the
AWWW (Architecture of the World Wide Web) and REST.

> In other words are the following required to manipulate the same "RDF Knowledge":

This is not required by this protocol but by the URI specification.

> http://api.talis.com/dataset1/graphs?graph=http%3A//www.example.org/other/graph
> http://api.talis.com/dataset2/graphs?graph=http%3A//www.example.org/other/graph

> The following sentence implies that this is the case: "Any server that
> implements this protocol and receives a request URI in this form SHOULD
> invoke the indicated operation on the RDF knowledge identified by the URI
> embedded in the query component where the URI is the result of
> percent-decoding the value associated with the graph key."

Given what I've said about scoping, would the following modification
address your concerns?

"[...] SHOULD invoke the indicated operation on the RDF graph content
(in the underlying graph store) identified by the URI embedded [...]"
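
As a rough illustration of what that wording means for a server, here is
a Python sketch in which the store is scoped to a single service URL and
each operation is applied to the graph content stored under the
percent-decoded IRI. The class and method names are hypothetical, graph
content is modelled as a plain set of triples rather than parsed RDF, and
only GET, PUT and DELETE are shown.

```python
from urllib.parse import parse_qs, urlsplit


class InMemoryGraphStore:
    """Toy graph store scoped to a single service (illustrative only).

    Assumption: graph content is modelled as a set of (subject, predicate,
    object) tuples keyed by graph IRI; a real store would parse RDF documents.
    """

    def __init__(self, service_url: str):
        self.service_url = service_url
        self.graphs: dict[str, set[tuple[str, str, str]]] = {}

    def _graph_iri(self, request_uri: str) -> str:
        parts = urlsplit(request_uri)
        # Only requests whose URI (minus the query) is this service are handled.
        if f"{parts.scheme}://{parts.netloc}{parts.path}" != self.service_url:
            raise ValueError("request URI is not scoped to this service")
        # Percent-decode the value associated with the 'graph' key.
        values = parse_qs(parts.query).get("graph")
        if not values:
            raise ValueError("missing 'graph' key in query component")
        return values[0]

    def handle(self, method: str, request_uri: str, triples=()):
        iri = self._graph_iri(request_uri)
        if method == "GET":
            # None if there is no such graph; a real service would
            # return an HTTP error status instead.
            return self.graphs.get(iri)
        if method == "PUT":
            self.graphs[iri] = set(triples)  # replace the graph content
            return None
        if method == "DELETE":
            self.graphs.pop(iri, None)
            return None
        raise NotImplementedError(method)


store = InMemoryGraphStore("http://example.com/rdf-graphs/employees")
uri = ("http://example.com/rdf-graphs/employees"
       "?graph=http%3A//www.example.org/other/graph")
store.handle("PUT", uri, {("ex:alice", "ex:worksFor", "ex:acme")})
print(store.handle("GET", uri))
```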

> At this point I stopped my review. The two areas I explored are
> complicated excessively by the introduction of the RDF Knowledge concept
> into what I feel should be a very simple and straightforward document.

See the earlier point about the role of 'RDF graph content' with
respect to clarifying what is identified by a graph IRI (which is
necessary as part of a REST protocol model for an RDF dataset).

> I believe the removal of that concept and the introduction of a non-normative
> section describing the expected behaviour of Graph Stores would be the best
> route forward.

In light of the above response, can you clarify how the expected
behavior of Graph Stores is not already covered by this
specification?

> It is also unclear what this document has to say about a central concept of
> SPARQL: the dataset. I see in the change summary that the term Graph Store
> was introduced to replace Dataset but I don't know the background to that
> decision.

This was done to unify with the notion of a Graph Store that is common
to this specification and the SPARQL Update language specification,
both of which specify operations that change (or replace) a graph.

> I would prefer to recast this whole document in the following way:

> 1. Introduce a Graph Store as a service that manages a collection of
> datasets and a collection of graphs. Many Graph Stores will have a single
> dataset, multi-tenant ones will have many.

Currently, this specification and the SPARQL Update specification
distinguish between a 'service' and a Graph Store, and there is roughly
a 1-1 correspondence between a Graph Store and a dataset. Can you
elaborate on use cases where a single service would need to manage
multiple collections of graphs (i.e., multiple datasets or Graph
Stores)?

> 2. Describe operations on a Graph Store: GET to obtain a document describing
> the graph store including a link to the collection of datasets, a link to
> the collection of graphs, links to provided services and links to other configuration information

In the most recent batch of changes, HTTP OPTIONS / GET on the service
URL returns a service description, which includes many of the items you
have listed.

> 3. Describe operations on the Collection of Datasets: GET to obtain the list
> of datasets, POST to append a new one

The HTTP POST operation already specifies how to append a new graph.

> 4. Describe operations on a Dataset: GET to obtain a list of graphs included
> in the dataset, POST to include an existing graph in the dataset, PUT to
> replace the definition, DELETE to remove a dataset

It is not clear to me how this is different from what is already
specified, except that in your case the target is the dataset rather
than the resource identified by the graph IRI (the RDF graph content).

The same questions are relevant to the remaining items in your list.

Thanks

-- Chime

[1] http://www.w3.org/TR/rdf-sparql-query/#namedGraphs
[2] http://www.w3.org/2009/sparql/docs/http-rdf-update/#terminology
[3] http://www.w3.org/2009/sparql/docs/http-rdf-update/#direct-graph-identification
Received on Tuesday, 25 January 2011 16:58:45 UTC