SPARQL query by reference in HTTP GET

Taking a first look at:
   http://www.w3.org/TR/2005/WD-rdf-sparql-protocol-20050114/

I'm a little concerned about the query form:
   GET /qps?graph=http%3A%2F%2Fmy.example%2F3.rdf&query-uri=http%3A%2F%2Fmy.example%2Fquery.spql HTTP/1.1
[Section 1.2]

I can't definitively argue that this is incorrect, but it seems to me that 
this is likely to violate the principle of least surprise, as any change to 
the retrievable representation of the resource at the URI following 
'query-uri=' could lead to a completely different result with completely 
different meaning (not corresponding to the loosely defined notion of 
identity that is generally associated with the thing identified by a URI).

In a more practical vein, it seems to me that it is impossible for a 
typical system to know when it is safe to cache the response to such a 
query.  The server can't know as it doesn't know whether or not the query 
is subject to change, so cache-control directives don't help here.  The 
client can't know, because the queried resource may be very dynamic.
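
To illustrate the caching concern, here is a minimal Python sketch.  It is 
only a model: the 'documents' dictionary stands in for the Web, 'evaluate' 
stands in for a SPARQL processor, and 'cached_get' is a hypothetical cache 
keyed on the request URI alone.  None of these names comes from the draft.

```python
from urllib.parse import urlencode

# Stand-in for the Web: the query document retrievable at the query-uri.
QUERY_URI = "http://my.example/query.spql"
documents = {QUERY_URI: "SELECT ?title WHERE { ?doc dc:title ?title }"}


def evaluate(query_text):
    # Placeholder for real query evaluation; the result depends on the query text.
    return "results of: " + query_text


cache = {}


def cached_get(request_uri, query_uri):
    # A naive cache keyed only on the request URI, as an HTTP cache would be.
    if request_uri not in cache:
        cache[request_uri] = evaluate(documents[query_uri])
    return cache[request_uri]


request_uri = "/qps?" + urlencode(
    {"graph": "http://my.example/3.rdf", "query-uri": QUERY_URI}
)

first = cached_get(request_uri, QUERY_URI)

# The query document changes, but the request URI stays exactly the same.
documents[QUERY_URI] = "SELECT ?creator WHERE { ?doc dc:creator ?creator }"

stale = cached_get(request_uri, QUERY_URI)   # served from the cache
fresh = evaluate(documents[QUERY_URI])       # what the server would compute now
```

Here 'stale' equals 'first' and differs from 'fresh': the cache has no way 
to notice that the meaning of the request has changed.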

Thus, it seems to me that using HTTP GET is not really appropriate for this 
purpose, and nothing is lost by using POST instead.  Using POST has the 
added advantage that a complex query can be included in a request body, 
rather than in the URI (which I assume to be a motivation for the query-uri 
form).
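
As a rough sketch of the POST alternative, the following Python builds (but 
does not send) a request that carries the query text in the body.  The 
endpoint URL, the 'query' parameter name and the form encoding are my 
assumptions for illustration, not something taken from the draft.

```python
from urllib.parse import urlencode
from urllib.request import Request

# An arbitrarily long query can travel in the body rather than in the URI.
query = (
    "SELECT ?title WHERE "
    "{ ?doc <http://purl.org/dc/elements/1.1/title> ?title }"
)

# Assumed parameter names: 'graph' as in the GET example, 'query' for the text.
body = urlencode({"graph": "http://my.example/3.rdf", "query": query}).encode("ascii")

request = Request(
    "http://my.example/qps",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
```

The request URI stays short and fixed, and the meaning of the request is 
fully determined by what the client sends.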

(Of course, it would be nice to have a cacheable result for a complex 
query;  I think a server might, in some cases, (quite independently of the 
SPARQL HTTP binding specification) return a simplified query URI that is 
sufficient to retrieve the same result (cf. 
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14).)
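
A minimal sketch of that idea, assuming a server that stores each result 
under a short path and advertises it in a Content-Location header.  The 
handler names, the '/results/' path and the hashing scheme are all 
hypothetical, and the stored result is only a snapshot taken at POST time.

```python
import hashlib

# Hypothetical store mapping a short path to a previously computed result.
stored_results = {}


def handle_post(query_text):
    # Placeholder for real query evaluation.
    result = "results of: " + query_text
    # Derive a short, stable token from the query text.
    token = hashlib.sha256(query_text.encode("utf-8")).hexdigest()[:16]
    location = "/results/" + token
    stored_results[location] = result
    # Content-Location names a URI from which this same entity can be retrieved.
    return 200, {"Content-Location": location}, result


def handle_get(path):
    # A plain GET on the short URI returns the stored result.
    if path in stored_results:
        return 200, {}, stored_results[path]
    return 404, {}, ""
```

A client that has POSTed a complex query could then GET the path given in 
Content-Location, and that short URI is one a cache can reasonably handle.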

....

Other comments:

Section 2, A:
[[
RDFGraph

RDFGraph is a canonically serialized RDF graph, that is, 
instanceOf(http://www.w3.org/1999/02/22-rdf-syntax-ns#). RDF/XML is the 
canonical way to serialize RDFGraph. There are other, semantically 
equivalent ways to serialize RDF graphs, and concrete protocols may allow 
for their use in addition to RDF/XML.
]]

RDF/XML does not give a unique serialization for a graph, so I don't think 
it's correct to call it canonical (or it is at least confusing when one 
considers the purpose of so-called "Canonical XML").
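
To make the point concrete, the Python sketch below compares two different 
RDF/XML documents that describe the same graph.  The 'triples' function is 
a toy of my own that handles only the tiny subset of RDF/XML used here 
(node elements with rdf:about, typed node elements, and property elements 
with rdf:resource or plain text); it is not a real RDF/XML parser, and the 
'http://my.example/terms#' namespace is made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://my.example/3.rdf">
    <rdf:type rdf:resource="http://my.example/terms#Document"/>
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>"""

doc_b = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://my.example/terms#">
  <ex:Document rdf:about="http://my.example/3.rdf"/>
  <rdf:Description rdf:about="http://my.example/3.rdf">
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    # ElementTree writes qualified names as "{namespace}local"; join them into a URI.
    return tag[1:].replace("}", "")


def triples(rdfxml):
    """Extract triples from the small RDF/XML subset described above."""
    found = set()
    for node in ET.fromstring(rdfxml):
        subject = node.get("{%s}about" % RDF)
        # A typed node element is shorthand for an rdf:type triple.
        if node.tag != "{%s}Description" % RDF:
            found.add((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            obj = prop.get("{%s}resource" % RDF)
            if obj is None:
                obj = '"%s"' % prop.text  # mark plain literals with quotes
            found.add((subject, expand(prop.tag), obj))
    return found


same_graph = triples(doc_a) == triples(doc_b)
same_text = doc_a == doc_b
```

The two documents differ as text, yet under these assumptions they yield 
the same two triples, so neither can be called "the" canonical form.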

....

[[
DistinctQueryResults

Some RDF query languages support the notion of distinct query results: 
duplicate query results are winnowed before being returned to the client. 
DistinctQueryResults is a key-value pair, distinct: boolean, where the 
value is a boolean: if true, distinct results must be returned to the 
client; if false, distinct results may not be returned to the client.
]]

This doesn't read very well.  The MAY NOT can be read (informally) to imply 
MUST NOT.  I think it would be clearer to say:  "if false, multiple copies 
of the same result MAY be returned to the client."
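
The reading I have in mind can be sketched in a few lines of Python.  The 
function name and the representation of results as tuples are mine, chosen 
only for illustration.

```python
def apply_distinct(results, distinct):
    """Return the results to send, honouring the 'distinct' flag."""
    if not distinct:
        # distinct=false: multiple copies of the same result MAY be returned.
        return list(results)
    # distinct=true: duplicates MUST be removed (first occurrence kept, in order).
    seen = set()
    unique = []
    for row in results:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [("alice",), ("bob",), ("alice",)]
with_distinct = apply_distinct(rows, True)
without_distinct = apply_distinct(rows, False)
```

With 'distinct' false the server is free to return duplicates, but nothing 
stops it from returning a duplicate-free result as well.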

....

If the protocol is to include a getServiceDescription operation, then I 
think it is important for interoperability that something more be said 
about the response value than simply that it is an RDF graph.  I think some 
(minimal) vocabulary of service description terms should be established to 
provide a basis for interoperability.  At least say what can be assumed 
about the service in the event that none of the assertions in the response 
graph are recognized by the client.

Without this, I think the service description query could simply be framed 
as another SPARQL query using vocabulary terms defined elsewhere, without 
any loss in functionality.  Arguably, that might in any case be a better 
way to do service queries.

....

Section 3/1.3

I don't think you really mean the MIME type of the response to be 
multipart/mime -- I think you may mean multipart/related (cf. example 
3/2.3).  IIRC, there is no multipart/mime content-type (I think there is an 
application/mime or message/mime).
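
For what it's worth, this is how a multipart/related entity looks when 
built with Python's standard email package.  The single part and its 
contents are placeholders of my own, not the structure the draft requires.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# The subtype is "related"; the top-level type is always "multipart".
response = MIMEMultipart("related")

# Placeholder part: an (empty) RDF/XML document labelled application/rdf+xml.
graph = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
response.attach(MIMEApplication(graph, "rdf+xml"))

content_type = response.get_content_type()
part_type = response.get_payload()[0].get_content_type()
```

Here 'content_type' is "multipart/related" and 'part_type' is 
"application/rdf+xml".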

....

Section 3:
[[
instanceOf(<anyURI>)

Indicates a well-formed, and optionally valid, instance of an XML 
vocabulary identified by <URI>. For example, 
instanceOf(http://www.w3.org/1999/02/22-rdf-syntax-ns#) indicates a 
well-formed instance of RDF/XML. Representation types other than XML may be 
expressed in the same notation even if they do not define well-formedness 
or validity or define it differently than XML.
]]

I find the use of 'vocabulary' confusing in this context, and not in accord 
with my understanding of XML or computer language terminology.  As written, 
I would expect the term to mean something like a term in an XML 
namespace.  Here, I think you mean something like "instance of an XML 
document with schema and/or DTD identified by <URI>" or "instance of an XML 
document with valid forms identified by <URI>".

....

[[
Operation Processor Service

     There are at least two ways to conceptualize and implement a SPARQL 
server in a heterogenous environment like the Web. The first way is 
graph-centric: the resources exposed to SPARQL operations are RDF graphs. 
The second way is service-centric: the primary resource exposed is a 
service that receives requests for SPARQL operations and responds 
accordingly. In this document a service-centric SPARQL server is known as 
an Operation Processor Service.
]]
(and elsewhere)

Hmmm... how does this relate to "OperationService", used previously?  They 
seem about the same.

Anyway, the distinction made between "Operation [Processor] Service" and 
"Operation Target" seems to be completely unnecessary as part of this 
specification, insofar as they reflect server implementation strategies and 
have no observable effect on interoperability between components that I can 
see.  At most, I think the comment quoted above should be a non-normative 
explanatory comment rather than being cause to introduce a domain-specific 
term.

....

That's all for now.

#g


---
Graham Klyne
Image Bioinformatics Research Group
Department of Zoology, University of Oxford
South Parks Road, Oxford OX1 3PS, UK
E-mail: <Graham.Klyne@zoo.ox.ac.uk>
Direct phone: +44-(0)1865-281991
Departmental fax: +44-(0)1865-310447

Received on Tuesday, 5 April 2005 23:54:03 UTC