Re: sparql protocol simplex updated from Seaborne, Andy on 2004-12-09 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 09 Dec 2004 14:56:32 +0000
To: kendall@monkeyfist.com
CC: public-rdf-dawg@w3.org
Message-ID: <41B867A0.40206@hp.com>
Kendall,

Good to see a new draft.  Seems to me to be going in the right direction and 
could be published as a first WD as-is.

	Andy

== Language specification

It would be good to be able to transport other query languages, existing and
to come.  The abstract syntax for RDFGraphQuery does not contain a specifier for
the query language; guessing by parsing may be insufficient (some RDQL is legal
SPARQL but the effect of SELECT is different).

I'd like to see a parameter in the abstract protocol with "lang=" in the HTTP
binding.  It becomes the only globally defined parameter and would free up all
other parameter names to be specific to the query language, not predefined by
this doc.  (I think there are slight differences in "graph=" between the SPARQL
query and the getGraph query.)

With a lang= parameter, then the requests would look like [*] this:

        GET /qps?lang=sparql&graph=...&query=...

then the 3rd party form (ask a service for a named graph) of getGraph is:

        GET /qps?lang=getGraph&graph=...

while the 1st party version is still a regular GET:

        GET /3.rdf

[*] except it should be a URI, not a short name.  But it won't fit then.
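
A rough sketch of how a client might build these request targets, in Python.
The /qps path and the lang=, graph= and query= names come from the examples
above; the helper name and the sample graph and query values are made up for
illustration, and short names are used for lang= only to keep the lines short.

```python
from urllib.parse import urlencode


def build_request_target(path, lang, **params):
    """Return a request target with lang= first, then the other parameters."""
    # lang= is the only parameter with a globally defined meaning here;
    # everything else is left to the named query language to interpret.
    pairs = [("lang", lang)] + list(params.items())
    return path + "?" + urlencode(pairs)


# SPARQL query against a named graph (placeholder values).
sparql_target = build_request_target(
    "/qps",
    "sparql",
    graph="http://example.org/3.rdf",
    query="SELECT ?s WHERE { ?s ?p ?o }",
)

# Third-party getGraph: ask the service for a named graph.
get_graph_target = build_request_target(
    "/qps", "getGraph", graph="http://example.org/3.rdf"
)
```

The first-party case needs no helper at all: it is a plain GET on the graph's
own URL.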

== Layering

The abstract protocol tries to be prescriptive (it reads that way) but this
isn't a good approach when details of the concrete protocol show themselves.
This shows most readily in responses but the general point is that the
abstraction can't cover all the details of a concrete binding.

I suggest just covering the SPARQL errors, showing how they map to HTTP response
codes, and leaving open that other HTTP response codes will occur and that the
same ones may occur for other reasons.

For example, in HTTP, errors can be because of HTTP issues or because of SPARQL
errors.  The general HTTP response codes should be non-normative: others can
occur like 502 (Bad Gateway) (and other 5xx codes) and 302 (moved temporarily -
old style).  These are all lower level issues and the application will have to
deal with them as it sees fit.

(http://sparql.org/query.html generates 502 if you are too quick after a
service restart - there is an Apache reverse proxy in front of the query server).
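
To illustrate the layering point, here is a minimal Python sketch.  The fault
names and the codes assigned to them are hypothetical, not taken from the
draft; the point is only that the mapping runs one way, from SPARQL faults to
HTTP codes, and that a given code can also arrive for reasons the abstract
protocol knows nothing about.

```python
# Hypothetical SPARQL-level faults and the HTTP status each is reported with.
SPARQL_FAULT_STATUS = {
    "malformed-query": 400,
    "query-refused": 500,
}


def status_for_fault(fault):
    """Return the HTTP status a service would use to report a SPARQL fault."""
    return SPARQL_FAULT_STATUS[fault]


def possible_faults(status):
    """Return the SPARQL faults that could explain an HTTP status.

    An empty list means the status can only be an HTTP-level matter (a proxy,
    a redirect, ...).  A non-empty list is still only a possibility: the same
    code may have been produced for some other reason.
    """
    return sorted(
        name for name, code in SPARQL_FAULT_STATUS.items() if code == status
    )
```

With this table, possible_faults(502) is empty, so a 502 from a reverse proxy
is left to the application to handle as it sees fit.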

The meaning and reuse of response codes can also be tricky.

Example:
404 - What's not found?  The model?  One of the named graphs?  The service?

The HTTP spec says
"The server has not found anything matching the Request-URI."
and conventionally if the service is there, but a parameter is wrong (like
graph=) there would not be, from HTTP's point of view, a 404 error.

[Just found the great text "This status code is commonly used when the server
does not wish to reveal exactly why the request has been refused, or when no
other response is applicable."]

As it is correct to return 404 when the service just isn't there, what happens
in the other cases?  I think there is a confusion of levels going on between the
abstract service model and deployment environment.  The abstraction cannot
define all situations that arise that are particular to a given deployment
environment.

Minor notes on response codes:

+ Need to include 414 (Request-URI Too Long)!

+ Not sure 202 (Accepted) makes sense as query is request-response.  It
certainly doesn't seem more important than some of the ones not mentioned.

== Multiple Operations per Request

This can be achieved efficiently in HTTP 1.1 by simply sending one request after
another. The TCP connection is almost always open so that the overhead is just
header parsing.

The advantage of one query per request is that response codes are clearly
associated with the actions.  One query - one response.  Otherwise, some
queries work, others don't, so each needs a response somehow.

Given it is possible in HTTP 1.1, I don't see the need to add another layer
that can also do multiple queries per request.  I would be convinced by a use
case as to what capability is enabled.
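
A minimal Python sketch of the one-query-per-request style over a single
HTTP 1.1 connection, using the standard http.client module.  The host, the
/qps path and the lang= parameter are assumptions for illustration; the
function works with any object offering request() and getresponse().

```python
import http.client
from urllib.parse import urlencode


def open_connection(host):
    """Create a connection object; the socket is opened on first use."""
    return http.client.HTTPConnection(host)


def run_queries(conn, path, queries, lang="sparql"):
    """Send each query as its own GET on one connection.

    Returns one (status, body) pair per query, in order, so every query
    keeps its own response code.
    """
    results = []
    for query in queries:
        target = path + "?" + urlencode({"lang": lang, "query": query})
        conn.request("GET", target)
        response = conn.getresponse()
        # Read the whole body before sending the next request so the
        # connection can be reused.
        body = response.read()
        results.append((response.status, body))
    return results


# Usage, assuming a service is listening at this hypothetical host:
#   conn = open_connection("example.org")
#   results = run_queries(conn, "/qps", ["ASK { ?s ?p ?o }"])
#   conn.close()
```

This version waits for each response before sending the next request; it does
not attempt pipelining.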

== HTTP issues

Still need POST form for large queries.  Just using query-uri= does not work
when firewalls are involved.
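
A rough Python sketch of a client that falls back to POST when the query is
too large for a GET.  The length threshold, the form-encoded POST body and the
lang= parameter are all assumptions for illustration; the draft does not
define a POST form.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed limit; real limits vary by server and by intermediaries.
MAX_GET_URL_LENGTH = 2000


def make_query_request(service_url, query, lang="sparql"):
    """Build a GET request for short queries and a POST request for long ones."""
    params = urlencode({"lang": lang, "query": query})
    get_url = service_url + "?" + params
    if len(get_url) <= MAX_GET_URL_LENGTH:
        return Request(get_url, method="GET")
    # Too long for a URL: carry the same parameters in the request body.
    return Request(
        service_url,
        data=params.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it.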

== Misc

What is the MIME type for N3? I found a quick survey in an IRC log which had more
application/n3 than text/n3 but significant amounts of both.  I found
text/rdf+n3 from W3C yesterday.

== HTTP Examples

What happens when there is no Accept: header?  I prefer this to mean:

application/xml, application/rdf+xml;q=0.9

so a SELECT returns XML by default.
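
A minimal Python sketch of that default, with the preference written in
standard Accept syntax.  The function names are made up for illustration and
the parser handles only media types and q values, which is all the default
needs.

```python
# Default preference when the request carries no Accept header.
DEFAULT_ACCEPT = "application/xml, application/rdf+xml;q=0.9"


def effective_accept(headers):
    """Return the Accept value to negotiate with, falling back to the default."""
    for name, value in headers.items():
        if name.lower() == "accept":
            return value
    return DEFAULT_ACCEPT


def parse_accept(value):
    """Parse an Accept value into (media_type, q) pairs, highest q first."""
    entries = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        entries.append((parts[0], q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)
```

With no Accept header, application/xml comes first, so a SELECT would be
answered in XML.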

Interactions: Do SPARQL-Distinct, SPARQL-Limit have the same meaning as in the
query language?  What about interactions with HTTP mechanisms?  I suggest leaving
these out and avoiding interaction with concrete protocol mechanisms.

There are going to be interactions between graph= and FROM/GRAPH/SOURCE.

SPARQL queries::

ex 1.2 query:
What if SPARQL-Distinct, SPARQL-Limit don't apply?  Is it an error? I suggest
ignoring them.

Resources reference things not in the file - intended?

ex 1.3 query:
What are the semantics of one query, 3 graphs?
I'd guess it's three separate answers, which suggests 3 requests (and 3 response
codes) on a single connection.  The second can be sent immediately, not waiting
for the first.

ex 1.4 query:
Same comment about using HTTP one request-one response mode.

Can we have multiple queries against multiple graphs? N*M queries or one query per
graph.
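
The two readings can be spelled out in a few lines of Python.  This is only a
sketch of the question, not a proposal; the function names are made up.

```python
from itertools import product


def cross_product_plan(queries, graphs):
    """N*M reading: every query is run against every graph."""
    return list(product(queries, graphs))


def pairwise_plan(queries, graphs):
    """One-query-per-graph reading: the i-th query runs against the i-th graph."""
    if len(queries) != len(graphs):
        raise ValueError("pairwise reading needs as many queries as graphs")
    return list(zip(queries, graphs))
```

Either way, each (query, graph) pair can go out as its own request with its
own response code.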

GetGraph::

Is it the presence of a "query=" parameter that distinguishes getGraph from
a SPARQL query?  A lang= would make this explicit.

ex 2.3 multipart/related?

RFC 2387 says:
The Multipart/Related media type is intended for compound objects
consisting of several inter-related body parts.

I don't see them as inter-related except that they are in the data for the same
response.

== Implementation Experience

http://www.sparql.org/query.html is an HTML form front end to a service at
http://www.sparql.org/books.  It is built on ARQ, Joseki3 using Jetty as a
servlet container.  Joseki3 is a bit rough in places because I expect to need to
make changes as the protocol emerges.  This mainly affects the configuration
file that governs the service run, which is very user-visible.  Joseki uses an
RDF config file (N3 usually); it supports multiple query languages and each can
have its own parameters.  Queries come over GET or POST (in an RDF graph with
large literal for the query string).

The supported parameters are lang= and query=.  Queries are against a single fixed
graph and graph= (single or multiple) and query-uri= are not handled.  I'm not
keen on loading arbitrary web resources into a general service processor so I
see that as an optional feature and would make it configurable (default off).

All four result forms are handled, including XML and RDF/XML results for SELECT.
Content negotiation is done (one hack - it snoops to see if a browser is
asking by trying to see if "text" is requested - if so, you get text/plain and
N3 so it displays without kicking off a helper app).
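
The hack amounts to something like the following Python sketch.  It is a
reconstruction from the description above, not the Joseki code; the function
name and the exact test (any text/* entry in Accept counts as a browser) are
assumptions.

```python
def choose_content_type(accept_value, negotiated_type):
    """Return text/plain if the client looks like a browser, else the negotiated type."""
    requested = [
        item.split(";")[0].strip().lower() for item in accept_value.split(",")
    ]
    # Assumption: any text/* entry in Accept is taken as a sign of a browser.
    if any(media_type.startswith("text/") for media_type in requested):
        return "text/plain"
    return negotiated_type
```

A client asking only for application/rdf+xml gets the normally negotiated
type; one that also lists text/html gets text/plain.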

N3 uses the MIME type application/n3.

I intend to make it follow the SPARQL specification exactly and have an "exact" mode.
I will also make it possible for the deployer to restrict features.

I intend to make it more service-like when the relationship of protocol
parameters and query language features is clearer.  The biggest outstanding
issue is FROM/GRAPH and "graph=" (because I don't see the multiple requests as
being the best way to do it).

I intend to do a SOAP interface.  The main issue I can see is keeping the
abstraction of query engine yet handling RDF/XML & XML results cleanly.

I intend to have more time.

Kendall Clark wrote:
> Folks,
> 
> Please find
> 
> DRAFT: $Id: protocol-wd.html,v 1.8 2004/12/06 19:22:10 k Exp $
> 
> at
> 
> http://monkeyfist.com/kendall/sparql-protocol-simplex/
> 
> Notable changes include more excision of unnecessary parts and some
> major reorganizations of the remaining bits. The result is much
> shorter and simpler. I also renamed HTTP query parameters: "q" ->
> "query"; "g" -> "graph"; "q-uri" -> "query-uri". 
> 
> There's still plenty of substantive work to be done in sorting out
> details, but I'd to find out whether this is likely to become a WG
> working draft before doing much more of that.
> 
> Kendall Clark
>
Received on Thursday, 9 December 2004 14:57:13 UTC