Re: sparql protocol simplex updated

On Thu, Dec 09, 2004 at 02:56:32PM +0000, Seaborne, Andy wrote:
> Kendall,
> 
> Good to see a new draft.  Seems to me to be going in the right direction 
> and could be published as a first WD as-is.

Thanks, Andy. I think I'll be ready to pub it as soon as I've thought
carefully about this pile of comments you've sent -- and made text
changes where appropriate.

> == Language specification
> 
> It would be good to be able to transport other query languages, existing and
> to come.  The abstract syntax for RDFGraphQuery does not contain a specifier
> for the query language; guess/parsing may be insufficient (some RDQL is legal
> SPARQL but the effect of SELECT is different).

I have language in the doc about this, but ran out of steam (or just
forgot) to tweak the abstract stuff and to show an example. I intend
to show iTQL and Versa examples before publishing again.

And, as you point out, the abstract protocol stuff needs to be tweaked
to contain this bit.

> I'd like to see a parameter in the abstract protocol with "lang=" in the
> HTTP binding.

Would you be as happy with "query-lang" so as not to be ambiguous with
lang=en/us? My preferred thing is to add a Sparql-QL-Type header
(since we want to identify query languages with URIs, putting them
into a GET parameter just means that the URIs are (1) that much harder
to read; and (2) that much longer, risking the tipping point of "too
long for GET")...

Sparql-QL-Type: http://www.w3.org/Submission/RDQL

is preferable, IMO, to

GET /foo?query=...&query-lang=http%3A//www.w3.org/Submission/RDQL

(Generally, I don't see why people prefer GET params over headers,
since every HTTP tool I know of or use lets me set arbitrary headers
easily enough...)
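
To make the comparison concrete, here's a rough Python sketch of the two
serializations. The helper names and the sample query are made up for
illustration; only the header name, the parameter name, and the RDQL URI
come from this thread. It builds the requests without sending anything.

```python
from urllib.parse import urlencode

RDQL_URI = "http://www.w3.org/Submission/RDQL"  # URI from the example above


def request_with_header(query):
    """Carry the query language in a Sparql-QL-Type header."""
    url = "/foo?" + urlencode({"query": query})
    return url, {"Sparql-QL-Type": RDQL_URI}


def request_with_param(query):
    """Carry the query language in a query-lang GET parameter."""
    url = "/foo?" + urlencode({"query": query, "query-lang": RDQL_URI})
    return url, {}


# Sample query text, invented for illustration only.
query = "SELECT ?x WHERE (?x, <http://example.org/p>, ?y)"
header_url, header_headers = request_with_header(query)
param_url, param_headers = request_with_param(query)

# The parameter form is longer by "&query-lang=" plus the percent-encoded URI.
print(len(header_url), len(param_url))
```

The header form keeps the request URI shorter and leaves the language URI
readable, which is the whole argument.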

Okay, another issue: should there be a default in the spec? That is,
if there's no query-lang bit (however it's serialized), should
everyone assume that the query is SPARQL? I would prefer to set some
kind of sensible default for several things, including this, but I'd
like to know what others think before writing that language.
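
If we did pick SPARQL as the default, a server-side lookup might be as
simple as the sketch below. Note the assumptions: the SPARQL URI is a
placeholder (nobody has assigned one in this thread), and the function
name is hypothetical.

```python
# Placeholder identifier: no URI for SPARQL itself is given in this thread.
SPARQL_URI = "http://example.org/ql/sparql"
RDQL_URI = "http://www.w3.org/Submission/RDQL"


def query_language(headers, default=SPARQL_URI):
    """Return the query language URI for a request, or the default.

    Header names are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "sparql-ql-type":
            return value.strip()
    return default


# No language indicated: fall back to the default.
print(query_language({}))
# Language indicated explicitly: use it.
print(query_language({"Sparql-QL-Type": RDQL_URI}))
```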

> It becomes the only globally defined parameter and would free up all
> other parameter names to be specific to the query language, not predefined
> by this doc.

Hmm...I'll chew on this a bit.

> (I think there are slight differences in "graph=" between the SPARQL
> query and the getGraph query.)

I suspect there are, but I'm curious which ones you see?

> then the 3rd party form (ask a service for a named graph) of getGraph is:
> 
>        GET /qps?lang=getGraph&graph=...

Hmm, I still don't understand calling getGraph a *type* of query
language... Why not just specify in SPARQL that "SELECT *" means
"retrieve the graph"? I'd still want a protocol operation for
"retrieve graph", since that works orthogonally to any query language
*type*.

> while the 1st party version is still regular GET:
> 
>        GET /3.rdf
> 
> [*] except it should be a URI, not a short name.  But it won't fit them.

We agree that the value of this query language type thing is a URI,
yes? I'd be willing to do the work of contacting people responsible
for various languages to see if they'd give us URIs to use to identify
their QLs for use with our protocol. If possible, I think as many of
these as possible should be enumerated in the specification. Makes
client-writing *way* easier and should help bootstrap interop. (I
think 4 to 8 of them cover 90% of the usage...)

> This shows most readily in responses but the general point is that the
> abstraction can't cover all the details of a concrete binding.

Yep -- the response stuff, as you point out, is kinda muddled in the
present draft. It's muddled because I was torn between doing two
things:

1. overloading existing HTTP response codes for use in our slightly
   different domain

2. requiring responses have RDF graphs in their body (which is HTTP
   legal and even recommended, iirc), and letting those graphs carry
   the specialization information

I prefer (2), but the problem with it is the WG doesn't seem
especially interested in doing any vocabulary/schema work -- and there
are some tricky bits.

I'm very willing to work on such a solution, if anyone else is
interested, though I'm not gonna hold my breath. :>

> I suggest just covering the SPARQL errors, showing how they map to HTTP
> response code and leave open that other HTTP response codes will occur and
> the same ones may occur for other reasons.

Yes, a good deal of work remains to be done in the response codes. I'm
gonna punt on all of that till after the next publication.

> For example, in HTTP, errors can be because of HTTP issues or because of
> SPARQL errors.

Is there any reason not to put SPARQL error information into RDF
graphs contained in the HTTP response bodies? I mean, any information
other than "we don't have time"?

> 
> As it is correct to return 404 when the service just isn't there, what
> happens in the other cases?

An RDF graph in the body specializes the general response code. Or
something else...?

> Minor notes on response codes:
> 
> + Need to include 414 (Request-URI Too Long)!

Hmm, how did I miss that one? Dumb. Will definitely add it since
that's the signal to the client to use the alternate method for
conveying the query. Good catch, Andy.
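
Roughly, the client logic I have in mind looks like this Python sketch.
Everything here is illustrative: the function names are hypothetical, the
POST body format (form-encoded) is an assumption since the draft doesn't
define the POST form yet, and `send` stands in for whatever HTTP library
the client really uses.

```python
from urllib.parse import urlencode


def build_request(endpoint, query, use_post=False):
    """Return (method, url, body) for a query, via GET or POST."""
    encoded = urlencode({"query": query})
    if use_post:
        # Assumed POST form: the same parameters, form-encoded in the body.
        return "POST", endpoint, encoded.encode("ascii")
    return "GET", endpoint + "?" + encoded, None


def send_with_fallback(endpoint, query, send):
    """Try GET first; on 414 (Request-URI Too Long), resend as POST.

    `send` is any callable taking (method, url, body) and returning
    an HTTP status code.
    """
    status = send(*build_request(endpoint, query))
    if status == 414:
        status = send(*build_request(endpoint, query, use_post=True))
    return status
```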

> + Not sure 202 (Accepted) makes sense as query is request-response.  It
> certainly doesn't seem more important than some of the ones not mentioned.

A sign of my hubris. An early draft suggested an asynchronous response
to complex, long-running queries. But I chickened out, since that's
outside our brief. So, 202 is a leftover and should be dropped.

> This can be achieved efficiently in HTTP 1.1 by simply sending one request
> after another. The TCP connection is almost always open so that the overhead
> is just header parsing.

I thought long and hard about how to do "sessions" -- convey in one
HTTP transaction multiple queries where variables may or may not be
shared across them... I think Algae does this, but I couldn't think of
a clean way to do it since you only get one response code in HTTP, and
that makes the response code issues you raised above even *more*
complex.

One thing to do is apply the HTTP response type to the req-resp
transaction, and define SPARQL faults and error representations and
make them the representation of a faulty query request.

> Given it is possible in HTTP 1.1, I don't see the need to add another layer
> that can also do multiple queries per request.  I would be convinced by a
> use case as to what capability is enabled.

Even with an ideal use case, it's *hard*, so I'm willing to drop it.

> == HTTP issues
> 
> Still need POST form for large queries.  Just using query-uri= does not work
> when firewalls are involved.

As I mentioned earlier, I have notes for this and will get it into the
doc ASAP.

> == Misc
> 
> What is the MIME type for N3? I found a quick survey in an IRC log which had
> more application/n3 than text/n3 but significant amounts of both.  I found
> text/rdf+n3 from W3C yesterday.

I guessed! The N3 folks should sort this out, IMO. I try to avoid MIME
fights. MIME is horribly broken, IMO (witness the compound document
fiasco), and should be replaced by RDF or something useful.

> == HTTP Examples
> 
> What happens when there is no Accept: header?  I prefer this to mean:
> 
> application/xml, application/rdf+xml;q=0.9
> 
> so a SELECT returns XML by default.

Agreed. But the interaction between SPARQL query types and con-neg
should be expressed directly in a table or something, as well as in
examples. Examples are too often misinterpreted.
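
As a strawman for that table, here's a small Python sketch. Only the
SELECT row and the no-Accept default come from this thread (the default is
written in standard Accept syntax); the CONSTRUCT and DESCRIBE rows are my
guesses, and the function names are hypothetical. It ignores wildcards
like */* to stay short.

```python
# What a missing Accept header is taken to mean.
DEFAULT_ACCEPT = "application/xml, application/rdf+xml;q=0.9"

# Media types each query form can produce. Only the SELECT row comes
# from the thread; the other rows are guesses for illustration.
PRODUCIBLE = {
    "SELECT": ["application/xml"],
    "CONSTRUCT": ["application/rdf+xml"],
    "DESCRIBE": ["application/rdf+xml"],
}


def parse_accept(value):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((parts[0], q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_media_type(query_form, accept=None):
    """Pick the response type for a query form, or None if nothing matches."""
    for media_type, q in parse_accept(accept or DEFAULT_ACCEPT):
        if q > 0 and media_type in PRODUCIBLE[query_form]:
            return media_type
    return None
```

A None result is the case where the server would have to answer with
something like 406 (Not Acceptable), e.g. a SELECT where the client only
accepts application/rdf+xml.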

I think we had an email exchange where this all got spelled out, so
I'll find and use that for a first draft.

> Interactions: Do SPARQL-Distinct, SPARQL-Limit have the same meaning as in
> query language?

Yes.

> What about interactions with HTTP mechanisms. I suggest leaving these
> out and avoiding interaction with concrete protocol mechanisms.

HTTP has headers called "Sparql-Distinct" and "Sparql-Limit"? What
interaction with HTTP could there be otherwise?

> SPARQL queries::
> 
> ex 1.2 query:
> What if SPARQL-Distinct, SPARQL-Limit don't apply?  Is it an error? I
> suggest ignoring them.

Don't apply because not supported by the server in question? Or for
some other reason?

> Resources reference things not in the file - intended?
> 
> ex 1.3 query:
> What is the semantics of one query, 3 graphs?

I'm not sure because I've lost track of how or whether the query
language is doing queries against the merge of n graphs or how or
whether we're allowed to convey one query to be executed against n
graphs distinctly.

I intended 1.3 to be an example of the latter.

> I'd guess it's three separate answers which suggests requests (and 3 response
> codes) on a single connection.  The second can be sent immediately, not
> waiting for the first.

I thought one multipart/mime response, with each part containing the
query results.

> ex 1.4 query:
> Same comment about using HTTP one request-one response mode.

I'm not sure I'm ready to quit on this yet. Why not put the response
faults into the mime parts for each query? That way in HTTP the
response code applies only to the request-response cycle, and the
faults, errors or successes of multiple *queries* are represented in
the mime parts?
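
Something like the following Python sketch is what I mean. The
Sparql-Status part header is hypothetical (nothing like it is in the
draft), the XML bodies are stand-ins, and multipart/mixed is just my
choice for the sketch. It uses the standard library's email package, which
base64-encodes application/* parts by default.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_response(results):
    """Bundle per-query outcomes into one multipart body.

    `results` is a list of (status, body_bytes) pairs. Sparql-Status is a
    hypothetical per-part header, not something defined in the draft.
    """
    container = MIMEMultipart("mixed")
    for status, body in results:
        part = MIMEApplication(body, "xml")
        part["Sparql-Status"] = status
        container.attach(part)
    return container.as_string()


def read_statuses(text):
    """Return the Sparql-Status of every part in a multipart body."""
    message = message_from_string(text)
    return [part["Sparql-Status"] for part in message.get_payload()]


body = build_response([
    (b"ok".decode(), b"<results/>"),
    ("fault", b"<fault>malformed query</fault>"),
])
print(read_statuses(body))
```

The HTTP response code would then be a plain 200 for the whole exchange,
and the client walks the parts to see which queries actually worked.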

The use case I'm thinking of is my cell phone as a SemWeb client. It
wants to query the network for things, and it wants to do that as
efficiently as possible.

If no one else cares about this, we can drop it.

> Can we have multiple queries against multiple graphs? N*M queries or one
> query per graph.

I couldn't decide how or what to say about that. But that being tricky
doesn't seem a reason to disallow the other forms per se.

> GetGraph::
> 
> Is it the presence of a "query=" parameter that distinguishes getGraph from
> a SPARQL query?  A lang= would make this explicit and would.

That's one way to distinguish them concretely in HTTP.

My problem is that conceptually "retrieve a graph" isn't a query
language type. At least, that doesn't make any sense to me. It makes
sense to say that "retrieve a graph or graph(s)" is a protocol
operation.
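
For what it's worth, telling the two operations apart in the HTTP binding
without a lang= value is easy enough. A Python sketch, where the function
name and the 'get-graph' label are mine and the rule itself (query=
present means query, graph= alone means retrieval) is only one possible
reading:

```python
from urllib.parse import parse_qs, urlsplit


def classify(request_target):
    """Classify a GET request target as 'query', 'get-graph', or 'unknown'.

    The rule sketched here: a query= parameter means a query; otherwise a
    graph= parameter on its own means graph retrieval.
    """
    params = parse_qs(urlsplit(request_target).query)
    if "query" in params:
        return "query"
    if "graph" in params:
        return "get-graph"
    return "unknown"


print(classify("/qps?graph=http%3A%2F%2Fexample.org%2Fg"))
```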

> ex 2.3 multipart/related?
> 
> RFC 2387 says:
> The Multipart/Related media type is intended for compound objects
> consisting of several inter-related body parts.
> 
> I don't see them as inter-related except that they are in the data for the
> same response.

A typo. I had a hellish 3 hrs trying to get a Python library to generate
multipart MIME bodies and just punted in the end.

> I intend to have more time.

Thanks for the implementation report, Andy. Very useful.

I'll try to get out a new draft, responding to many of the things in
this message, by late Friday my time.

Thanks, again.

Kendall Clark

Received on Thursday, 9 December 2004 15:58:10 UTC