Re: sparql protocol simplex updated from Seaborne, Andy on 2004-12-09 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 09 Dec 2004 23:33:41 +0000
To: kendall@monkeyfist.com
Cc: public-rdf-dawg@w3.org
Message-ID: <41B8E0D5.7040703@hp.com>
Kendall Clark wrote:
> On Thu, Dec 09, 2004 at 02:56:32PM +0000, Seaborne, Andy wrote:
> 
>>Kendall,
>>
>>Good to see a new draft.  Seems to me to be going in the right direction 
>>and could be published as a first WD as-is.
> 
> 
> Thanks, Andy. I think I'll be ready to pub it as soon as I've thought
> carefully about this pile of comments you've sent -- and made text
> changes where appropriate.
> 
> 
>>== Language specification
>>
>>It would be good to be able to transport other query languages, existing and
>>to come.  The abstract syntax for RDFGraphQuery does not contain a 
>>specifier for
>>the query language; guess/parsing may be insufficient (some RDQL is legal 
>>SPARQL
>>but the effect of SELECT is different).
> 
> 
> I have language in the doc about this, but ran out of steam (or just
> forgot) to tweak the abstract stuff and to show an example. I intend
> to show iTQL and Versa examples before publishing again.
> 
> And, as you point out, the abstract protocol stuff needs to be tweaked
> to contain this bit.
> 
> 
>>I'd like to see a parameter in the abstract protocol with "lang=" in the 
>>HTTP
>>binding.  
> 
> 
> Would you be as happy with "query-lang" so as not to be ambiguous with
> lang=en/us? My preferred thing is to add a Sparql-QL-Type header
> (since we want to identify query languages with URIs, putting them
> into GET parameter just means that the URIs are (1) that much harder
> to read; and (2) that much longer, risking the tipping point of "too
> long for GET")...

I'm not concerned with the name of the parameter (I was not planning on 
looking that often :-) ).  "query-lang" is better.

> 
> Sparql-QL-Type: http://www.w3.org/Submission/RDQL
> 
> is preferable, IMO, to
> 
> GET /foo?query=...&query-lang=http%3A//www.w3.org/Submission/RDQL
> 
> (Generally, I don't see why people prefer GET params over headers,
> since every HTTP tool I know of or use lets me set arbitrary headers
> easily enough...)

The URI may be generated long before being applied to HTTP.  Bookmarks are 
like this - if we want it bookmarkable, then the parameters have to go in 
the URL.  A bookmark is a template with which to mint request instances.
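
A minimal sketch of the bookmark point, assuming the "query" and "query-lang"
parameter names discussed above. The function name and the example.org
service address are hypothetical, not from the draft. The whole request is
captured in one string that can be stored now and dereferenced later:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def bookmarkable_query_url(service, query, query_lang=None):
    """Build a GET URL that carries the whole request, so it can be bookmarked."""
    params = {"query": query}
    if query_lang is not None:
        # The language is identified by a URI; urlencode percent-escapes it.
        params["query-lang"] = query_lang
    return service + "?" + urlencode(params)

# Hypothetical service address and query text, for illustration only.
url = bookmarkable_query_url(
    "http://example.org/qps",
    "SELECT ?x WHERE (?x ?y ?z)",
    "http://www.w3.org/Submission/RDQL",
)

# The receiver can recover both values from the URL alone, no headers needed.
recovered = parse_qs(urlsplit(url).query)
```

A header-based design would need the header value stored alongside the URL,
which a plain bookmark cannot do.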

Hmm - that implies SPARQL-limit and SPARQL-distinct as well.  Hadn't 
occurred to me before.

I will defer to experts in the WG and elsewhere on the use of HTTP but my 
reaction is that header parameters are for HTTP-level things, request 
instance things, and not for protocols on top of HTTP.  Hence, parameters 
for higher level should go in the request URI.

> 
> Okay, another issue: should there be a default in the spec? That is,
> if there's no query-lang bit (however it's serialized), should
> everyone assume that the query is SPARQL? I would prefer to set some
> kind of sensible default for several things, including this, but I'd
> like to know what others think before writing that language.

Either no default or default to SPARQL (it is the SPARQL protocol) works for me.
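
A sketch of the "default to SPARQL" option on the receiving side. The URI
used to identify SPARQL here is a placeholder (no such identifier is fixed in
this thread), and `resolve_query_lang` is a hypothetical helper name:

```python
from urllib.parse import parse_qs

# Placeholder identifier; the thread does not define a URI for SPARQL.
DEFAULT_QUERY_LANG = "http://example.org/ql/SPARQL"

def resolve_query_lang(query_string, default=DEFAULT_QUERY_LANG):
    """Return the query-lang value if present, else the default language."""
    values = parse_qs(query_string).get("query-lang", [])
    return values[0] if values else default
```

The "no default" option would instead reject a request that lacks the
parameter; which behaviour is right is exactly the open question above.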

> 
> 
>>It becomes the only globally defined parameter and would free up 
>>all
>>other parameter names to be specific to the query language, not predefined 
>>by
>>this doc.  
> 
> 
> Hmm...I'll chew on this a bit.
> 
> 
>>(I think there are slight differences in "graph=" between the 
>>SPARQL
>>query and the getGraph query.)
> 
> 
> I suspect there are, but I'm curious which ones you see?
> 
> 
>>then the 3rd party form (ask a service for a named graph) of getGraph is:
>>
>>       GET /qps?lang=getGraph&graph=...
> 
> 
> Hmm, I still don't understand calling getGraph a *type* of query
> language... Why not just specify in SPARQL that "SELECT *" means
> "retrieve the graph"? I'd still want a protocol operation for
> "retrieve graph", since that works orthogonally to any query language
> *type*.

I don't understand the "orthogonality" here.

"CONSTRUCT * WHERE ( ?x ?y ?z )" retrieves the graph and is a query.  Query 
is retrieve some information and "retrieve graph" fits that for me.

> 
> 
>>while the 1st party version is still regular GET:
>>
>>       GET /3.rdf
>>
>>[*] except it should be a URI, not a short name.  But it won't fit them.
> 
> 
> We agree that the value of this query language type thing is a URI,
> yes? I'd be willing to do the work of contacting people responsible
> for various languages to see if they'd give us URIs to use to identify
> their QLs for use with our protocol.  If possible, I think as many of
> these as possible should be enumerated in the specification. Makes
> client-writing *way* easier and should help bootstrap interop. (I
> think 4 to 8 of them cover 90% of the usage...)

The nature of URIs is that you don't have to list them.  A fixed list will
become out of date very quickly.  Let evolution and search engines drive the
findability, especially as the client needs to speak language X and know a
server supports X in order to be able to sensibly send an X request.  This is
DanC's "external metadata out there in the world" argument.

> 
> 
>>This shows most readily in responses but the general point is that the
>>abstraction can't cover all the details of a concrete binding.
> 
> 
> Yep -- the response stuff, as you point out, is kinda muddled in the
> present draft. It's muddled because I was torn between doing two
> things:
> 
> 1. overloading existing HTTP response codes for use in our slightly
>    different domain
> 
> 2. requiring responses have RDF graphs in their body (which is HTTP
>    legal and even recommended, iirc), and letting those graphs carry
>    the specialization information
> 
> I prefer (2), but the problem with it is the WG doesn't seem
> especially interested in doing any vocabulary/schema work -- and there
> are some tricky bits.

For SELECT queries, with XML results, there is no need for the requestor to 
have an RDF toolkit where the query may be passed in from another system 
(XSLT scripting is an example).  (2) makes it a requirement to have an 
RDF toolkit available to parse errors.

The example of XSLT/XQuery sending off a SELECT query and reformatting the 
results does not need an RDF parser.
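
To illustrate the toolkit point, here is a sketch of a client that handles
SELECT results with nothing but a stock XML parser. The element names
(results/result/binding) are an invented layout for illustration, not the
WG's results format:

```python
import xml.etree.ElementTree as ET

# Invented XML layout, for illustration only.
SAMPLE = """<results>
  <result><binding name="x">http://example.org/a</binding></result>
  <result><binding name="x">http://example.org/b</binding></result>
</results>"""

def rows(xml_text):
    """Turn each <result> into a dict of variable name -> value."""
    root = ET.fromstring(xml_text)
    return [
        {b.get("name"): b.text for b in r.findall("binding")}
        for r in root.findall("result")
    ]
```

If errors came back as RDF graphs, this client would need an RDF parser just
to find out that something went wrong.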

> 
> I'm very willing to work on such a solution, if anyone else is
> interested, though I'm not gonna hold my breath. :>
> 
>>I suggest just covering the SPARQL errors, showing how they map to HTTP 
>>response
>>code and leave open that other HTTP response codes will occur and the same 
>>ones
>>may occur for other reasons.
> 
> 
> Yes, a good deal of work remains to be done in the response codes. I'm
> gonna punt on all of that till after the next publication.
> 
> 
>>For example, in HTTP, errors can be because of HTTP issues or because of 
>>SPARQL
>>errors.  
> 
> 
> Is there any reason not to put SPARQL error information into RDF
> graphs contained in the HTTP response bodies? I mean, any information
> other than "we don't have time"?

The toolkit issue.

> 
> 
>>As it is correct to return 404 when the service just isn't there, what 
>>happens
>>in the other cases?
> 
> 
> An RDF graph in the body specializes the general response code. Or
> something else...?
> 
> 
>>Minor notes on response codes:
>>
>>+ Need to include 414 (Request-URI Too Long)!
> 
> 
> Hmm, how did I miss that one? Dumb. Will definitely add it since
> that's the signal to the client to use the alternate method for
> conveying the query. Good catch, Andy.
> 
> 
>>+ Not sure 202 (Accepted) makes sense as query is request-response.  It
>>certainly doesn't seem more important than some of the ones not mentioned.
> 
> 
> A sign of my hubris. An early draft suggested an asynchronous response
> to complex, long-running queries. But I chickened out, since that's
> outside our brief. So, 202 is a leftover and should be dropped.

Good - what we don't need is a session layer.

> 
>>This can be achieved efficiently in HTTP 1.1 by simply sending one request 
>>after
>>another. The TCP connection is almost always open so that the overhead is 
>>just
>>header parsing.
> 
> 
> I thought long and hard about how to do "sessions" -- convey in one
> HTTP transaction multiple queries where variables may or may not be
> shared across them... I think Algae does this, but I couldn't think of
> a clean way to do it since you only get one response code in HTTP, and
> that makes the response code issues you raised above even *more*
> complex.
> 
> One thing to do is apply the HTTP response type to the req-resp
> transaction, and define SPARQL faults and error representations and
> make them the representation of a faulty query request.
> 
> 
>>Given it is possible in HTTP 1.1, I don't see the need to add another layer
>>that can also do multiple queries per request.  I would be convinced by a 
>>use
>>case as to what capability is enabled.
> 
> 
> Even with an ideal use case, it's *hard*, so I'm willing to drop it.
> 
> 
>>== HTTP issues
>>
>>Still need POST form for large queries.  Just using query-uri= does not work
>>when firewalls are involved.
> 
> 
> As I mentioned earlier, I have notes for this and will get it into the
> doc ASAP.
> 
> 
>>== Misc
>>
>>What is the MIME type for N3? I found a quick survey in an IRC log which had 
>>more
>>application/n3 than text/n3 but significant amounts of both.  I found
>>text/rdf+n3 from W3C yesterday.
> 
> 
> I guessed! The N3 folks should sort this out, IMO. I try to avoid MIME
> fights. MIME is horribly broken, IMO (witness the compound document
> fiasco), and should be replaced by RDF or something useful.
> 
> 
>>== HTTP Examples
>>
>>What happens when there is no Accept: header?  I prefer this to mean:
>>
>>application/xml, application/rdf+xml;q=0.9
>>
>>so a SELECT returns XML by default.
> 
> 
> Agreed. But the interaction between SPARQL query types and con-neg
> should be expressed directly in a table or something, as well as in
> examples. Examples are too often misinterpreted.
> 
> I think we had an email exchange where this all got spelled out, so
> I'll find and use that for a first draft.
> 
> 
>>Interactions: Do SPARQL-Distinct, SPARQL-Limit have the same meaning as in 
>>query
>>language? 
> 
> 
> Yes.
> 
> 
>> What about interactions with HTTP mechanisms? I suggest leaving 
>>these
>>   out and avoiding interaction with concrete protocol mechanisms.
> 
> 
> HTTP has headers called "Sparql-Distinct" and "Sparql-Limit"? What
> interaction with HTTP could there be otherwise?

HTTP has Range/Content-Range.  This is a way to bound the amount of data 
returned.  OK - in practice it's bytes but in theory it's "range units" like 
triples or rows.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.12
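
A sketch of what a non-byte range unit could look like in practice. The
"rows" unit is hypothetical (HTTP itself only defines bytes), and both
function names are made up for this example. It only handles the simple
"unit=first-last" form:

```python
def parse_range(value):
    """Split a 'unit=first-last' range value into (unit, first, last)."""
    unit, _, spec = value.partition("=")
    first, _, last = spec.partition("-")
    return unit.strip(), int(first), int(last)

def apply_range(rows, value, unit="rows"):
    """Return the requested slice of rows; ignore ranges in an unknown unit."""
    got_unit, first, last = parse_range(value)
    if got_unit != unit:
        return rows  # not our unit: behave as if no range was requested
    return rows[first:last + 1]  # range positions are inclusive
```

This overlaps with what a limit parameter does, which is the interaction in
question.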

> 
> 
>>SPARQL queries::
>>
>>ex 1.2 query:
>>What if SPARQL-Distinct, SPARQL-Limit don't apply.  Is it an error? I 
>>suggest
>>ignoring them.
> 
> 
> Don't apply because not supported by the server in question? Or for
> some other reason?
> 
> 
>>Resources reference things not in the file - intended?
>>
>>ex 1.3 query:
>>What is the semantics of one query, 3 graphs?
> 
> 
> I'm not sure because I've lost track of how or whether the query
> language is doing queries against the merge of n graphs or how or
> whether we're allowed to convey one query to be executed against n
> graphs distinctly.
> 
> I intended 1.3 to be an example of the latter.
> 
> 
>>I'd guess it's three separate answers, which suggests 3 requests (and 3
>>response codes) on a single connection.  The second can be sent immediately,
>>not waiting for the first.
> 
> 
> I thought one multipart/mime response, with each part containing the
> query results.
> 
> 
>>ex 1.4 query:
>>Same comment about using HTTP one request-one response mode.
> 
> 
> I'm not sure I'm ready to quit on this yet. Why not put the response
> faults into the mime parts for each query? That way in HTTP the
> response code applies only to the request-response cycle, and the
> faults, errors or successes of multiple *queries* are represented in
> the mime parts?

err - because the response code goes back in the HTTP response and there is 
only one slot?  It is sent before the response headers and has to be known 
before the body can be sent.

Putting it in the MIME parts would be a different mechanism entirely (which 
is possible) so I suppose the HTTP response is 200.  Meets my concern of 
layering but raises another one :-)

If the response error is an RDF graph, how can the requestor tell it apart 
from a successful request that returns the same sort of graph (example: 
querying a log file for errors)?

If it's not a graph, then parsing the result is very hard when there are 
buffering streams.  Very hard to unparse (push back into the input stream) 
and try again.
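
One way the MIME-parts mechanism could sidestep the "tell it apart" problem
is to mark each part explicitly, so the client never has to inspect the body
to decide whether it is a fault. A rough sketch using Python's standard
email package; the X-Query-Status header name and both function names are
invented for this example, not proposed protocol:

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_multi_response(outcomes):
    """outcomes: list of (ok, payload) pairs, one per query in the request."""
    body = MIMEMultipart("mixed")
    for ok, payload in outcomes:
        part = MIMEText(payload, "xml" if ok else "plain")
        # Invented per-part marker: the single HTTP status would stay 200.
        part["X-Query-Status"] = "ok" if ok else "fault"
        body.attach(part)
    return body.as_string()

def read_statuses(text):
    """Read the per-part markers without parsing any part body."""
    msg = message_from_string(text)
    return [part["X-Query-Status"] for part in msg.get_payload()]
```

The client reads the marker from the part headers first, which also avoids
having to push a partly parsed body back into the input stream.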

> 
> The use case I'm thinking of is my cell phone as a SemWeb client. It
> wants to query the network for things, and it wants to do that as
> efficiently as possible.
> 
> If no one else cares about this, we can drop it.
> 
> 
>>Can we have multiple queries against multiple graphs? N*M queries or one 
>>query per
>>graph.
> 
> 
> I couldn't decide how or what to say about that. But that being tricky
> doesn't seem a reason to disallow the other forms per se.
> 
> 
>>GetGraph::
>>
>>Is it the presence of a "query=" parameter that distinguishes getGraph from
>>a SPARQL query?  A lang= would make this explicit and would.
> 
> 
> That's one way to distinguish them concretely in HTTP.
> 
> My problem is that conceptually "retrieve a graph" isn't a query
> language type. At least, that doesn't make any sense to me. It makes
> sense to say that "retrieve a graph or graph(s)" is a protocol
> operation.
> 
> 
>>ex 2.3 multipart/related?
>>
>>RFC 2387 says:
>>The Multipart/Related media type is intended for compound objects
>>consisting of several inter-related body parts.
>>
>>I don't see them as inter-related except that they are in the data for the 
>>same
>>response.
> 
> 
> A typo. I had a hellish 3 hrs trying to get Python library to generate
> multipart MIME bodies and just punted in the end.
> 
> 
>>I intend to have more time.
> 
> 
> Thanks for the implementation report, Andy. Very useful.
> 
> I'll try to get out a new draft, responding to many of the things in
> this message, by late Friday my time.

Just publish whatever - it's better IMO to publish early, publish often. 
Get community opinion, not just the limited (by time, by numbers) WG opinion.

	Andy

> 
> Thanks, again.
> 
> Kendall Clark
Received on Thursday, 9 December 2004 23:34:12 UTC