Re: protocol draft available from Seaborne, Andy on 2004-11-11 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 11 Nov 2004 14:55:58 +0000
To: kendall@monkeyfist.com
CC: public-rdf-dawg@w3.org
Message-ID: <41937D7E.8070306@hp.com>
Kendall,

Great document and well explained.  There's a lot of material that is background 
that an implementer is going to need to be aware of.

Do you have any thoughts on a SOAP binding?


The query part is the main thing I think needs to be turned into a common 
protocol for DAWG - it is my access to someone else's data that I think is the 
most important use case.  That does not motivate a requirement for update to me 
and given the timescale, I feel a focus on query is the most valuable.


Comments on the query aspect:

1/ I expected SELECT to return result sets always.  This seems most appropriate 
for REST because the SELECT query creates a web resource that is the result set 
and then sends back a representation document in XML or RDF/XML as requested by 
the MIME type.  I expected the default (no MIME type) to be XML.

I decoded example 1 as "SELECT ?a ?b ?c WHERE (?a ?b ?c)" and so I am guessing 
that the results are the RDF subgraph defined by this query.

A query of "CONSTRUCT * WHERE (?a ?b ?c)" gives that possibility - could we not 
then have SELECT queries return result sets always?


2/ Requiring WSDL

My prototypical use case is that my client wants to query a server that I have 
just been told about (someone sent me the URL for the service in email).  My 
client knows nothing but the location of the service to query.

If someone had sent me a link to a web page, I'd just put it in my browser.

I'd like it to be as simple as that to query a remote RDF dataset. 
Interoperability taken for granted.  Really easy to write a client (I agree that 
it's important that the client should be simple).

At the moment, it seems the client can't execute a request without finding the 
parameter names from the WSDL.  Having to do things just to find the query 
parameters feels like a burden on the client which needs to justify its value. 
I don't see how WSDL adds i18n-ness because the client has to look something up 
in the WSDL - how do you tell the graph id from the query? And people don't read 
request URLs, machines do and machines don't care.  If we really must be 
neutral, call them "q", and "g".

Having WSDL to describe the operation of query is great - but why require it and 
why get it on the critical path?  And where does it come from anyway?  It would 
be better to just take the service centric approach and make it the result of 
the GET on the service URL (no query string).  Then there are fewer points of 
failure.

I think that DAWG choosing the parameter names, making it possible to write 
clients as simply as possible, is more valuable than flexibility.  If there is a 
use case that motivates this flexibility, could you say what it is because I 
can't think of one.
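
As a rough illustration of how little a client would need if the parameter 
names were fixed, here is a minimal sketch.  It assumes the neutral names "q" 
and "g" suggested above; the service URL, the graph URI and the helper name 
build_query_url are made up for the example and are not taken from the draft.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(service_url, query, graph=None):
    """Build a GET URL for a query service using fixed parameter names.

    "q" carries the query text and "g" the graph id. Both names are
    assumptions for this sketch, not names defined by the protocol draft.
    """
    params = [("q", query)]
    if graph is not None:
        params.append(("g", graph))
    # Append to an existing query string if the service URL already has one.
    separator = "&" if urlsplit(service_url).query else "?"
    return service_url + separator + urlencode(params)


url = build_query_url(
    "http://example.org/rdf-service",      # hypothetical service location
    "SELECT ?a ?b ?c WHERE (?a ?b ?c)",
    graph="http://example.org/data.rdf",   # hypothetical graph id
)
print(url)
```

The client needs only the service location and the query text; no lookup 
step sits between receiving the URL and issuing the request.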


3/ Loading of arbitrary graphs

Having servers execute queries that pull in graphs from anywhere is dangerous - 
not only from denial-of-service attacks but just simple server system resource 
control.  For a client it is very convenient but, from a server's point of view, 
not knowing what the effect of retrieving a URL will be is bad.  Servers are 
more resource constrained.

If you want low-footprint clients then proxy clients are a better way to go: 
send the request to the proxy which executes it for the client and sends the 
results back.

In the FROM discussions, the debate settled on downplaying any implication that 
a system must load from URLs, making FROM a hint whereby a graph may come from a 
local cache or prefetched copy already in a collection of named graphs.
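
One way to read "FROM as a hint" on the server side is sketched below.  It is 
only an illustration under stated assumptions: the GraphStore class, its 
resolve method and the allow_remote_load switch are invented for the example, 
and triples are modelled as plain tuples.

```python
class GraphStore:
    """Named graphs the server already holds (cache or prefetched copies)."""

    def __init__(self, graphs, allow_remote_load=False):
        self._graphs = dict(graphs)              # graph URI -> set of triples
        self._allow_remote_load = allow_remote_load

    def resolve(self, graph_uri):
        """Return the triples for graph_uri without fetching anything remote."""
        if graph_uri in self._graphs:
            return self._graphs[graph_uri]
        if not self._allow_remote_load:
            # The FROM URI is treated as a name, not as an instruction to fetch.
            raise LookupError("graph not available locally: " + graph_uri)
        # Remote loading is deliberately not implemented in this sketch.
        raise NotImplementedError("remote loading is left out of this sketch")


store = GraphStore({
    "http://example.org/g1": {("ex:s", "ex:p", "ex:o")},   # hypothetical graph
})
local = store.resolve("http://example.org/g1")
```

A request naming a graph the server does not hold fails fast instead of 
making the server retrieve an unknown URL.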

We at least need to say whether this feature is expected and what


4/ Overlap with the SPARQL-the-query-language

As a QL, SPARQL can be used locally so it needs LIMIT and DISTINCT in the 
language.  It is useful to have

5/ Streaming

The XML result format is (going to be) designed for streaming.  RDF 
serializations could be used as a stream but in general aren't.  Why does the 
protocol need to ask for streaming?  Could it not be that XML results are 
streamed always (it's a consequence of the format anyway as far as I can see).
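
To show why a row-oriented XML format streams without the protocol having to 
ask for it, here is a small sketch of a client that handles each row as soon 
as it has been parsed.  The element names (results, result, binding) and the 
sample document are assumptions made up for the example; the actual result 
format is not defined here.

```python
import io
import xml.etree.ElementTree as ET


def iter_results(stream):
    """Yield one dict of variable bindings per <result> element as it completes."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "result":
            yield {b.get("name"): b.text for b in elem.findall("binding")}
            elem.clear()  # release the row's children once it has been consumed


# Hypothetical result document; the element names are not from any specification.
SAMPLE = b"""<results>
  <result><binding name="a">ex:s1</binding><binding name="b">ex:p1</binding></result>
  <result><binding name="a">ex:s2</binding><binding name="b">ex:p2</binding></result>
</results>"""

rows = list(iter_results(io.BytesIO(SAMPLE)))
print(rows)
```

The consumer never needs the whole document in hand before it can act on the 
first row, which is the property the format gives for free.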


6/ Use of HTTP OPTIONS

I used this in Joseki and it has been (politely) pointed out that this is an 
abuse of the verb.  I don't know whether it is or isn't.  As OPTIONS is 
deployed, there is danger in overloading its use.


Comments on the overall approach:

A/ It's not clear as to why this is the right set of operations - indeed I'm not 
sure it is. There are alternative ways of approaching update, such as graph 
operators. The handling of bNodes needs to be considered.  There needs to be at 
least a "QueryDelete" operation which identifies a part of a graph and 
removes it - otherwise it's not possible to get bNodes out.

User-feedback from Joseki is that this set of operations does not make the task 
of writing, say, an ontology editor that pushes and pulls fragments of a large 
ontology from a server easy.   Several users found the add/delete operations 
insufficient where ontologies have bNodes.
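
A minimal sketch of the "QueryDelete" idea follows.  It is a simplification 
under stated assumptions: the graph is a Python set of tuples, the query is a 
single triple pattern with None as a wildcard, and the function name 
query_delete and the sample data are invented for the example.

```python
def query_delete(graph, pattern):
    """Remove every triple matching pattern from graph; None matches any term.

    Returns the set of removed triples. The caller never has to name a bNode
    label, which is the point: bNode labels are not stable across systems.
    """
    matched = {
        triple for triple in graph
        if all(p is None or p == term for p, term in zip(pattern, triple))
    }
    graph.difference_update(matched)
    return matched


# Hypothetical data: "_:b1" stands for a bNode whose label the client cannot rely on.
graph = {
    ("_:b1", "foaf:name", "Andy"),
    ("_:b1", "foaf:mbox", "mailto:andy@example.org"),
    ("ex:doc", "dc:creator", "_:b1"),
}
removed = query_delete(graph, (None, "foaf:mbox", None))
```

A plain delete that requires the exact triple cannot express this, because 
the client has no reliable way to refer to the bNode in the subject position.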

B/ Update operations need to consider concurrent update -  there may be a need for 
either operation packing (more than one operation is an atomic request) or use 
of timestamps or locking.

Example:

Change a FOAF record from old data to new data such that other clients don't 
see half-changed data, that is, old data as well as new data.
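
Operation packing could look roughly like the sketch below, where a delete 
and an add travel as one request and are applied under one lock.  This is an 
in-memory illustration only; the TripleStore class, its method names and the 
FOAF data are assumptions for the example, and timestamps or finer-grained 
locking would be alternatives.

```python
import threading


class TripleStore:
    """In-memory triple set where a packed request is applied atomically."""

    def __init__(self, triples=()):
        self._triples = set(triples)
        self._lock = threading.Lock()

    def apply(self, deletes=(), adds=()):
        """Apply the deletions and then the additions as one atomic step."""
        with self._lock:
            self._triples.difference_update(deletes)
            self._triples.update(adds)

    def snapshot(self):
        """Return a consistent copy of the current triples."""
        with self._lock:
            return frozenset(self._triples)


old = ("ex:andy", "foaf:mbox", "mailto:old@example.org")   # hypothetical old data
new = ("ex:andy", "foaf:mbox", "mailto:new@example.org")   # hypothetical new data

store = TripleStore([old])
store.apply(deletes=[old], adds=[new])
```

Because readers take the same lock in snapshot, they see either the old 
record or the new one, never a mixture of the two.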

C/ getGraph

I guess I don't see the need to bring out this in a special operation when we 
have plain HTTP GET, a specific query language or language constructs to do it.

This seems to be overlapping with HTTP GET. An Apache server is a fine RDF 
server! I can see that in a named collection of graphs, we might want to extract 
one of them but it is possible with

CONSTRUCT * WHERE SOURCE <uri> ($x $y $z)

subject to other discussions.

	Andy



Kendall Clark wrote:
> Les Chiens,
> 
> I'm relieved (!) to say that I've finally got a protocol draft that
> I'm willing to publicly share, in the event anyone's still
> interested. You can find it at
> 
> 	    http://monkeyfist.com/kendall/sparql-protocol/
> 
> but that should be considered a temporary location, I suspect.
> 
> If the primary consideration was that it fit on one sheet of paper,
> then either I was the wrong person to work on this or it's just more
> complex than that. Or both. :>
> 
> I worked really hard to describe an abstract protocol that could be
> realized in the Unix command-line environment, SOAP, HTTP, BEEP, and
> other diverse environments, and that took a great deal of time. I
> biased that abstract description in favor of HTTP, when things were
> otherwise tricky, but I hope not too much.
> 
> It probably doesn't need to be said, but this is a draft, it's full of
> warts, bugs, and outright errors. I will continue to work on it pretty
> much all the time, which means now that I'm sharing it, I'll add some
> date/time/RCS-markers, so that changes are easier to detect.
> 
> I hope that it helps, at the very least, focus our discussion and move
> us toward CR status with all deliberate speed.
> 
> Best,
> Kendall Clark
Received on Thursday, 11 November 2004 14:56:37 UTC