- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Thu, 11 Nov 2004 14:55:58 +0000
- To: kendall@monkeyfist.com
- CC: public-rdf-dawg@w3.org
Kendall,

Great document and well explained. There's a lot of background material that an implementer is going to need to be aware of.

Do you have any thoughts on a SOAP binding?

The query part is the main thing I think needs to be turned into a common protocol for DAWG: access to someone else's data is, I think, the most important use case. That does not motivate a requirement for update to me and, given the timescale, I feel a focus on query is the most valuable.

Comments on the query aspect:

1/ I expected SELECT to return result sets always.

This seems most appropriate for REST because the SELECT query creates a web resource that is the result set and then sends back a representation document in XML or RDF/XML as requested by the MIME type. I expected the default (no MIME type) to be XML.

I decoded example 1 as "SELECT ?a ?b ?c WHERE (?a ?b ?c)" and so I am guessing that the results are the RDF subgraph defined by this query. A query of "CONSTRUCT * WHERE (?a ?b ?c)" gives that possibility - could we not then have SELECT queries return result sets always?

2/ Requiring WSDL

My prototypical use case is that my client wants to query a server that I have just been told about (someone sent me the URL for the service in email). My client knows nothing but the location of the service to query. If someone had sent me a link to a web page, I'd just put it in my browser. I'd like it to be as simple as that to query a remote RDF dataset. Interoperability taken for granted. Really easy to write a client (I agree that it's important that the client should be simple).

At the moment, it seems the client can't execute a request without finding the parameter names from the WSDL. Having to do things just to find the query parameters feels like a burden on the client which needs to justify its value.

I don't see how WSDL adds i18n-ness because the client has to look something up in the WSDL - how do you tell the graph id from the query?
And people don't read request URLs, machines do, and machines don't care. If we really must be neutral, call them "q" and "g".

Having WSDL to describe the operation of query is great - but why require it and why get it on the critical path? And where does it come from anyway? It would be better to just take the service-centric approach and make it the result of the GET on the service URL (no query string). Then there are fewer points of failure.

I think that DAWG choosing the parameter names, making it possible to write clients as simply as possible, is more valuable than flexibility. If there is a use case that motivates this flexibility, could you say what it is because I can't think of one.

3/ Loading of arbitrary graphs

Having servers execute queries that pull in graphs from anywhere is dangerous - not only because of denial-of-service attacks but also for simple server system resource control. For a client it is very convenient but, from a server's point of view, not knowing what the effect of retrieving a URL will be is bad. Servers are more resource constrained. If you want low-footprint clients then proxy clients are a better way to go: send the request to the proxy, which executes it for the client and sends the results back.

In the FROM discussions, the debate settled on downplaying any implication that a system must load from URLs, making FROM a hint whereby a graph may come from a local cache or prefetched copy already in a collection of named graphs.

We at least need to say whether this feature is expected and what

4/ Overlap with the SPARQL-the-query-language

As a QL, SPARQL can be used locally so it needs LIMIT and DISTINCT in the language. It is useful to have

5/ Streaming

The XML result format is (going to be) designed for streaming. RDF serializations could be used as a stream but in general aren't.

Why does the protocol need to ask for streaming?
Could it not be that XML results are streamed always (it's a consequence of the format anyway as far as I can see).

6/ Use of HTTP OPTIONS

I used this in Joseki and it has been (politely) pointed out that this is an abuse of the verb. I don't know whether it is or isn't. As OPTIONS is deployed, there is danger in overloading its use.

Comments on the overall approach:

A/ It's not clear as to why this is the right set of operations - indeed I'm not sure it is. There are alternative ways of approaching update, such as graph operators.

The handling of bNodes needs to be considered. There needs to be at least a "QueryDelete" operation which identifies a part of a graph and removes it - otherwise it's not possible to get bNodes out.

User feedback from Joseki is that this set of operations does not make the task of writing, say, an ontology editor that pushes and pulls fragments of a large ontology from a server easy. Several users found the add/delete operations insufficient where ontologies have bNodes.

B/ Update operations need to consider concurrent update - there may be a need for either operation packing (more than one operation is an atomic request) or use of timestamps or locking.

Example: Change a FOAF record from old data to new data such that other clients don't see half-changed data, or old data as well as new data.

C/ getGraph

I guess I don't see the need to bring this out in a special operation when we have plain HTTP GET, a specific query language or language constructs to do it. This seems to be overlapping with HTTP GET. An Apache server is a fine RDF server!

I can see that in a named collection of graphs, we might want to extract one of them but it is possible with

CONSTRUCT * WHERE SOURCE <uri> ($x $y $z)

subject to other discussions.

	Andy

Kendall Clark wrote:
> Les Chiens,
>
> I'm relieved (!) to say that I've finally got a protocol draft that
> I'm willing to publicly share, in the event anyone's still
> interested.
> You can find it at
>
> http://monkeyfist.com/kendall/sparql-protocol/
>
> but that should be considered a temporary location, I suspect.
>
> If the primary consideration was that it fit on one sheet of paper,
> then either I was the wrong person to work on this or it's just more
> complex than that. Or both. :>
>
> I worked really hard to describe an abstract protocol that could be
> realized in the Unix command-line environment, SOAP, HTTP, BEEP, and
> other diverse environments, and that took a great deal of time. I
> biased that abstract description in favor of HTTP, when things were
> otherwise tricky, but I hope not too much.
>
> It probably doesn't need to be said, but this is a draft, it's full of
> warts, bugs, and outright errors. I will continue to work on it pretty
> much all the time, which means now that I'm sharing it, I'll add some
> date/time/RCS-markers, so that changes are easier to detect.
>
> I hope that it helps, at the very least, focus our discussion and move
> us toward CR status with all deliberate speed.
>
> Best,
> Kendall Clark
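
On point 2 of the reply (requiring WSDL): a minimal sketch of how little a client might need if the parameter names were fixed. It assumes the names "q" and "g" floated in the reply, a hypothetical service URL (http://example.org/sparql) and graph id, and an optional Accept header for choosing the result representation. None of these details come from the protocol draft itself, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(service_url, query, graph=None, accept=None):
    """Build (but do not send) an HTTP GET request for a query.

    Assumes fixed parameter names: "q" for the query, "g" for the graph id.
    """
    params = [("q", query)]
    if graph is not None:
        params.append(("g", graph))
    # urlencode percent-escapes "?", "(", ")" and encodes spaces as "+".
    url = service_url + "?" + urlencode(params)
    # With no Accept header the server is left to choose its default format.
    headers = {"Accept": accept} if accept else {}
    return Request(url, headers=headers, method="GET")


# Hypothetical service URL and graph id, for illustration only.
req = build_query_request(
    "http://example.org/sparql",
    "SELECT ?a ?b ?c WHERE (?a ?b ?c)",
    graph="http://example.org/data.rdf",
)
print(req.full_url)
```

Under these assumptions the client needs only the service URL; passing the request to urllib.request.urlopen would execute it.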
Received on Thursday, 11 November 2004 14:56:37 UTC