Re: protocol draft available

On Wed, Nov 10, 2004 at 05:45:04PM -0000, Seaborne, Andy wrote:
> Great document and well explained.  There's a lot of material that is
> background that an implementer is going to need to be aware of.

Yes, and I realize some of it is more chatty/informational than a spec
needs; some of it might form the basis for an Implementer's Guide or
something.

> Do you have any thoughts on a SOAP binding?

Not yet. I've been thinking about working on a BEEP and a Unix shell
binding next.

> The query part is the main thing I think needs to be turned into a
> common protocol for DAWG - it is my access to someone else's data that I
> think is the most important use case.  That does not motivate a
> requirement for update to me.

Hmm. Ok. I just don't think add/deleteTriples is that difficult to
specify or test. Maybe I'm wrong. I don't see us going to CR in
January on the protocol anyway. No one's thinking that, are they?

> Comments on the query aspect:
> 
> 1/ I expected SELECT to return result sets always.  This seems most
> appropriate for REST because the SELECT query creates a web resource
> that is the result set and then sends back a representation document in
> XML or RDF/XML as requested by the MIME-type.  I expected the default
> (no MIME type) to be XML.

Yes, the first version -- which I've updated already -- was broken in
this regard. I see things like this:

SELECT, no con-neg, returns Beckett's XML variable binding stuff
SELECT, con-neg for XML, returns Beckett's XML variable binding stuff
SELECT, con-neg for RDF, returns Jos's variable bindings as an RDF graph
CONSTRUCT, no con-neg, returns RDF subgraph
CONSTRUCT, XML con-neg, returns a fault (??)
CONSTRUCT, RDF con-neg, returns RDF subgraph
DESCRIBE, no con-neg, returns RDF subgraph
DESCRIBE, XML con-neg, returns a fault (??)
DESCRIBE, RDF con-neg, returns RDF subgraph
ASK, no con-neg, returns XML (basically: Dave's <results> with a
<true/> or <false/> child element)
ASK, XML con-neg, returns same as ASK no con-neg
ASK, RDF con-neg, returns some RDF graph describing a boolean

Con-neg for RDF serializations other than RDF/XML return that way of
spelling a graph: n-triples, n3, turtle, whatever.

I think that covers all of the cases. I intended to put this into the
spec, but I had to release it eventually and ran out of time. A
section clearly describing the QL-protocol interactions as to query
forms and result forms is totally necessary and fitting.
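
A minimal sketch of that table in Python, just to show how small the
query-form/result-form section could be. The short labels ("xml",
"rdf", "xml-bindings", and so on) are made up for illustration and are
not from the draft, and the two fault rows are the ones marked (??)
above, so treat them as undecided.

```python
# Result form for each (query form, con-neg choice) pair, following the list above.
# All labels are illustrative placeholders, not names from the protocol draft.
FAULT = "fault"  # the CONSTRUCT/DESCRIBE + XML cases are marked (??) in the list

RESULT_FORMS = {
    ("SELECT", None): "xml-bindings",
    ("SELECT", "xml"): "xml-bindings",
    ("SELECT", "rdf"): "rdf-bindings-graph",
    ("CONSTRUCT", None): "rdf-subgraph",
    ("CONSTRUCT", "xml"): FAULT,
    ("CONSTRUCT", "rdf"): "rdf-subgraph",
    ("DESCRIBE", None): "rdf-subgraph",
    ("DESCRIBE", "xml"): FAULT,
    ("DESCRIBE", "rdf"): "rdf-subgraph",
    ("ASK", None): "xml-boolean",
    ("ASK", "xml"): "xml-boolean",
    ("ASK", "rdf"): "rdf-boolean-graph",
}


def result_form(query_form, conneg=None):
    """Look up the result form for a query form and a con-neg choice (or None)."""
    return RESULT_FORMS[(query_form.upper(), conneg)]
```

Con-neg for a non-RDF/XML serialization would land on the same "rdf"
rows; only the spelling of the graph changes.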

> I decoded example 1 as "SELECT ?a ?b ?c WHERE (?a ?b ?c)" and so I am
> guessing that the results are the RDF subgraph defined by this query.  
> 
> A query of "CONSTRUCT * WHERE (?a ?b ?c)" gives that possibility - could
> we not then have SELECT queries return result sets always?

Yes, absolutely. Was merely a brain warble on my part. I've fixed this
example in the public version.

> 2/ Requiring WSDL
> 
> My prototypical use case is that my client wants to query a server that
> I have just been told about (someone sent me the URL for the service in
> email).  My client knows nothing but the location of the service to
> query.
> 
> If someone had sent me a link to a web page, I'd just put it in my
> browser.  
> 
> I'd like it to be as simple as that to query a remote RDF dataset.
> Interoperability taken for granted.  Really easy to write a client (I
> agree that it's important that the client should be simple).

Well, my idea isn't to require the client to do *anything* with
WSDL. Nothing at all. Neither programmatic WSDL consumption nor
production is required by my scheme, which isn't well explained in the
doc, I grant.

I want to use WSDL 2, and the RDF mapping therefrom, to describe
SPARQL services with *RDF graphs*, in short. WSDL is just the source
of all the predicates, which have to come from somewhere.

The client grabs an RDF graph that tells it how to interact with the
service. My use of WSDL is in using it to describe servers/services,
then generating from that description the RDF graph that the client
requests and interacts with.

Bijan Parsia is working on WSDL 2 to RDF mapping, and I'm helping him,
starting to write a Python program to do this automagically. I think
we could even include a stub WSDL 2 description in the non-normative
appendices of the protocol spec.

So, no one is required to be able to process WSDL in order to be able
to do SPARQL.

W/out using WSDL, we have to come up with an RDF vocabulary for
describing SPARQL services/servers, and that's not only
time-consuming, but hard and dumb. :>

Or, we just hardcode everything. Ick.

> At the moment, it seems the client can't execute a request without
> finding the parameter names from the WSDL.  Having to do things just to
> find the query parameters feels like a burden on the client which needs
> to justify its value.  

No, the client has to retrieve, parse, and grok an RDF graph. I assume
all SPARQL clients will be able to do that, since most of them will be
adjuncts to some kind of existing RDF API joined with an HTTP lib.

> I don't see how WSDL adds i18n-ness because the
> client has to look something up in the WSDL - how do you tell the graph
> id from the query? And people don't read request URLs, machines do and
> machines don't care.  If we really must be neutral, call them "q", and
> "g".

Well, those are good candidates for hardcoded query parameters, but
this doesn't convince me in the least that hardcoding query parameters
is the right thing to do. In fact, I'm convinced, today, that it isn't.

> Having WSDL to describe the operation of query is great - but why
> require it and why get it on the critical path?  And where does it come
> from anyway?  

> It would be better to just take the service centric
> approach and make it the result of the GET on the service URL (no query
> string).  Then there are less points of failure.

Well, OPTIONS /service, but I could support GET /service too.

My idea was something like this:

You or yr client finds a triple like (http://foo, rdf:type,
dawg:SparqlProcessor).

Yr client does

   OPTIONS *
   host:foo

and gets back either an RDF service description (which is derived from
but is not *WSDL*) or it gets back a URI

   http://foo/qsp

that it can then do:

   GET /qsp
   host:foo

or 

   OPTIONS /qsp
   host:foo

I favor OPTIONS, since the HTTP spec clearly says that's what it's
for. 
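
A rough Python sketch of the request text for those three lookups,
assuming HTTP/1.1 framing. The function name is hypothetical, and it
only builds the request; sending it and parsing the returned graph are
left out.

```python
def discovery_request(host, target="*", method="OPTIONS"):
    """Return raw HTTP/1.1 request text for a service-description lookup."""
    if method not in ("OPTIONS", "GET"):
        raise ValueError("discovery uses OPTIONS or GET only")
    if method == "GET" and target == "*":
        # "*" addresses the server as a whole and is only meaningful for OPTIONS.
        raise ValueError("'*' is not a valid target for GET")
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


# The three requests from the walk-through above.
server_wide = discovery_request("foo")                 # OPTIONS *
via_get = discovery_request("foo", "/qsp", "GET")      # GET /qsp
via_options = discovery_request("foo", "/qsp")         # OPTIONS /qsp
```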

(You know, as an aside, reading the HTTP 1.1 spec carefully with
Semwebbery protocol needs in mind really gives the lie to this claim
that Berners-Lee really had machine-to-machine hypermedia navigation
in his head from the very beginning. If so, the HTTP spec gives
absolutely no evidence of that; in fact, it suggests the
opposite. Just my 2, irrelevant cents! :>)

Okay, so now the client has an RDF graph, and now it knows how to
invoke the operations it cares about on this SPARQL server.

Clients hardcode knowledge of particular predicates, but not
particular query names. That gives server-side implementers max
flexibility, and it also allows them to migrate
services/graphs/whatever, and clients won't all break.
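
To make that concrete, here is a minimal Python sketch of a client that
knows predicates but not parameter names. The predicate URIs under
http://example.org/sd# are hypothetical stand-ins (the real ones would
come out of the WSDL 2 to RDF mapping), and the graph is modelled as
plain (subject, predicate, object) tuples rather than a real RDF API.

```python
from urllib.parse import urlencode

# Hypothetical predicates; placeholders for whatever the WSDL 2 RDF mapping yields.
ENDPOINT = "http://example.org/sd#endpoint"
QUERY_PARAM = "http://example.org/sd#queryParameter"
GRAPH_PARAM = "http://example.org/sd#graphParameter"


def lookup(graph, subject, predicate):
    """Return the object of the first (subject, predicate, ?) triple."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    raise KeyError(predicate)


def build_query_url(graph, service, query, graph_uri=None):
    """Build a request URL using parameter names read from the description."""
    params = {lookup(graph, service, QUERY_PARAM): query}
    if graph_uri is not None:
        params[lookup(graph, service, GRAPH_PARAM)] = graph_uri
    return lookup(graph, service, ENDPOINT) + "?" + urlencode(params)


# One server calls its parameters "q" and "g"; another could pick other names
# and the same client code would keep working.
description = [
    ("http://foo", ENDPOINT, "http://foo/qsp"),
    ("http://foo", QUERY_PARAM, "q"),
    ("http://foo", GRAPH_PARAM, "g"),
]
url = build_query_url(description, "http://foo", "x")
```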
 
> I think that DAWG choosing the parameter names, making it possible to
> write clients as simply as possible, is more valuable than flexibility.
> If there is a use case that motivates this flexibility, could you say
> what it is because I can't think of one.

Well, what are the use cases for machine-readable service descriptions
generally? And don't service descriptions generally include all the
implementation-specific details of how to invoke a service? I think
all of those use cases apply to SPARQL services. I can't see how they
don't, actually. 

Does the fact that my scheme only uses WSDL as a way to generate --
and not even at run or compile or operation-invocation time! -- an RDF
graph change yr mind at all?

I'm open to the possibility that I've described something more like
SPARQL 2.0, but I'd like to think not! :>

> 3/ Loading of arbitrary graphs
> 
> Having servers execute queries that pull in graphs from anywhere is
> dangerous - not only from denial-of-service attacks but just simple
> server system resource control.  For a client it is very convenient but,
> from a server's point of view, not knowing what the effect of retrieving
> a URL will be is bad.  Servers are more resource constrained.

Sure, and a server can always simply refuse to do it, right? But
disallowing any server and any server implementer from letting a
client do this seems wrong to me.

> In the FROM discussions, the debate settled on downplaying any
> implication that a system must load from URLs, making FROM a hint
> whereby a graph may come from a local cache or prefetched copy already
> in a collection of named graphs.

Really? I don't recall that.

> We at least need to say whether this feature is expected and what 

Sentence cut off...

I say, I think, that no server is ever *required* to load any
arbitrary graph. But why prevent it from loading any arbitrary graphs
at all?

I'm thinking of stuff like FOAF files. If I want to provide a service
that does dynamic merge of arbitrary FOAF and runs a query on the
resulting graph, shouldn't the protocol allow that but not require
anyone else to do it? That's the point I'm aiming for. The present doc
may not say that, or you may not agree that it should say that. 

> 4/ Overlap with the SPARQL-the-query-language
> 
> As a QL, SPARQL can be used locally so it needs LIMIT and DISTINCT in
> the language.  It is useful to have 

Sentence breaks off here...

I thought we'd treat limit and distinct in the same way as FROM. In
the local case, they are what counts. In the protocol case, they are
hints to the protocol layer. Hence I put in headers to convey that
info in the HTTP protocol.

> 5/ Streaming
> 
> The XML result format is (going to be) designed for streaming.  RDF
> serializations could be used as a stream but in general aren't.  Why
> does the protocol need to ask for streaming?  

We have a design objective or requirement about streaming. I wasn't
sure we had that covered already, so I put in a bit about streaming
and a header in the HTTP. I'm cool w/ removing these if the XML format
alone covers our req/DO -- but for other protocols, like Jabber, that
are more streaming-friendly, it might make sense to leave streaming
in the abstract.

I care *most*, but not *only* about HTTP. Jabber is a good example of
a non-HTTP protocol I care about, and it's a lot more stream-centric
than HTTP, IIRC.

> Could it not be that XML
> results are streamed always (it's a consequence of the format anyway as
> far as I can see).

Yes, that's probably the thing to do, I just wasn't sure. I will
remove the sparql-stream bit from the HTTP.

> 6/ Use of HTTP OPTIONS 
> 
> I used this in Joseki and it has been (politely) pointed out that this
> is an abuse of the verb.  I don't know whether it is or isn't.  As
> OPTIONS is deployed, there is danger in overloading its use.

Uh, that overloading danger applies to *all* the verbs, not just
OPTIONS, IMO. I.e., nothing special about OPTIONS.

From RFC 2616,

   This method [OPTIONS] **allows the client to determine the options
   and/or requirements associated with a resource, or the capabilities
   of a server, without implying a resource action or initiating a
   resource retrieval.** (my emphasis)

That's almost *exactly* how I describe getOptions in the abstract and
OPTIONS in the HTTP protocols. Seems perfectly congruent to me.

> Comments on the overall approach:
> 
> A/ It's not clear as to why this is the right set of operations - indeed
> I'm not sure it is. There are alternative ways of approaching update,
> such as graph operators. 

<stupid-question>What are graph operators?</>

> The handling of bNodes needs to be considered.

Yes, I've so far punted all of that to you and Peter Patel-Schneider! :>

> There needs to be at least a "QueryDelete" operation which
> identifies a part of a graph and removes it - otherwise it's not possible
> to get bNodes out. 

That's exactly what deleteTriples does, Andy. It takes a SPARQL query
and removes the told triples identified by the results of the
query. Or do you mean something different?
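
A toy Python sketch of that idea, with a big simplification: a single
triple pattern stands in for a full SPARQL query, and terms beginning
with "?" are variables. The function names are illustrative, not from
the draft. The point is that a bNode never has to be named by the
client; a variable picks it up.

```python
def matches(pattern, triple):
    """True if one triple pattern matches a triple; '?x' terms are variables."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must bind to the same value every time.
            if bindings.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


def delete_triples(graph, pattern):
    """Return the graph minus the told triples matched by the pattern."""
    return {t for t in graph if not matches(pattern, t)}


graph = {
    ("_:b1", "foaf:name", "Andy"),
    ("_:b1", "foaf:mbox", "mailto:a@example.org"),
    ("_:b2", "foaf:name", "Kendall"),
}
# Removes the triple hanging off the bNode without ever naming "_:b1".
remaining = delete_triples(graph, ("?s", "foaf:name", "Andy"))
```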

> B/ Update operations need to consider concurrent update - there may be
> need for either operation packing (more than one operation is an atomic
> request) or use of timestamps or locking.  

Yes, I got held up for a week trying to figure out how to send more
than 1 request in HTTP w/out going to a POST or complex multipart/mime
scheme. I gave up and decided that stuff was definitely 2.0.

But can't resource contention be totally opaque to the client? We can
do asynch callbacks in HTTP easily enough. Or just a kind of polling.

But generally I don't think that belongs in the spec; it seems totally
implementation-specific. Maybe I'm wrong?

> Example:
> 
> Change a FOAF record from old data to new data such that other clients
> don't see half changed data to old data as well as new data.

Sure. While updating, server locks that graph and doesn't answer
other operations against it; or it answers them with a URI at which
the operation results will eventually be available, which leaves
clients free to poll that URI. HEADs are cheap.
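
The polling side could be as small as this Python sketch. It assumes
the server answers HEAD with 200 once the results exist and with some
other status until then; those status codes are my assumption, not
something the draft specifies. The head argument is any callable that
takes a URI and returns a status code, so no particular HTTP library
is implied.

```python
import time


def poll_for_results(uri, head, attempts=5, delay=1.0):
    """Poll uri with HEAD until the results exist; return True once they do."""
    for attempt in range(attempts):
        if head(uri) == 200:  # assumed "results are ready" status
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return False
```

A client would call it with the URI the server handed back and then do
a normal retrieval once it returns True.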

> C/ getGraph
> 
> This seems to be overlapping with HTTP GET. An Apache server is a fine
> RDF server!  I can see that in a named collection of graphs, we might
> want to extract one of them but it is possible with 
> 
> CONSTRUCT * WHERE SOURCE <uri> ($x $y $z) 
> 
> subject to other discussions.
> 
> I guess I don't see the need to bring out this in a special operation
> when we have plain HTTP GET and languages to do it.

I was following Joseki here! :>

But, seriously, what about non-HTTP concrete protocols?

I really intend to use SPARQL over things other than HTTP. SMTP,
Jabber, and BEEP are likely candidates. All that stuff about
HTTP/Apache/GET is totally irrelevant in those cases, yeah?

Retrieving a graph seems the most primitive and useful protocol
operation of all. :>

Thanks for comments. I'll make some doc changes based on this, and I'm
happy to keep talking about it. I haven't reached any totally
unrevisable conclusions yet.

Kendall
-- 
Sometimes it's appropriate, even patriotic, to be ashamed
of your country. -- James Howard Kunstler

Received on Thursday, 11 November 2004 15:32:29 UTC