Re: protocol draft available

Kendall Clark wrote:
> On Wed, Nov 10, 2004 at 05:45:04PM -0000, Seaborne, Andy wrote:
> 
>>Great document and well explained.  There's a lot of material that is
>>background that an implementer is going to need to be aware of.
> 
> 
> Yes, and I realize some of it is more chatty/informational than a spec
> needs; some of it might form basis for an Implementer's Guide or
> something. 
> 
> 
>>Do you have any thoughts on a SOAP binding?
> 
> 
> Not yet. I've been thinking about working on a BEEP and a Unix shell
> binding next.
> 
> 
>>The query part is the main thing I think needs to be turned into a
>>common protocol for DAWG - it is my access to someone else's data that I
>>think is the most important use case.  That does not motivate a
>>requirement for update to me.
> 
> 
> Hmm. Ok. I just don't think add/deleteTriples is that difficult to
> specify or test. Maybe I'm wrong.

There is a major alternative to the add/deleteTriples operations in the protocol, 
which would be to extend the query language with an update language (c.f. SQL). 
This has the advantage of working in the local case as well.

I see query as being more clear-cut, and doable for last call together with the 
query language.


> I don't see us going to CR in
> January on the protocol anyway. No one's thinking that, are they?
> 
> 
>>Comments on the query aspect:
>>
>>1/ I expected SELECT to return result sets always.  This seems most
>>appropriate for REST because the SELECT query creates a web resource
>>that is the result set and then sends back a representation document in
>>XML or RDF/XML as requested by the MIME-type.  I expected the default
>>(no MIME type) to be XML.
> 
> 
> Yes, the first version -- which I've updated already -- was broken in
> this regard. I see things like this:
> 
> SELECT, no con-neg, returns Beckett's XML variable binding stuff
> SELECT, con-neg for XML, returns Beckett's XML variable binding stuff
> SELECT, con-neg for RDF, returns Jos's variable bindings as an RDF graph
> CONSTRUCT, no con-neg, returns RDF subgraph
> CONSTRUCT, XML con-neg, returns a fault (??)

Returns RDF/XML!  It is XML after all.

> CONSTRUCT, RDF con-neg, returns RDF subgraph
> DESCRIBE, no con-neg, returns RDF subgraph
> DESCRIBE, XML con-neg, returns a fault (??)
> DESCRIBE, RDF con-neg, returns RDF subgraph
> ASK, no con-neg, returns XML (basically: Dave's <results> with a
> <true/> or <false/> child element)
> ASK, XML con-neg, returns same as ASK no con-neg
> ASK, RDF con-neg, returns some RDF graph describing a boolean
> 
> Con-neg for RDF serializations other than RDF/XML return that way of
> spelling a graph: n-triples, n3, turtle, whatever.
> 
> I think that covers all of the cases. I intended to put this into the
> spec, but I had to release it eventually and ran out of time. A
> section clearly describing the QL-protocol interactions as to query
> forms and result forms is totally necessary and fitting.
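
Just to check I read that list the way you intend it, here is roughly the 
client-side code I would expect for the con-neg cases.  The endpoint URL and 
the "query" parameter name are made up purely for illustration:

    import urllib, urllib2

    ENDPOINT = "http://example.org/sparql"    # hypothetical service URL

    def run_query(query, accept):
        # Send the query as a GET parameter; the Accept header chooses the
        # result form (XML variable bindings vs. an RDF graph).
        url = ENDPOINT + "?" + urllib.urlencode({"query": query})
        req = urllib2.Request(url, headers={"Accept": accept})
        return urllib2.urlopen(req).read()

    # SELECT with XML con-neg -> XML variable bindings
    bindings = run_query("SELECT ?a ?b ?c WHERE (?a ?b ?c)", "application/xml")

    # CONSTRUCT with RDF con-neg -> an RDF/XML serialization of the subgraph
    subgraph = run_query("CONSTRUCT * WHERE (?a ?b ?c)", "application/rdf+xml")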
> 
> 
>>I decoded example 1 as "SELECT ?a ?b ?c WHERE (?a ?b ?c)" and so I am
>>guessing that the results are the RDF subgraph defined by this query.  
>>
>>A query of "CONSTRUCT * WHERE (?a ?b ?c)" gives that possibility - could
>>we not then have SELECT queries return result sets always?
> 
> 
> Yes, absolutely. Was merely a brain warble on my part. I've fixed this
> example in the public version.
> 
> 
>>2/ Requiring WSDL
>>
>>My prototypical use case is that my client wants to query a server that
>>I have just been told about (someone sent me the URL for the service in
>>email).  My client knows nothing but the location of the service to
>>query.
>>
>>If someone had sent me a link to a web page, I'd just put it in my
>>browser.  
>>
>>I'd like it to be as simple as that to query a remote RDF dataset.
>>Interoperability taken for granted.  Really easy to write a client (I
>>agree that it's important that the client should be simple).
> 
> 
> Well, my idea isn't to require the client to do *anything* with
> WSDL. Nothing at all. Neither programmatic WSDL consumption nor
> production is required by my scheme, which isn't well explained in the
> doc, I grant.
> 
> I want to use WSDL 2, and the RDF mapping therefrom, to describe
> SPARQL services with *RDF graphs*, in short. WSDL is just the source
> of all the predicates, which have to come from somewhere.
> 
> The client grabs an RDF graph that tells it how to interact with the
> service. My use of WSDL is in using it to describe servers/services,
> then generating from that description the RDF graph that the client
> requests and interacts with.
> 
> Bijan Parsia is working on WSDL 2 to RDF mapping, and I'm helping him,
> starting to write a Python program to do this automagically. I think
> we could even include a stub WSDL 2 description in the non-normative
> appendices of the protocol spec.
> 
> So, no one is required to be able to process WSDL in order to be able
> to do SPARQL.
> 
> W/out using WSDL, we have to come up with an RDF vocabulary for
> describing SPARQL services/servers, and that's not only
> time-consuming, but hard and dumb. :>
> 
> Or, we just hardcode everything. Ick.

Two points:

1/ The client may not be able to parse RDF - suppose the client is only 
interested in SELECT result forms in XML.  The client does not need an RDF 
subsystem to do that (a sketch of the sort of client I mean follows these two 
points).

2/ Just because it is RDF, not WSDL2 in XML, does not remove the fact that the 
client is doing more than just issuing a query request.  It still has to 
process some other information to find the query parameters.  When I talked 
about "processing WSDL" I wasn't just thinking about the syntactic form of XML 
or RDF; I was thinking about the need for the step of finding the parameter 
names at all.
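
To make point 1 concrete, the kind of client I have in mind needs nothing more 
than an HTTP library and an XML parser - no RDF subsystem at all.  The endpoint 
and the "query" parameter name below are invented for the sake of the example 
(which is rather the point):

    import urllib, urllib2
    from xml.dom import minidom

    # A client only interested in SELECT result sets as XML.
    url = ("http://example.org/sparql?" +
           urllib.urlencode({"query": "SELECT ?a ?b ?c WHERE (?a ?b ?c)"}))
    req = urllib2.Request(url, headers={"Accept": "application/xml"})
    results = minidom.parseString(urllib2.urlopen(req).read())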

I still don't see why this helps i18n-ness - what is looked up in the RDF-ised 
WSDL?  At some point, that vocabulary is agreed and predicate names, like 
request parameter names, have to have spellings.
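
Contrast the client above with what the service-description route asks for: 
before the client can even build the request it has to fetch the description 
and look the parameter names up against some agreed vocabulary.  A rough sketch 
of that extra step, using rdflib merely as a stand-in RDF API and an entirely 
invented vocabulary (someone still has to agree those spellings):

    from rdflib import Graph, Namespace, URIRef

    DAWG = Namespace("http://example.org/dawg-protocol#")    # invented vocabulary
    service = URIRef("http://example.org/sparql")             # invented service URI

    desc = Graph()
    desc.parse("http://example.org/sparql", format="xml")     # fetch the description

    # The lookup step every client has to do before it can issue a query:
    query_op = desc.value(subject=service, predicate=DAWG.queryOperation)
    param_name = desc.value(subject=query_op, predicate=DAWG.queryParameterName)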

> 
> 
>>At the moment, it seems the client can't execute a request without
>>finding the parameter names from the WSDL.  Having to do things just to
>>find the query parameters feels like a burden on the client which needs
>>to justify its value.  
> 
> 
> No, the client has to retrieve, parse, and grok an RDF graph. I assume
> all SPARQL clients will be able to do that, since most of them will be
> adjuncts to some kind of existing RDF API joined with an HTTP lib.

See above - I don't see all clients having an RDF API.

(no more below)

	Andy

> 
> 
>>I don't see how WSDL adds i18n-ness because the
>>client has to look something up in the WSDL - how do you tell the graph
>>id from the query? And people don't read request URLs, machines do and
>>machines don't care.  If we really must be neutral, call them "q", and
>>"g".
> 
> 
> Well, those are good candidates for hardcoded query parameters, but
> this doesn't convince me in the least that hardcoding query parameters
> is the right thing to do. In fact, I'm convinced, today, that it isn't.
> 
> 
>>Having WSDL to describe the operation of query is great - but why
>>require it and why get it on the critical path?  And where does it come
>>from anyway?  
> 
> 
>>It would be better to just take the service centric
>>approach and make it the result of the GET on the service URL (no query
>>string).  Then there are less points of failure.
> 
> 
> well OPTIONS /service,  but I could support GET /service too.
> 
> My idea was something like this:
> 
> You or yr client finds a triple like (http://foo, rdf:type,
> dawg:SparqlProcessor).
> 
> Yr client does
> 
>    OPTIONS *
>    host:foo
> 
> and gets back either an RDF service description (which is derived from
> but is not *WSDL*) or it gets back a URI
> 
>    http://foo/qsp
> 
> that it can then do:
> 
>    GET /qsp
>    host:foo
> 
> or 
> 
>    OPTIONS /qsp
>    host:foo
> 
> I favor OPTIONS, since the HTTP spec clearly says that's what it's
> for. 
> 
> (You know, as an aside, reading the HTTP 1.1 spec carefully with
> Semwebbery protocol needs in mind really gives the lie to this claim
> that Berners-Lee really had machine-to-machine hypermedia navigation
> in his head from the very beginning. If so, the HTTP spec gives
> absolutely no evidence of that; in fact, it suggests the
> opposite. Just my 2, irrelevant cents! :>)
> 
> Okay, so now the client has an RDF graph, and now it knows how to
> invoke the operations it cares about on this SPARQL server.
> 
> Clients hardcode knowledge of particular predicates, but not
> particular query names. That gives server-side implementers max
> flexibility, and it also allows them to migrate
> services/graphs/whatever, and clients won't all break.
>  
> 
>>I think that DAWG choosing the parameter names, making it possible to
>>write clients as simply as possible, is more valuable than flexibility.
>>If there is a use case that motivates this flexibility, could you say
>>what it is because I can't think of one.
> 
> 
> Well, what are the use cases for machine-readable service descriptions
> generally? And don't service descriptions generally include all the
> implementation-specific details of how to invoke a service? I think
> all of those use cases apply to SPARQL services. I can't see how they
> don't, actually. 
> 
> Does the fact that my scheme only uses WSDL as a way to generate --
> and not even at run or compile or operation-invocation time! -- an RDF
> graph change yr mind at all?
> 
> I'm open to the possibility that I've described something more like
> SPARQL 2.0, but I'd like to think not! :>
> 
>>3/ Loading of arbitrary graphs
>>
>>Having servers execute queries that pull in graphs from anywhere is
>>dangerous - not only from denial-of-service attacks but just simple
>>server system resource control.  For a client it is very convenient but,
>>from a server's point of view, not knowing what the effect of retrieving
>>a URL will be is bad.  Servers are more resource constrained.
> 
> 
> Sure, and a server can always simply refuse to do it, right? But
> disallowing any server and any server implementer from letting a
> client do this seems wrong to me.
> 
> 
>>In the FROM discussions, the debate settled on downplaying any
>>implication that a system must load from URLs, making FROM a hint
>>whereby a graph may come from a local cache or prefetched copy already
>>in a collection of named graphs.
> 
> 
> Really? I don't recall that.
> 
> 
>>We at least need to say whether this feature is expected and what 
> 
> 
> Sentence cut off...
> 
> I say, I think, that no server is ever *required* to load any
> arbitrary graph. But why prevent it from loading any arbitrary graphs
> at all?
> 
> I'm thinking of stuff like FOAF files. If I want to provide a service
> that does dynamic merge of arbitrary FOAF and runs a query on the
> resulting graph, shouldn't the protocol allow that but not require
> anyone else to do it? That's the point I'm aiming for. The present doc
> may not say that, or you may not agree that it should say that. 
> 
> 
>>4/ Overlap with the SPARQL-the-query-language
>>
>>As a QL, SPARQL can be used locally so it needs LIMIT and DISTINCT in
>>the language.  It is useful to have 
> 
> 
> Sentence breaks off here...
> 
> I thought we'd treat limit and distinct in the same way as FROM. In
> the local case, they are what counts. In the protocol case, they are
> hints to the protocol layer. Hence I put in headers to convey that
> info in the HTTP protocol.
> 
> 
>>5/ Streaming
>>
>>The XML result format is (going to be) designed for streaming.  RDF
>>serializations could be used as a stream but in general aren't.  Why
>>does the protocol need to ask for streaming?  
> 
> 
> We have a design objective or requirement about streaming. I wasn't
> sure we had that covered already, so I put in a bit about streaming
> and a header in the HTTP. I'm cool w/ removing these if the XML format
> alone covers our req/DO -- but for other protocols, like Jabber, that
> are more streaming-friendly, it might make sense to leave streaming
> in the abstract.
> 
> I care *most*, but not *only* about HTTP. Jabber is a good example of
> a non-HTTP protocol I care about, and it's a lot more stream-centric
> than HTTP, IIRC.
> 
> 
>>Could it not be that XML
>>results are streamed always (it's a consequence of the format anyway as
>>far as I can see)?
> 
> 
> Yes, that's probably the thing to do, I just wasn't sure. I will
> remove the sparql-stream bit from the HTTP.
> 
> 
>>6/ Use of HTTP OPTIONS
>>
>>I used this in Joseki and it has been (politely) pointed out that this
>>is an abuse of the verb.  I don't know whether it is or isn't.  As
>>OPTIONS is deployed, there is danger in overloading its use.
> 
> 
> Uh, that overloading danger applies to *all* the verbs, not just
> OPTIONS, IMO. I.e., nothing special about OPTIONS.
> 
> From RFC 2616,
> 
>    This method [OPTIONS] **allows the client to determine the options
>    and/or requirements associated with a resource, or the capabilities
>    of a server, without implying a resource action or initiating a
>    resource retrieval.** (my emphasis)
> 
> That's almost *exactly* how I describe getOptions in the abstract and
> OPTIONS in the HTTP protocols. Seems perfectly congruent to me.
> 
> 
>>Comments on the overall approach:
>>
>>A/ It's not clear as to why this is the right set of operations - indeed
>>I'm not sure it is. There are alternative ways of approaching update,
>>such as graph operators. 
> 
> 
> <stupid-question>What are graph operators?</>
> 
>>The handling of bNodes needs to be considered.
> 
> 
> Yes, I've so far punted all of that to you and Peter Patel Scheider! :>
> 
>>There needs to be at least a "QueryDelete" operation which
>>identifies a part of a graph and removes it - otherwise it's not possible
>>to get bNodes out. 
> 
> 
> That's exactly what deleteTriples does, Andy. It takes a SPARQL query
> and removes the told triples identified by the results of the
> query. Or do you mean something different?
> 
> 
>>B/ Update operations need to consider concurrent update - there may be a
>>need for either operation packing (more than one operation in an atomic
>>request) or use of timestamps or locking.
> 
> 
> Yes, I got held up for a week trying to figure out how to send more
> than 1 request in HTTP w/out going to a POST or complex multipart/mime
> scheme. I gave up and decided that stuff was definitely 2.0.
> 
> But can't resource contention be totally opaque to the client? We can
> do asynch callbacks in HTTP easily enough. Or just a kind of polling.
> 
> But generally I don't think that belongs in the spec; it seems totally
> implementation-specific. Maybe I'm wrong?
> 
> 
>>Example:
>>
>>Change a FOAF record from old data to new data such that other clients
>>don't see half-changed data - old data as well as new data.
> 
> 
> Sure. While updating, server locks that graph and doesn't answer
> other operations against it; or it answers them with a URI at which
> the operation results will eventually be available, which leaves
> clients free to poll that URI. HEADs are cheap.
> 
> 
>>C/ getGraph
>>
>>This seems to be overlapping with HTTP GET. An Apache server is a fine
>>RDF server!  I can see that in a named collection of graphs, we might
>>want to extract one of them but it is possible with 
>>
>>CONSTRUCT * WHERE SOURCE <uri> ($x $y $z) 
>>
>>subject to other discussions.
>>
>>I guess I don't see the need to bring out this in a special operation
>>when we have plain HTTP GET and languages to do it.
> 
> 
> I was following Joseki here! :>
> 
> But, seriously, what about non-HTTP concrete protocols?
> 
> I really intend to use SPARQL over things other than HTTP. SMTP,
> Jabber, and BEEP are likely candidates. All that stuff about
> HTTP/Apache/GET are totally irrelevant in those cases, yeah?
> 
> Retrieve a graph seems the most primitive and useful protocol
> operation of all. :>
> 
> Thanks for comments. I'll make some doc changes based on this, and I'm
> happy to keep talking about it. I haven't reached any totally
> unrevisable conclusions yet.
> 
> Kendall

Received on Friday, 12 November 2004 12:03:31 UTC