Re: SPARQL Protocol for RDF / feedback (fwd) from Seaborne, Andy on 2005-01-26 (public-rdf-dawg@w3.org from January to March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 26 Jan 2005 13:57:49 +0000
To: Dirk-Willem van Gulik <dirkx@webweaving.org>
CC: public-rdf-dawg@w3.org
Message-ID: <41F7A1DD.1020900@hp.com>

Dirk,

Thanks for the comments - I found them helpful, and a bit scary where they talk
about the issues around long URLs and security tools.

Dirk-Willem van Gulik wrote:
> 
> Some feedback
> 
> URI layout
> 
> a	GET http://foo.com//qps?query-lang=http...&graph-id=http..my.example%2F3.rdf&query=...
> versus
> b	GET http://foo.com//qps/query-lang=http.../graph-id=http..my.example%2F3.rdf/query=...
> versus
> c	POST http://foo.com//qps
> 	with data payload
> 	query-lang=http...&graph-id=http..my.example%2F3.rdf&query=...
> 
> Though A and B are a URI with equal status and semantics - in reality
> caches, loadbalancers, proxies and what not treat A as a function with
> some argument (and implicitly assume some operation (with unclear side
> effects on state)) and B in a more stateless way. And hence will not be
> eager to cache around a - and be sticky by default in the case of a load
> balancer.

> 
> This means that -if- the social contract among us is that a query does
> _NOT_ change state (but see below about GraphCreated) - and that thus the
> results can be cached or can be handled loadbalancer neutral - then 'b'
> gives a behaviour closer to that desire.
> 

Joseki does (a) and it explicitly sets the HTTP cache headers.  One usage model I
have is that there is a database being exported as a read-only view via RDF to
the web and the updates are still made by another application.  Because updates do
not go through the SPARQL service, any caching control is application determined
and, typically, short (seconds, minutes).
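
As a rough illustration of application-determined, short-lived caching on a
GET query response, here is a minimal Python sketch.  It is not Joseki's code:
the helper name short_lived_cache_headers and the 60 second default are
assumptions made for the example.

```python
import time
from email.utils import formatdate


def short_lived_cache_headers(max_age_seconds=60, now=None):
    """Return response headers that permit caching for a short period.

    Hypothetical helper: the name and the default lifetime are illustrative.
    `now` is a Unix timestamp; None means "use the current time".
    """
    if now is None:
        now = time.time()
    return {
        # Shared caches and browsers may reuse the response for this long.
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires carries the same lifetime as an absolute HTTP date.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

The application picks max_age_seconds from what it knows about how often the
underlying database changes, since the service itself never sees the updates.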

Where queries are dynamic, the chances of a cache hit for SELECT and CONSTRUCT
must be quite small (unless the cache is SPARQL aware and decomposes the query).
Maybe short DESCRIBE queries are more likely to be usefully cached.

Where queries are repetitive (fixed query, same application, many instances)
then caching could be beneficial.  (Form your own position on the nature of
typical semantic web applications.)
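
To make the dynamic versus repetitive distinction concrete, here is a small
Python sketch of a cache that, like a plain HTTP cache, is keyed on the full
request URL.  The endpoint http://example.org/qps, the class UrlKeyedCache
and the function request_url are all hypothetical names for illustration.

```python
from urllib.parse import urlencode

SERVICE = "http://example.org/qps"  # hypothetical endpoint


def request_url(query):
    """Build a GET URL carrying the query text in the 'query' parameter."""
    return SERVICE + "?" + urlencode({"query": query})


class UrlKeyedCache:
    """Toy cache keyed on the exact URL; it knows nothing about SPARQL."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, url, compute):
        """Return the cached result for url, calling compute() on a miss."""
        if url in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[url] = compute()
        return self._store[url]
```

A fixed query issued by many instances of the same application maps to one
URL and hits the cache after the first request.  Queries that differ at all,
even only in whitespace, map to different URLs and always miss.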

> Secondly a fair number of security products (concerned with things like
> cross site scripting, buffer overruns, viruses, etc) inspect the part after
> the ? differently than the part before.
> 
> Given that the queries can be long-ish - and will not match the easy
> patterns of 'total=51&name=Fred+Blogs' this is somewhat a concern.
> 
> Secondly size limits imposed are different for the path and query element
> - and given the proliferation of viruses, connect: spam - only getting
> shorter over time.
> 
> Finally option 'C' sidesteps a lot of these issues - but is not cacheable
> at all.  It is crystal clear though and has no side effects. However it is
> much more appropriate when things like a 'GraphCreated' and an
> 'OperationRequestAccepted' are going to be implemented. However with 'C'
> and 'A' the desired semantics of things like PermanentlyMoved and
> TemporarilyMoved would need to be made very clear as a lot of software
> assumes them to apply to the fqdn/path part of the URI and -not- to the ?=
> query part.

I think, in the HTTP binding, we need both POST and GET based queries.
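
One way a client might use both forms is sketched below in Python: send short
queries by GET so they stay cacheable, and fall back to POST when the URL
would get long.  The function build_query_request and the 2000 character
threshold are assumptions for the example, not part of any specification; the
parameter names query and graph-id follow the examples quoted above.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_URL_LENGTH = 2000  # assumed limit; real limits vary by server and proxy


def build_query_request(service_url, query, graph_id=None):
    """Return a GET request when the URL is short enough, else a POST."""
    params = {"query": query}
    if graph_id is not None:
        params["graph-id"] = graph_id
    encoded = urlencode(params)

    get_url = service_url + "?" + encoded
    if len(get_url) <= MAX_URL_LENGTH:
        # Layout (a): parameters in the query part of the URL.
        return Request(get_url, method="GET")

    # Layout (c): same parameters, sent as a form-encoded body.
    return Request(
        service_url,
        data=encoded.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Building the Request object does not send anything; it would be passed to
urllib.request.urlopen to perform the call.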

The more I think about it, the more I prefer the service centric paradigm - and 
can even believe that the model-centric one is a restricted case of service == 
graph.

> 
> Dw.
> 

	Andy

Received on Wednesday, 26 January 2005 13:59:30 UTC