Re: SPARQL Protocol for RDF / feedback (fwd) from Dirk-Willem van Gulik on 2005-01-26 (public-rdf-dawg@w3.org from January to March 2005)

From: Dirk-Willem van Gulik <dirkx@webweaving.org>
Date: Wed, 26 Jan 2005 07:30:16 -0800 (PST)
To: "Seaborne, Andy" <andy.seaborne@hp.com>
cc: public-rdf-dawg@w3.org
Message-ID: <20050126072043.T96832@skutsje.san.webweaving.org>
On Wed, 26 Jan 2005, Seaborne, Andy wrote:

> > Some feedback
> >
> > URI layout
> >
> > a	GET http://foo.com//qps?query-lang=http...&graph-id=http..my.example%2F3.rdf&query=...
> > versus
> > b	GET http://foo.com//qps/query-lang=http.../graph-id=http..my.example%2F3.rdf/query=...
> > versus
> > c	POST http://foo.com//qps
> > 	with data payload
> > 	query-lang=http...&graph-id=http..my.example%2F3.rdf&query=...
> >
> > Though A and B are URIs with equal status and semantics - in reality
> > caches, loadbalancers, proxies and what not treat A as a function with
> > some argument (and implicitly assume some operation (with unclear side
> > effects on state)) and B in a more stateless way. And hence will not be
> > eager to cache around a - and be sticky by default in the case of a load
> > balancer.
>
> > This means that -if- the social contract among us is that a query does
> > _NOT_ change state (but see below about GraphCreated) - and that thus the
> > results can be cached or can be handled loadbalancer neutral - then 'b'
> > gives a behaviour closer to that desire.
>
> Joseki does (a) and it explicitly sets the HTTP cache headers.  One usage model I
> have is that there is a database being exported as a read-only view via RDF to
> the web and the updates still happen by another application.  Because updates do
> not go through the SPARQL service, any caching control is application determined
> and, typically, short (seconds, minutes).

If the social contract is indeed a 'read-only view' then IMHO 'b' captures
that better from a practical and deployed perspective. Though 'a' is
technically absolutely right (and you could throw in an ETag if you
wanted to) in actual practice it is fairly common to go "if index('?'..) {
donotcache=1; };".
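
As a minimal sketch of that heuristic (not taken from any particular cache or proxy), the Python below builds the same request in layouts 'a' and 'b' and applies the "any '?' means do not cache" test. The base URL, parameter values and function names are illustrative assumptions.

```python
from urllib.parse import quote, urlencode


def is_cacheable_by_naive_proxy(url: str) -> bool:
    """Mimic the heuristic quoted above: any '?' in the URL means 'do not cache'."""
    return "?" not in url


def layout_a(base: str, params: dict) -> str:
    """Layout 'a': parameters carried in the query string."""
    return base + "?" + urlencode(params)


def layout_b(base: str, params: dict) -> str:
    """Layout 'b': parameters carried as path segments, values fully percent-encoded."""
    segments = [f"{key}={quote(value, safe='')}" for key, value in params.items()]
    return base + "/" + "/".join(segments)


# Hypothetical endpoint and parameters, for illustration only.
base = "http://foo.com/qps"
params = {
    "graph-id": "http://my.example/3.rdf",
    "query": "SELECT ?s WHERE { ?s ?p ?o }",
}

url_a = layout_a(base, params)
url_b = layout_b(base, params)

print(url_a, is_cacheable_by_naive_proxy(url_a))
print(url_b, is_cacheable_by_naive_proxy(url_b))
```

Note that in layout 'b' the '?' characters inside the query text itself have to be percent-encoded, otherwise the same heuristic would trip on them.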

> Where queries are dynamic the chances of a cache hit must be quite small (unless
> the cache is SPARQL aware and decomposes the query) for SELECT, CONSTRUCT.
> Maybe short DESCRIBE queries are more likely to be usefully cached.

Luckily it is easier to NOT cause things to be cached. Just add an Expires
and Pragma or other appropriate header.
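
A rough sketch of what such headers could look like, in Python. The function name is hypothetical, and adding Cache-Control alongside Expires and Pragma is an assumption about which "other appropriate header" a server would pick.

```python
from email.utils import formatdate


def no_cache_headers() -> dict:
    """Response headers telling intermediaries not to reuse the response."""
    return {
        # A date in the past (the Unix epoch) marks the response as already expired.
        "Expires": formatdate(0, usegmt=True),
        # HTTP/1.0-era directive, still widely honoured.
        "Pragma": "no-cache",
        # HTTP/1.1 equivalent (assumed here as the "other appropriate header").
        "Cache-Control": "no-cache, no-store",
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```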

The problem is getting caching at the right moment - esp. when in a
content delivery network or loadbalancing environment. And not having to
be 'sticky' when not needed.

> Where queries are repetitive (fixed query, same application, many instances)
> then caching could be beneficial.  (Form your own position on the nature of
> typical semantic web applications.)

My personal take: very VERY beneficial (and one of the few areas where
the existing equipment and infrastructure seriously helps and has a 'leg
up' so to speak in competition with ODBC, JDBC, SOAP, etc).

> >
> > Finally option 'C' sidesteps a lot of these issues - but is not cacheable
> > at all.  It is crystal clear though and has no side effects. However it is
> > much more appropriate when things like a 'GraphCreated' and a
> > 'OperationRequestAccepted' are going to be implemented. However with 'C'
> > and 'A' the desired semantics of things like PermanentlyMoved and
> > TemporarilyMoved would need to be made very clear as a lot of software
> > assumes them to apply to the fqdn/path part of the URI and -not- to the ?=
> > query part.
>
> I think, in the HTTP binding, we need both POST and GET based queries.
>
> The more I think about it, the more I prefer the service centric paradigm - and
> can even believe that the model-centric one is a restricted case of service ==
> graph.

I _personally_ would prefer two modes:

A	non-state-changing, non-discriminating mode

		http://foo.com/cgi-bin/get.pl/foo=bar

	where you should set proper Expires, ETags and what
	not and -must- expect the result to be cached.

	And where a repeat of the operation will not change
	anything, etc.

B	service mode which uses POST only and where you
	may optionally set things like Pragmas to make
	crystal clear that caching is not the intention and
	that repeating the operation may in fact change state.

With, in the spec, some clear wording which says that a reply
like HTTP_NOT_ACCEPTABLE(406) or HTTP_NOT_IMPLEMENTED(501)
has some additional semantics - i.e. try the other method.
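
On the client side, the "try the other method" rule might be sketched as below. This reflects the proposal in this message, not standard HTTP semantics for 406 or 501. The two transport callables are hypothetical stand-ins for real GET and POST requests, so the sketch runs without a network.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)

# HTTP_NOT_ACCEPTABLE and HTTP_NOT_IMPLEMENTED, read here as "try the other method".
FALLBACK_STATUSES = {406, 501}


def query_with_fallback(first: Callable[[], Response],
                        other: Callable[[], Response]) -> Response:
    """Try one mode; if the server answers 406 or 501, retry with the other mode."""
    status, body = first()
    if status in FALLBACK_STATUSES:
        return other()
    return status, body


# Hypothetical transports: a server that implements only the POST service mode.
def cacheable_get() -> Response:
    return 501, ""


def service_post() -> Response:
    return 200, "<results/>"


print(query_with_fallback(cacheable_get, service_post))
```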

Dw.,
Received on Wednesday, 26 January 2005 15:34:43 UTC