Re: Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.

Hi,

On Fri, Apr 19, 2013 at 8:49 AM, Jerven Bolleman
<jerven.bolleman@isb-sib.ch> wrote:
> -------- Original Message --------
> Subject: Re: Public SPARQL endpoints:managing (mis)-use and communicating
> limits to users.
> Date: Thu, 18 Apr 2013 23:21:46 +0200
> From: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
> To: Rob Warren <warren@muninn-project.org>
>
> Hi Rob,
>
> There is a fundamental problem with HTTP status codes.
> Let's say a user submits a complex but small SPARQL request.
>
> My server sees the syntax is good and starts to reply in good faith.
> This means the server starts the HTTP response and sends a 200 OK.
> Some results are being sent...
> However, during the evaluation the server gets an exception.
> What to do? I can't change the status code anymore...
>
> Waiting until the server knows the query can be answered is not feasible,
> because that would mean the server can't start replying as soon as
> possible, which likely leads to connection timeouts. Using HTTP status
> codes when responses are likely to be larger than 1 MB works badly in
> practice.

That's not really true. I can download multi-gigabyte files over HTTP
without any problem, so response size isn't the issue. The issue is
more with servers sending a 200 OK response when they can't actually
guarantee that they can fulfil the request.

There are always going to be things like hardware failures that mean
requests might fail, e.g. leading to truncated or no responses, but
servers shouldn't be sending 200 responses if there are expected
failure conditions. For example, timing out a query after a 200
response is sent seems wrong to me.

There are workarounds:

* Response formats, particularly those intended for streaming, could
support markup that indicates that results were terminated early,
perhaps with a pointer to the next page. SPARQL XML & JSON could be
extended in this way; it is difficult to do with RDF/XML, etc. This
would allow a server to terminate streaming but still give the client
a valid response, potentially with a link to further results.

* Not responding directly at all: serve a 202 Accepted for (expensive)
queries and route the user to another resource from which they can
fetch the query results. Data can be prepared asynchronously, and that
resource can then return the correct status for a timed-out query.
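
As a rough sketch of the first workaround, in Python: the generator
below streams a SPARQL JSON-style document and, if evaluation fails or
a time budget runs out, still closes the document properly and appends
a marker. The "truncated" and "next" keys, the function name and its
parameters are all made up for illustration; they are not part of the
SPARQL JSON results format.

```python
import json
import time


def stream_sparql_json(variables, rows, budget_seconds, next_url):
    """Yield text chunks of a SPARQL-JSON-like document.

    If evaluation raises or the time budget is used up, the document is
    still closed so that it parses, and the hypothetical extension keys
    "truncated" and "next" are appended.
    """
    deadline = time.monotonic() + budget_seconds
    # The 200 status is already committed once this first chunk is sent.
    yield '{"head": ' + json.dumps({"vars": variables}) + ', "results": {"bindings": ['
    truncated = False
    first = True
    try:
        for row in rows:
            yield ("" if first else ", ") + json.dumps(row)
            first = False
            if time.monotonic() >= deadline:
                # Out of budget: stop after the row that was just sent.
                truncated = True
                break
    except Exception:
        # Evaluation failed mid-stream; the status can no longer change,
        # so the failure is signalled inside the document instead.
        truncated = True
    yield "]}"
    if truncated:
        yield ', "truncated": true, "next": ' + json.dumps(next_url)
    yield "}"
```

A client that parses the joined chunks gets valid JSON either way, and
can check for the "truncated" key before trusting the result set as
complete.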

The latter, the 202 approach, wouldn't necessarily involve changes to
SPARQL formats or the protocol.
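
To make the 202 idea a bit more concrete, here is a minimal in-memory
sketch in Python, with no real HTTP server involved. Everything in it
is an assumption for illustration: the class name, the "/results/<id>"
path, the `evaluate` callable (which is expected to raise TimeoutError
when a query exceeds its budget), and the choice of 503 and 500 for the
failure cases.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncQueryService:
    """In-memory sketch; `evaluate` is a hypothetical callable that runs one query."""

    def __init__(self, evaluate, workers=2):
        self._evaluate = evaluate
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}  # job id -> Future

    def submit(self, query):
        """Accept the query and point the client at a result resource."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._evaluate, query)
        return 202, {"Location": "/results/" + job_id}, ""

    def fetch(self, job_id):
        """Return (status, headers, body); the status is chosen only once the outcome is known."""
        future = self._jobs.get(job_id)
        if future is None:
            return 404, {}, "unknown result"
        if not future.done():
            return 202, {"Retry-After": "1"}, "still running"
        error = future.exception()
        if isinstance(error, TimeoutError):
            return 503, {}, "query timed out"
        if error is not None:
            return 500, {}, "query failed"
        return 200, {"Content-Type": "application/sparql-results+json"}, future.result()
```

Because the result resource only answers 200 once evaluation has
finished, a timeout or exception can still be reported with an honest
status code.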

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: leigh@ldodds.com

Received on Friday, 19 April 2013 08:20:30 UTC