Re: SHOULD use POST for expensive queries? from Jeen Broekstra on 2006-01-18 (public-rdf-dawg@w3.org from January to March 2006)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Wed, 18 Jan 2006 17:01:47 +0100
To: andy.seaborne@hp.com
CC: Kendall Clark <kendall@monkeyfist.com>, dawg mailing list <public-rdf-dawg@w3.org>
Message-ID: <43CE666B.1070203@aduna.biz>

Seaborne, Andy wrote:

> Kendall Clark wrote:
>> Folks,
>>
>> Mark Baker suggests [1] that we should add a SHOULD requirement that  
>> queryHttpPost binding should be used "where the cost of processing  
>> the query may be prohibitive". I don't really agree with this, since  
>> there's no way to know statically which are the expensive and which are  
>> the cheap queries. Even very sophisticated query analysis can't tell  
>> you which RDF datasets are expensive to assemble.
> 
> Very true.  It's not just the query that determines whether it will be 
> expensive - it's the dataset as well (and the server load).
>
> [I confess I don't see why POST is better than GET for expensive 
> operations except that timeouts are not at the mercy of caches as well.]

If I understand correctly, the main argument is that it guards against 
potentially expensive "flippant" requests (e.g. by automated clients 
such as robots and spiders) because such agents only use GET requests 
(and are typically not aware of what to put in a POST 
request). So you guard your service against being bombarded with 
expensive requests by such spiders and robots by 'hiding' these requests 
and only exposing them through POST.

I don't consider that a very compelling reason: it sounds more like a 
hack than a solution that an official protocol should recommend.

Furthermore it is doubtful IMHO that any (automated) client that does 
not know SPARQL will generate very expensive requests, and the ones that 
do know SPARQL will not be blocked because they will also know how to do 
POST requests.

>> And, further, I don't know of any way to programmatically redirect  
>> expensive GETs to POSTs (you can send a Location: header to the POST  
>> endpoint, if it's different from the GET endpoint, but I don't think  
>> that *really* suffices; alternately, we could define a WSDL fault,  
>> UsePost, but that seems an awful kludge), and I don't really see the  
>> *point* of doing so either, since if the query is too expensive, it's  
>> too expensive, whether it comes in via GET or POST.
>>
>> Mark retorts [2] that the "safety" of GET includes expensive  
>> operations, citing some message from Roy Fielding, but I think the  
>> message undercuts Mark's use of it, since it's very clearly about  
>> implementations of services, not about the semantics of their  
>> interfaces.
>>
>> Pat +1'd the proposal, but that was before further discussion, so I'm  
>> not certain where he would be now. I'm opposed to the inclusion that  
>> Baker suggests, for the reasons I've stated, but I will leave it to  
>> the WG to decide.
> 
> SHOULD language (meaning "carefully weigh the situation before choosing 
> a different course") is acceptable if that reflects good web practice; 
> not having the text on the grounds that you believe that there isn't 
> anything sufficiently SPARQL related is also acceptable.

FWIW I don't consider using POST as some sort of 'back door' for 
potentially vulnerable-to-spiders-and-DOS-attacks resources good Web 
practice. (I actually think that using POST for queries _at all_ is not 
very elegant, but acceptable only because there is no other way to send 
long query strings to the server). So I'd be in favor of not putting 
this in the spec.
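
To make the "long query strings" point concrete, here is a minimal sketch 
of a client that uses GET by default and falls back to POST only when the 
encoded URL would become too long. The function name, the example endpoint 
and the 2000-character limit are assumptions made for illustration; they 
do not come from the protocol draft, and real servers and proxies impose 
different limits.

```python
from urllib.parse import urlencode

# Hypothetical limit, chosen for illustration only; actual limits vary
# between servers, proxies and user agents.
MAX_GET_URL_LENGTH = 2000


def build_sparql_request(endpoint, query, max_url_length=MAX_GET_URL_LENGTH):
    """Return (method, url, body, headers) for sending a SPARQL query.

    GET is preferred; POST is used only when the query does not fit
    in a URL of at most max_url_length characters.
    """
    params = urlencode({"query": query})
    get_url = endpoint + "?" + params
    if len(get_url) <= max_url_length:
        return ("GET", get_url, None, {})
    # Too long for a URL: send the same parameters as a form-encoded body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return ("POST", endpoint, params.encode("ascii"), headers)
```

Note that the choice here depends only on the length of the request, not 
on any guess about how expensive the query will be to evaluate.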

By the way: if an implementation still chose this approach (that 
is, blocking 'expensive' requests on GET but allowing them on POST), 
there is nothing in the current spec that really prohibits that, is 
there? We say 'should' not 'must'.

Jeen

Received on Wednesday, 18 January 2006 16:09:17 UTC