- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Fri, 19 Apr 2013 06:55:32 -0400
- To: public-lod@w3.org
- Message-ID: <517122A4.5010103@openlinksw.com>
On 4/19/13 3:49 AM, Jerven Bolleman wrote:
> Forgot reply all
>
> -------- Original Message --------
> Subject: Re: Public SPARQL endpoints: managing (mis)-use and
> communicating limits to users.
> Date: Thu, 18 Apr 2013 23:21:46 +0200
> From: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
> To: Rob Warren <warren@muninn-project.org>
>
> Hi Rob,
>
> There is a fundamental problem with HTTP status codes.
> Let's say a user submits a complex but small SPARQL request.
>
> My server sees the syntax is good and starts to reply in good faith.
> This means the server starts the HTTP response and sends a 200 OK.
> Some results are being sent...
> However, during the evaluation the server gets an exception.
> What to do? I can't change the status code anymore...

Depends. If you have OFFSET and LIMIT in use, you can reflect the new state of affairs when the next GET is performed. For example, if you have OFFSET 20 and LIMIT 20, the URL with OFFSET 40 is the request for the next batch of results from the solution, and it is that request which would reflect the new state of affairs.
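To make that concrete, here is a minimal client-side sketch of the kind of paging I have in mind (the endpoint URL and query are placeholders, not a specific service, and Python is used purely for illustration). Each OFFSET/LIMIT batch is its own GET against the SPARQL protocol endpoint, so a failure after one batch surfaces as an HTTP error on the request for the next batch rather than as a truncated 200 response:

    # Minimal sketch: page through a SPARQL solution via OFFSET/LIMIT.
    # The endpoint URL and query below are placeholders.
    import json
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://example.org/sparql"                    # placeholder endpoint
    QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s"  # placeholder query
    PAGE_SIZE = 20

    def fetch_batch(offset):
        """GET one OFFSET/LIMIT page via the SPARQL protocol; a non-2xx reply raises HTTPError."""
        paged_query = "%s OFFSET %d LIMIT %d" % (QUERY, offset, PAGE_SIZE)
        url = ENDPOINT + "?" + urllib.parse.urlencode({"query": paged_query})
        request = urllib.request.Request(
            url, headers={"Accept": "application/sparql-results+json"})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))["results"]["bindings"]

    offset = 0
    while True:
        batch = fetch_batch(offset)  # a problem after earlier batches shows up here,
        if not batch:                # on this batch's own status code
            break                    # an empty page means the solution is exhausted
        for row in batch:
            print(row)               # process each solution row
        offset += PAGE_SIZE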
> Waiting until the server knows the query can be answered is not feasible,
> because that would mean the server can't start giving replies as soon as
> possible. Which likely leads to connection timeouts.

Not really, not if the configuration granularity is there. Also remember that parsing, solution preparation, and actual data retrieval are distinct tasks. And when dealing with aggregates, if you have key compression and actual horizontal partitioning of aggregates across a cluster, the time to solution shrinks, as we've demonstrated across both versions, and much more so with version 7, where column storage makes the data much more compact in general.

> Using HTTP status codes when responses are likely to be larger
> than 1 MB works badly in practice.

See my earlier comments about retrieval being distinct from solution preparation. Fetching the data via OFFSET- and LIMIT-based SPARQL protocol URLs does enable you to address this issue. The same issue used to exist with SQL RDBMS data access via ODBC, JDBC, etc.; each of those APIs separates query solution preparation from actual data retrieval. When we separate components the right way, a lot can be achieved.

Links:

1. http://bit.ly/WteWYI -- Virtuoso 7.0 Column Store
2. http://bit.ly/17oSWk9 -- VLDB 2009 Tutorial on Column Stores
3. http://bit.ly/14ULX2F -- LOD2 benchmark report for BSBM at the 50 & 150 billion triples scales (achieved as a result of the use of column storage, key compression, and vectored query execution)

Kingsley

>
> Regards,
> Jerven
>
> On Apr 18, 2013, at 10:53 PM, Rob Warren wrote:
>
>> On 18-Apr-13, at 8:53 AM, Jerven Bolleman wrote:
>>>
>>> Many of the current public SPARQL endpoints limit all their users to
>>> queries of limited CPU time.
>>> But this is not enough to really manage (mis)use of an endpoint.
>>> Also, the SPARQL API, being HTTP based,
>>> suffers from the problem that we first send the status code and may
>>> only find out later that we can't
>>> answer the query after all. Leading to a "200 not OK" problem :(
>>
>> Jerven,
>>
>> I agree that a 200 reply to 'query too complex', 'query too big' or
>> 'query timeout' is not acceptable. However, limits on queries are a
>> tool to keep dumb clients from pounding on the server too hard.
>>
>> A standardized reply / error would be something that I would like to
>> see, in that it allows the client to modify its approach to querying
>> the server. It would also be an opportunity to have the server signal
>> to the client what trade-off it is willing to make between sending
>> more triples and increasing the query complexity.
>>
>> Could '413 Request Entity Too Large', '429 Too Many Requests' and
>> '453 Not Enough Bandwidth' be abused here for SPARQL endpoints?
>>
>> rhw
>>
>
> -------------------------------------------------------------------
> Jerven Bolleman                         Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics   Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1                Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland                             www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------

--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Friday, 19 April 2013 10:55:55 UTC