Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users. from Kingsley Idehen on 2013-04-18 (public-lod@w3.org from April 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 18 Apr 2013 08:46:54 -0400
To: public-lod@w3.org
Message-ID: <516FEB3E.4030507@openlinksw.com>

On 4/18/13 7:53 AM, Jerven Bolleman wrote:
> Hi All,
>
> Managing a public SPARQL endpoint has some difficulties in comparison to managing a simpler REST api.
> Instead of counting api calls or external bandwidth use we need to look at internal IO and CPU usage as well.
>
> Many of the current public SPARQL endpoints limit all their users to queries of limited CPU time.
> But this is not enough to really manage (mis)use of an endpoint. Also, the SPARQL API, being HTTP-based,
> suffers from the problem that we first send the status code and may only find out later that we can't
> answer the query after all, leading to a "200 not OK" problem :(
>
> What approaches can we come up with as a community to embed "resource limit exceeded" exceptions in the
> SPARQL protocols? E.g. we could add an exception element to the SPARQL XML result format.[1]

Good idea, for sure.
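
To make the suggestion above concrete, here is a minimal sketch of how a client might detect such an exception element. The `ex:exception` element, its `code` attribute, and the `http://example.org/sparql-limits#` namespace are all hypothetical; nothing like them exists in the SPARQL 1.1 XML results format. Only the results namespace is real.

```python
import xml.etree.ElementTree as ET

SR = "http://www.w3.org/2005/sparql-results#"  # real SPARQL XML results namespace
EX = "http://example.org/sparql-limits#"       # hypothetical extension namespace

# A made-up response: HTTP 200 was already sent, then the limit was hit.
SAMPLE = f"""<sparql xmlns="{SR}" xmlns:ex="{EX}">
  <head><variable name="s"/></head>
  <results>
    <result><binding name="s"><uri>http://example.org/a</uri></binding></result>
  </results>
  <ex:exception code="io-quota-exceeded">Result set is incomplete.</ex:exception>
</sparql>"""


def read_results(xml_text):
    """Return (rows, exception); exception is None when the result is complete."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(f"{{{SR}}}result"):
        row = {}
        for binding in result.findall(f"{{{SR}}}binding"):
            # The single child of <binding> is <uri>, <literal> or <bnode>.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    exc = root.find(f"{{{EX}}}exception")  # hypothetical trailing element
    if exc is None:
        return rows, None
    return rows, (exc.get("code"), (exc.text or "").strip())


rows, exception = read_results(SAMPLE)
```

A client built this way would still receive the partial rows, but could warn the user that the result is incomplete instead of trusting the 200 status.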

>
> The current limits to CPU use are not enough to really avoid misuse. Which is why I submitted a patch to
> Sesame that allows limits on memory use as well. Although limits on disk seeks or other IO counts may be needed by some as well.
>
> But these are currently hard limits; what I really want is
> "playground limits", i.e. you can use the swing as much as you want if you are the only child in the park.
> Once there are more children you have to share.

That level of granularity isn't really in scope for HTTP or 
HTTP+SPARQL (aka SPARQL Protocol).
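
Although such limits sit outside the protocol, the "playground" idea quoted above can be sketched on the server side. This is a rough illustration only: `fair_share`, its parameters, and the choice of an even split are assumptions, not something Sesame or any other store is known to implement.

```python
def fair_share(total_budget, active_users):
    """Split a resource budget evenly among users with queries in flight.

    total_budget: whatever the operator meters (CPU seconds, MB of memory, ...).
    active_users: identifiers of users currently running queries.
    """
    if not active_users:
        return {}
    share = total_budget / len(active_users)
    return {user: share for user in active_users}


alone = fair_share(100.0, ["alice"])
crowded = fair_share(100.0, ["alice", "bob", "carol", "dave"])
```

With one user in the park, `alone` gives that user the whole budget; once four users are active, `crowded` gives each a quarter.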
>
> And how do we communicate this to our users? E.g. "this result set is incomplete because you exceeded your IO
> quota; please break up your queries into smaller blocks."

A good number of these error conditions could fit into existing HTTP 
responses. Worst case, HTTP+SPARQL could be enhanced to provide 
additional granularity.
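
As a rough sketch of what fitting these conditions into existing HTTP responses might look like: the condition names and the mapping below are assumptions chosen for illustration, not anything defined by the SPARQL Protocol. The status codes and the `Retry-After` header are standard HTTP. Note that this only helps when the limit is detected before the status line is sent, which is exactly the "200 not OK" case raised above.

```python
from http import HTTPStatus

# Hypothetical condition names mapped to standard HTTP status codes.
STATUS_FOR = {
    "rate_limited": HTTPStatus.TOO_MANY_REQUESTS,            # 429 (RFC 6585)
    "server_overloaded": HTTPStatus.SERVICE_UNAVAILABLE,     # 503
    "query_too_large": HTTPStatus.REQUEST_ENTITY_TOO_LARGE,  # 413
}


def error_response(condition, retry_after_seconds=None):
    """Return (status_code, headers) for a limit detected before any output."""
    status = STATUS_FOR[condition]
    headers = {}
    if retry_after_seconds is not None:
        # Retry-After tells the client how long to wait before retrying.
        headers["Retry-After"] = str(retry_after_seconds)
    return int(status), headers


response = error_response("rate_limited", retry_after_seconds=30)
```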

>
> For my day job, where I manage a 7.4 billion triple store with public access, some extra tools for managing users would be
> great.
>
> Last but not least, how can we spare users from having to run SELECT (COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends?
> For beta.sparql.uniprot.org I have been moving much of this information into the SPARQL endpoint description, but it's not a place
> where people look for this information.

We should encourage them to look there :-)

Kingsley
>
> Regards,
> Jerven
>
> [1] Yeah these ideas are not great timing just after 1.1 but we can always start SPARQL 1.2 ;)
>
>
>
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Thursday, 18 April 2013 12:47:20 UTC