- From: Andrea Splendiani <andrea.splendiani@iscb.org>
- Date: Thu, 18 Apr 2013 14:23:49 +0100
- To: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
- Cc: public-lod@w3.org
Hi,

I think that some caching with a minimum of query rewriting would get rid of 90% of the SELECT ?s ?p ?o WHERE {?s ?p ?o} queries.

From a user perspective, I would rather have a clear result code upfront telling me "your query is too heavy", "not enough resources" and so on, than partial results plus extra codes.
I wouldn't do much with partial results anyway... so it's time wasted on both sides.

One empirical solution could be to assign a quota per requesting IP (or other form of identification).
Then one could restrict the total amount of resources per time frame, possibly with smart policies.
It would also remove the incentive for people to break big queries into many small ones...

But I was wondering: why is resource consumption a problem for SPARQL endpoint providers, and not for other "providers" on the web (say, YouTube, Google, ...)?
Is it the unpredictability of the resources needed?

best,
Andrea

On 18 Apr 2013, at 12:53, Jerven Bolleman <jerven.bolleman@isb-sib.ch> wrote:

> Hi All,
>
> Managing a public SPARQL endpoint has some difficulties in comparison to managing a simpler REST api.
> Instead of counting api calls or external bandwidth use we need to look at internal IO and CPU usage as well.
>
> Many of the current public SPARQL endpoints limit all their users to queries of limited CPU time.
> But this is not enough to really manage (mis)use of an endpoint. Also the SPARQL api being http based
> suffers from the problem that we first send the status code and may only find out later that we can't
> answer the query after all. Leading to a 200 not OK problem :(
>
> What approaches can we come up with as a community to embed resource limit exceeded exceptions in the
> SPARQL protocols. e.g. we could add an exception element to the sparql xml result format.[1]
>
> The current limits to CPU use are not enough to really avoid misuse. Which is why I submitted a patch to
> Sesame that allows limits on memory use as well. Although limits on disk seeks or other IO counts may be needed by some as well.
>
> But these are currently hard limits; what I really want is
> "playground limits" i.e. you can use the swing as much as you want if you are the only child in the park.
> Once there are more children you have to share.
>
> And how do we communicate this to our users. i.e. this result set is incomplete because you exceeded your IO
> quota, please break up your queries in smaller blocks.
>
> For my day job where I do manage a 7.4 billion triple store with public access some extra tools in managing users would be
> great.
>
> Last but not least how can we avoid that users need to run SELECT (COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends.
> For beta.sparql.uniprot.org I have been moving much of this information into the sparql endpoint description but it's not a place
> where people look for this information.
>
> Regards,
> Jerven
>
> [1] Yeah these ideas are not great timing just after 1.1 but we can always start SPARQL 1.2 ;)
>
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
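To make the per-IP quota idea from the reply above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `QuotaTracker` name, the abstract "cost" unit (it could stand for CPU seconds, IO operations, or memory), and the sliding-window policy. It is not taken from Sesame or from any existing endpoint.

```python
import time
from collections import defaultdict, deque


class QuotaTracker:
    """Per-client resource budget over a sliding time window (hypothetical helper)."""

    def __init__(self, budget, window_seconds, clock=time.monotonic):
        self.budget = budget              # max cost units per client per window
        self.window = window_seconds      # window length in seconds
        self.clock = clock                # injectable clock, useful for testing
        self.usage = defaultdict(deque)   # client id -> deque of (timestamp, cost)

    def _used(self, client):
        # Drop entries that have fallen out of the window, then sum the rest.
        cutoff = self.clock() - self.window
        entries = self.usage[client]
        while entries and entries[0][0] <= cutoff:
            entries.popleft()
        return sum(cost for _, cost in entries)

    def try_charge(self, client, cost):
        """Record the cost and return True if it fits the budget, else return False."""
        if self._used(client) + cost > self.budget:
            return False
        self.usage[client].append((self.clock(), cost))
        return True


if __name__ == "__main__":
    quota = QuotaTracker(budget=10, window_seconds=60)
    for estimated_cost in (6, 6):
        ok = quota.try_charge("192.0.2.1", estimated_cost)
        print("accepted" if ok else "refused upfront: quota exceeded")
```

Because the budget is charged per client rather than per query, splitting one big query into many small ones draws on the same allowance. A refusal can be reported before any result is streamed, which matches the preference for a clear upfront status over partial results.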
Received on Thursday, 18 April 2013 13:24:43 UTC