- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Thu, 18 Apr 2013 12:07:39 -0400
- To: Jerven Bolleman <jerven.bolleman@isb-sib.ch>
- Cc: "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <CAFKQJ8=3yC_Hc5RqA7c5FnNpYaGK3RrXy=RpH_SuGZsUQNdozA@mail.gmail.com>
On Thu, Apr 18, 2013 at 11:55 AM, Jerven Bolleman
<jerven.bolleman@isb-sib.ch> wrote:

> Hi Alan,
>
> On Apr 18, 2013, at 5:33 PM, Alan Ruttenberg wrote:
>
> > On Thu, Apr 18, 2013 at 7:53 AM, Jerven Bolleman
> > <jerven.bolleman@isb-sib.ch> wrote:
> >
> > > Last but not least, how can we avoid that users need to run
> > > SELECT (COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends.
> >
> > It's always rather disappointing to me that basic queries like this
> > aren't very fast. I remember that we had a stored procedure for
> > listing the predicates used in the store. It ran in a fraction of a
> > second, while the straightforward query took ages.
>
> It's a good point, and currently they do run too slow. The problem is
> the DISTINCT: these are hard to optimize away, and even in current
> RDBMSs they take time. Everyone has been so busy making SPARQL 1.1
> work that optimizations have taken a step back for a while.
>
> > I am interested in why queries like this are not optimized. Seems to
> > me that this should be straightforward to optimize by looking at
> > index structures.
>
> Depends very much on your index structures. And even then you have to
> traverse your entire index.
> So let's say that for UniProt this query can be fully answered by
> scanning only a SPOC index. That index is 40GB large.
> A single hard drive delivers data at 200MB/s, so that will still take
> 200 seconds at best.[1]

Not if the index is updated with a count field whenever there are
inserts. This would be a matter for Virtuoso to implement.

> Currently many implementations do not detect that this can be
> answered by doing only an index count and does not require
> materialization of the triple patterns. So you end up putting every ?s
> into a set, after which the count operation is done. This is
> horrifyingly expensive for a dataset like UniProt with billions of
> ?subjects. Even old-fashioned unix sort -u takes ages here.

I know.
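To make the count-field idea concrete, here is a minimal sketch, assuming a toy in-memory index. The class and method names are invented for illustration and say nothing about how Virtuoso or any other store lays out its indexes. The counter changes only when a subject gains its first triple or loses its last one, so the distinct-subject count can be read without a scan.

```python
from collections import defaultdict


class CountingTripleIndex:
    """Toy in-memory index that keeps a running distinct-subject count.

    Hypothetical illustration only; not the design of any real triple store.
    """

    def __init__(self):
        # subject -> set of (predicate, object) pairs stored for it
        self._by_subject = defaultdict(set)
        self.distinct_subjects = 0

    def insert(self, s, p, o):
        pairs = self._by_subject[s]
        if not pairs:
            # First triple for this subject: bump the counter once.
            self.distinct_subjects += 1
        pairs.add((p, o))

    def delete(self, s, p, o):
        pairs = self._by_subject.get(s)
        if pairs is None or (p, o) not in pairs:
            return
        pairs.remove((p, o))
        if not pairs:
            # Last triple for this subject is gone: drop it from the count.
            del self._by_subject[s]
            self.distinct_subjects -= 1

    def count_distinct_subjects_by_scan(self):
        # The naive plan: visit every triple, put each ?s into a set, count.
        seen = set()
        for s, pairs in self._by_subject.items():
            for _ in pairs:
                seen.add(s)
        return len(seen)


if __name__ == "__main__":
    idx = CountingTripleIndex()
    idx.insert("ex:a", "ex:p", "ex:x")
    idx.insert("ex:a", "ex:p", "ex:y")
    idx.insert("ex:b", "ex:p", "ex:x")
    # Both routes agree; only the second one touches every triple.
    print(idx.distinct_subjects, idx.count_distinct_subjects_by_scan())
```

A real store would presumably also have to persist such a counter and keep it consistent under concurrent updates, which this sketch ignores.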
As I indicated, the responsibility for implementation of this lies with
the triple store vendors. Kingsley?

> Regards,
> Jerven
>
> > Rather than struggling to have users avoid basic, useful queries,
> > how about making them work well.
> >
> > As use evolves, people reach a level where they do need to be
> > cognizant of how queries are run. At that point, there's not a
> > simple way to say which queries to avoid.
> >
> > The most useful tools to have are those that expose query plans as
> > clearly as possible, highlight which parts of them are taking lots
> > of time, and have a reference page that helps people configure their
> > database, or reformulate queries to address the execution problems
> > that arise. A first step towards this, if you are using Virtuoso, is
> > to always ask for the query cost and display it with a link to ask
> > for the query plan. With a little more work you can speculatively
> > run the query for a bit and, if it times out, display the query plan
> > as discussed above with the error message (or provide it in the
> > error message). If you want to give your users a little more control
> > and think they will take advantage of it, you could add some way for
> > them to say their guess of whether the query is easy, moderate, or
> > hard, and allocate time to the query appropriately (e.g. have
> > buttons/services easy, moderate, or hard in place of a single
> > execute query button).
> >
> > Here's a couple of pages we had compiled about performance. I expect
> > they are out of date as we haven't tended to them in a few years,
> > but perhaps they will be of use to someone.
> >
> > http://neurocommons.org/page/Virtuoso_performance
>
> [1] Please check my maths, it's been a long day.
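On footnote [1]: the estimate holds if both figures are read as decimal units. A quick check, with the 40GB and 200MB/s taken from the message above and the function name made up for illustration:

```python
GB = 10**9  # decimal gigabyte, an assumption about the units intended
MB = 10**6  # decimal megabyte


def scan_seconds(index_bytes, bytes_per_second):
    """Lower bound on the time to read an index once, sequentially."""
    return index_bytes / bytes_per_second


print(scan_seconds(40 * GB, 200 * MB))  # 200.0
```

This is only the sequential-read floor for a single drive; it ignores seeks, caching, and any CPU cost of the DISTINCT itself.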
>
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics  Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
Received on Thursday, 18 April 2013 16:08:37 UTC