Virtuoso Optimizations for the Berlin SPARQL Benchmark

All --

We had a look at Chris Bizer's initial results with the BSBM on
Virtuoso.  The first results were rather bad, as nearly all of the
run time was spent optimizing the SPARQL statements and under 10%
actually running them.

So I spent a couple of days on the SPARQL/SQL compiler, making it
produce a better initial guess at the execution plan and
streamlining some operations.  In fact, many of the queries in BSBM
are not particularly sensitive to execution plan, as they access
a very small portion of the database.  So to close the matter, I put
in a flag that makes the SQL compiler give up on devising new plans
if the estimated run time of the best plan so far is less than the
time spent compiling so far.
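
The cutoff rule can be sketched roughly as follows.  This is only an
illustration in Python, not Virtuoso's code: choose_plan,
candidate_plans, estimate_cost_s and the injectable clock are all
hypothetical names, and the real compiler surely enumerates and costs
plans differently.

```python
import time


def choose_plan(candidate_plans, estimate_cost_s, clock=time.perf_counter):
    """Return (plan, estimated_cost_s) for the cheapest plan seen before
    the cutoff fires.

    candidate_plans: iterable of plans to consider (hypothetical)
    estimate_cost_s: function mapping a plan to its estimated run time
                     in seconds (hypothetical)
    clock:           time source, injectable so the rule can be tested
    """
    start = clock()
    best_plan, best_cost = None, float("inf")
    for plan in candidate_plans:
        cost = estimate_cost_s(plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
        # Cutoff: once compiling has already taken longer than the best
        # plan is expected to run, more searching cannot pay for itself.
        if best_cost < clock() - start:
            break
    return best_plan, best_cost


# Deterministic demonstration with a fake clock that advances 10 ms
# per reading (assumed numbers, for illustration only).
readings = iter([0.00, 0.01, 0.02, 0.03, 0.04])
costs = {"a": 0.5, "b": 0.2, "c": 0.015, "d": 0.001}
print(choose_plan(["a", "b", "c", "d"], costs.get, clock=lambda: next(readings)))
# -> ('c', 0.015)
```

In the demonstration the search stops at plan "c": 30 ms have been
spent compiling and the best plan is estimated at 15 ms, so plan "d"
is never examined even though it would be cheaper.  That is the
trade-off the flag accepts for queries that touch very little data.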

With these changes, available now as a diff on top of 5.0.7, we run
quite well, several times better than initially.  With the compiler
time cutoff in place (ini parameter StopCompilerWhenXOverRunTime = 1),
we get the following times, output from the BSBM test driver:

   Starting test...

   0: 1031.22 ms, total: 1151 ms
   1:  982.89 ms, total: 1040 ms
   2:  923.27 ms, total:  968 ms
   3:  898.37 ms, total:  932 ms
   4:  855.70 ms, total:  865 ms

   Scale factor:               10000
   Number of query mix runs:   5 times
   min/max Query mix runtime:  0.8557 s / 1.0312 s
   Total runtime:              4.691 seconds
   QMpH:                       3836.77 query mixes per hour
   CQET:                       0.93829 seconds average runtime
                                       of query mix
   CQET (geom.):               0.93625 seconds geometric mean
                                       runtime of query mix

   Metrics for Query 1:
      Count:                 5 times executed in whole run
      AQET:                  0.012212 seconds (arithmetic mean)
      AQET(geom.):           0.009934 seconds (geometric mean)
      QPS:                   81.89 Queries per second
      minQET/maxQET:         0.00684000s / 0.03115700s
      Average result count:  7.0
      min/max result count:  3 / 10

   Metrics for Query 2:
      Count:                 35 times executed in whole run
      AQET:                  0.030490 seconds (arithmetic mean)
      AQET(geom.):           0.029776 seconds (geometric mean)
      QPS:                   32.80 Queries per second
      minQET/maxQET:         0.02467300s / 0.06753000s
      Average result count:  22.5
      min/max result count:  15 / 30

   Metrics for Query 3:
      Count:                 5 times executed in whole run
      AQET:                  0.006947 seconds (arithmetic mean)
      AQET(geom.):           0.006905 seconds (geometric mean)
      QPS:                   143.95 Queries per second
      minQET/maxQET:         0.00580000s / 0.00795100s
      Average result count:  4.0
      min/max result count:  0 / 10

   Metrics for Query 4:
      Count:                 5 times executed in whole run
      AQET:                  0.008858 seconds (arithmetic mean)
      AQET(geom.):           0.008829 seconds (geometric mean)
      QPS:                   112.89 Queries per second
      minQET/maxQET:         0.00804400s / 0.01019500s
      Average result count:  3.4
      min/max result count:  0 / 10

   Metrics for Query 5:
      Count:                 5 times executed in whole run
      AQET:                  0.087542 seconds (arithmetic mean)
      AQET(geom.):           0.087327 seconds (geometric mean)
      QPS:                   11.42 Queries per second
      minQET/maxQET:         0.08165600s / 0.09889200s
      Average result count:  5.0
      min/max result count:  5 / 5

   Metrics for Query 6:
      Count:                 5 times executed in whole run
      AQET:                  0.131222 seconds (arithmetic mean)
      AQET(geom.):           0.131216 seconds (geometric mean)
      QPS:                   7.62 Queries per second
      minQET/maxQET:         0.12924200s / 0.13298200s
      Average result count:  3.6
      min/max result count:  3 / 5

   Metrics for Query 7:
      Count:                 20 times executed in whole run
      AQET:                  0.043601 seconds (arithmetic mean)
      AQET(geom.):           0.040890 seconds (geometric mean)
      QPS:                   22.94 Queries per second
      minQET/maxQET:         0.01984400s / 0.06012600s
      Average result count:  26.4
      min/max result count:  5 / 96

   Metrics for Query 8:
      Count:                 10 times executed in whole run
      AQET:                  0.018168 seconds (arithmetic mean)
      AQET(geom.):           0.016205 seconds (geometric mean)
      QPS:                   55.04 Queries per second
      minQET/maxQET:         0.01097600s / 0.05066900s
      Average result count:  12.8
      min/max result count:  6 / 20

   Metrics for Query 9:
      Count:                 20 times executed in whole run
      AQET:                  0.043813 seconds (arithmetic mean)
      AQET(geom.):           0.043807 seconds (geometric mean)
      QPS:                   22.82 Queries per second
      minQET/maxQET:         0.04274900s / 0.04504100s
      Average result count:  0.0
      min/max result count:  0 / 0

   Metrics for Query 10:
      Count:                 15 times executed in whole run
      AQET:                  0.030697 seconds (arithmetic mean)
      AQET(geom.):           0.029651 seconds (geometric mean)
      QPS:                   32.58 Queries per second
      minQET/maxQET:         0.02072000s / 0.03975700s
      Average result count:  1.1
      min/max result count:  0 / 4

   real  0 m 5.485 s
   user  0 m 2.233 s
   sys   0 m 0.170 s
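
The summary figures in the driver output appear to follow directly
from the per-run times.  The sketch below recomputes them in Python;
the formulas are inferred from the numbers above, not taken from the
test driver's source, and the function names are made up for this
example.

```python
import math


def query_mix_metrics(run_times_ms):
    """Recompute the query mix summary from per-run times in ms."""
    secs = [t / 1000.0 for t in run_times_ms]
    cqet = sum(secs) / len(secs)  # arithmetic mean run time
    cqet_geom = math.exp(sum(math.log(s) for s in secs) / len(secs))
    return {
        "total_s": sum(secs),
        "cqet": cqet,
        "cqet_geom": cqet_geom,
        "qmph": 3600.0 / cqet,  # query mixes per hour
    }


def qps(aqet_s):
    """Queries per second as the reciprocal of the mean query time."""
    return 1.0 / aqet_s


m = query_mix_metrics([1031.22, 982.89, 923.27, 898.37, 855.70])
print(f"CQET:         {m['cqet']:.5f} s")       # 0.93829
print(f"CQET (geom.): {m['cqet_geom']:.5f} s")  # 0.93625
print(f"QMpH:         {m['qmph']:.2f}")         # 3836.77
print(f"QPS, Query 1: {qps(0.012212):.2f}")     # 81.89
```

Under these assumed formulas the recomputed values agree with the
reported ones, so QMpH here is 3600 divided by the arithmetic mean run
time of a query mix.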


Of the approximately 5.5 seconds spent running five query mixes, the
test driver accounts for 2.2 seconds.  Server-side processing time is
3.1 seconds, of which SQL compilation is 1.35 seconds.  The rest is
miscellaneous system time.  The measurement was taken on 64-bit Linux
on a 2 GHz Xeon 5130.
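
For a quick sense of proportion, the breakdown above can be turned
into shares.  The inputs are the rounded figures from this message;
the function and key names are only illustrative.

```python
def time_breakdown(total_s, driver_s, server_s, compile_s):
    """Split the wall clock time into the parts named in the text."""
    return {
        "other_s": total_s - driver_s - server_s,  # misc. system time
        "execute_s": server_s - compile_s,         # server time outside compilation
        "compile_share_of_server": compile_s / server_s,
        "compile_share_of_total": compile_s / total_s,
    }


b = time_breakdown(total_s=5.5, driver_s=2.2, server_s=3.1, compile_s=1.35)
print(f"compile share of server time: {b['compile_share_of_server']:.1%}")
print(f"compile share of wall clock:  {b['compile_share_of_total']:.1%}")
print(f"unaccounted (misc.) time:     {b['other_s']:.2f} s")
```

With these inputs compilation comes to a bit under half of the server
time, compared with over 90% of the run time in the initial results.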

We note that this type of workload would be done with stored procedures
or prepared, parameterized queries in the SQL world.
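
As an illustration of the prepared, parameterized style, here is a
minimal Python sketch using the standard sqlite3 module.  SQLite is
only a stand-in for the example, not Virtuoso, and the product table,
its columns and the helper function are hypothetical.  The point is
that the statement text stays constant and only the parameter value
changes, so the statement does not have to be compiled afresh for
every call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, label TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "widget", 9.5), (2, "gadget", 24.0), (3, "gizmo", 3.25)],
)

# One fixed statement text with a placeholder.  Python's sqlite3 keeps
# a per-connection cache of compiled statements keyed by this text, so
# repeated calls reuse the compiled form and only bind a new value.
QUERY = "SELECT label FROM product WHERE price < ? ORDER BY label"


def labels_cheaper_than(limit):
    """Run the parameterized query with a new bound value."""
    return [row[0] for row in conn.execute(QUERY, (limit,))]


print(labels_cheaper_than(10.0))  # ['gizmo', 'widget']
print(labels_cheaper_than(5.0))   # ['gizmo']
```

A benchmark driver that sends each query as fresh text with literal
values substituted in gets no such reuse, which is why compilation
time weighs so heavily in this kind of workload.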

Some further tuning remains, but this addresses the bulk of the
matter.  A separate message will cover the patch containing these
improvements.

Orri

Received on Wednesday, 30 July 2008 18:06:10 UTC