- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Fri, 12 Jul 2002 17:55:55 +0100
- To: karger@lcs.mit.edu
- CC: www-rdf-dspace@w3.org, "Nick Wainwright (E-mail)" <Nick_Wainwright@hplb.hpl.hp.com>
At yesterday's DSpace telecon we discussed the question of whether RDF databases
as they currently exist could support the "several hundred" small queries per
second needed for Haystack implementations.
To give a ballpark test of this, I set up the following test configuration:
o A Jena test application that creates a tree-shaped set of RDF assertions with
variable depth and branching factor, then performs a set of timed, repeated
random walks from the root to a leaf of the tree. Each step of a walk requires a
separate (very small) database query - no query batching. The randomization of
repeated walks hopefully stresses the caching mechanisms enough to make the test
somewhat realistic.
o I used a branching factor of 10 and depths from 4 to 6 to cover the 10k to 1M
triple range.
o The application and database were running on the same machine (requests still
go through the TCP stack but not out onto the LAN itself).
o The main test machine was a 700MHz, single-CPU, 512MB box running Red Hat
Linux 7.2.
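For anyone wanting to reproduce the shape of the test, the structure above can be sketched as follows. This is not the actual Jena application - it is a minimal, self-contained illustration of the walk-and-time loop, with an in-memory computation standing in for the per-step database query, and the names (RandomWalkBench, microQuery, runWalks) are my own invention:

```java
import java.util.Random;

public class RandomWalkBench {
    static final int BRANCH = 10;

    // Stand-in for one micro-query: in the real test this is a small
    // SQL-backed Jena lookup; here it just computes the child's node id
    // in an implicit complete 10-ary tree.
    static long microQuery(long node, int child) {
        return node * BRANCH + child + 1;
    }

    // Perform `walks` random root-to-leaf walks of the given depth and
    // return the total number of micro-queries issued.
    static long runWalks(int depth, int walks) {
        Random rng = new Random(42);
        long steps = 0;
        for (int w = 0; w < walks; w++) {
            long node = 0; // start each walk at the root
            for (int d = 0; d < depth; d++) {
                node = microQuery(node, rng.nextInt(BRANCH));
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int depth = 4;          // the real test used depths 4 to 6
        int walks = 10_000;
        long t0 = System.nanoTime();
        long steps = runWalks(depth, walks);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(steps + " micro-queries in " + elapsedMs + "ms");
    }
}
```

In the real configuration each microQuery call is replaced by a round trip through JDBC to the database, which is where essentially all of the measured time goes.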
The average time for one micro-query (one step in the random walk) was:
Config    #statements    time
MySQL     11k            2.8ms
MySQL     111k           3.1ms
MySQL     1,111k         3.8ms
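The statement counts in the table are just the sizes of complete 10-ary trees. Assuming one parent-to-child arc per non-root node (my reading of the setup, not stated explicitly above), the count for depth d is 10 + 10^2 + ... + 10^d:

```java
public class TreeSize {
    // Statements in a complete tree with the given branching factor and
    // depth, counting one parent->child arc per non-root node:
    // b + b^2 + ... + b^d.
    static long statements(int branching, int depth) {
        long total = 0, level = 1;
        for (int i = 1; i <= depth; i++) {
            level *= branching;
            total += level;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int d = 4; d <= 6; d++)
            System.out.printf("depth %d: %,d statements%n", d, statements(10, d));
    }
}
```

This gives 11,110, 111,110 and 1,111,110 statements for depths 4, 5 and 6, matching the 11k / 111k / 1,111k rows.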
This is partly CPU bound; preliminary tests on a similarly configured 2GHz
machine were about twice as fast.
Preliminary figures using PostgreSQL are 2-3 times slower than this.
If these trivial query patterns are indeed representative of Haystack's
requirements, then this suggests that 300-600 accesses per second can be
achieved on sub-$1k PCs (ignoring networking issues).
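The 300-600/s figure follows from simply inverting the measured per-query times and allowing for the roughly 2x speedup seen on the faster machine; a quick check:

```java
public class Throughput {
    public static void main(String[] args) {
        // Measured average micro-query times from the table (700MHz box).
        double[] msPerQuery = {2.8, 3.1, 3.8};
        for (double ms : msPerQuery) {
            double perSec700 = 1000.0 / ms;      // queries/s on the 700MHz box
            double perSec2g  = 2.0 * perSec700;  // ~2x on the 2GHz box
            System.out.printf("%.1fms/query -> ~%.0f/s (700MHz), ~%.0f/s (2GHz)%n",
                              ms, perSec700, perSec2g);
        }
    }
}
```

The slowest case (3.8ms at 700MHz) gives ~263/s and the fastest (2.8ms doubled) ~714/s, so 300-600 is a reasonable ballpark for the middle of that range.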
Loading 1M statements into the database is another matter, however!
Dave
Received on Friday, 12 July 2002 12:56:23 UTC