Jena database performance

At yesterday's DSpace telecon we discussed the question of whether RDF databases
as they currently exist could support the "several hundred" small queries per
second needed for Haystack implementations.

To get a ballpark figure for this I set up the following test configuration:
 o A Jena test application that creates a tree-shaped set of RDF assertions with
variable depth and branching factor and then does a set of timed repeated random
walks from root to leaf of the tree. Each step on the walk requires a separate
(very small) database query - no query batching. The randomization of repeated
walks hopefully stresses the caching mechanisms sufficiently to make the test
somewhat realistic.
 o I used a branching factor of 10 and depths from 4 to 6 to test the 10k to 1m
triple range.
 o The application and database were running on the same machine (requests still
go through the TCP stack but not out onto the LAN itself).
 o The main test machine was a 700MHz, single-CPU box with 512MB RAM running
Red Hat Linux 7.2.
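
The shape of the test described above can be sketched roughly as follows. This
is not the Jena test application itself: it is a minimal stand-in that assumes
Python's built-in sqlite3 module with an in-memory database, a hypothetical
three-column table named stmt, and hypothetical helpers build_tree, random_walk
and time_walks. There is no TCP hop and no Jena layer, so its timings are not
comparable with the figures in this message. Note that this sketch issues one
extra query per walk, the one that finds the leaf has no children.

```python
import random
import sqlite3
import time


def build_tree(conn, branching, depth):
    """Insert one (parent, 'child', node) row per tree edge; return the row count."""
    # Hypothetical layout: a single statement table indexed on subject+predicate.
    conn.execute("CREATE TABLE stmt (subj TEXT, pred TEXT, obj TEXT)")
    conn.execute("CREATE INDEX stmt_sp ON stmt (subj, pred)")
    frontier, count = ["n"], 0
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            children = [f"{parent}.{i}" for i in range(branching)]
            conn.executemany(
                "INSERT INTO stmt VALUES (?, 'child', ?)",
                [(parent, child) for child in children],
            )
            next_frontier.extend(children)
            count += branching
        frontier = next_frontier
    conn.commit()
    return count


def random_walk(conn, root="n"):
    """Walk from root to a leaf, one small query per step; return queries issued."""
    node, queries = root, 0
    while True:
        rows = conn.execute(
            "SELECT obj FROM stmt WHERE subj = ? AND pred = 'child'", (node,)
        ).fetchall()
        queries += 1
        if not rows:  # no children: this node is a leaf
            return queries
        node = random.choice(rows)[0]


def time_walks(conn, walks):
    """Return the average time per micro-query in milliseconds."""
    start = time.perf_counter()
    queries = sum(random_walk(conn) for _ in range(walks))
    return (time.perf_counter() - start) / queries * 1000.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    statements = build_tree(conn, branching=10, depth=4)
    print(statements, "statements,", f"{time_walks(conn, 200):.3f} ms per query")
```

With a branching factor of 10 the tree has 10 + 100 + ... + 10^depth edges, so
depths 4, 5 and 6 give 11,110, 111,110 and 1,111,110 statements, which is where
the 11k, 111k and 1,111k sizes come from.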

The average time for one micro-query (one step in the random walk) was:
 Config   #statements   time per query
 MySQL          11k     2.8ms
 MySQL         111k     3.1ms
 MySQL       1,111k     3.8ms

This is partially CPU bound: preliminary tests on a similarly configured 2GHz
machine were about twice as fast.

Preliminary figures using PostgreSQL are 2-3 times slower than this.

If these trivial query patterns are indeed representative of Haystack's
requirements then this suggests that 300-600 accesses per second can be achieved
on sub-$1k PCs (ignoring networking issues).
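
As a rough check on that range, the per-query times can be converted into
queries per second for a single client issuing queries back to back. The sketch
below uses only the three MySQL timings from the table; the helper name
queries_per_second is hypothetical, and the speedup of 2.0 treats the "about
twice as fast" result from the preliminary 2GHz test as exact, which is an
assumption.

```python
def queries_per_second(ms_per_query, speedup=1.0):
    """Sequential throughput implied by an average per-query latency."""
    return 1000.0 * speedup / ms_per_query


# Measured averages from the 700MHz machine (statements -> ms per micro-query).
measured_ms = {"11k": 2.8, "111k": 3.1, "1,111k": 3.8}

for size, ms in measured_ms.items():
    base = queries_per_second(ms)
    # Assumption: the 2GHz machine is exactly twice as fast.
    fast = queries_per_second(ms, speedup=2.0)
    print(f"{size:>7} statements: {base:.0f}/s at 700MHz, about {fast:.0f}/s at 2GHz")
```

This works out to roughly 260-360 queries per second on the 700MHz machine and
roughly 530-710 if the factor of two holds, so the 300-600 figure is a rounded
span across the two machines rather than a result for either one.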

Loading 1m statements into a database is another matter, however!

Dave

Received on Friday, 12 July 2002 12:56:23 UTC