- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Fri, 12 Jul 2002 17:55:55 +0100
- To: karger@lcs.mit.edu
- CC: www-rdf-dspace@w3.org, "Nick Wainwright (E-mail)" <Nick_Wainwright@hplb.hpl.hp.com>
At yesterday's DSpace telecon we discussed the question of whether RDF databases
as they currently exist could support the "several hundred" small queries per
second needed for Haystack implementations.
To give a ballpark test of this, I set up the following test configuration:
o A Jena test application that creates a tree-shaped set of RDF assertions with
variable depth and branching factor, then performs a set of timed, repeated
random walks from the root to a leaf of the tree. Each step of a walk requires a
separate (very small) database query - no query batching. The randomization of
repeated walks hopefully stresses the caching mechanisms enough to make the test
somewhat realistic.
o I used a branching factor of 10 and depths from 4 to 6 to cover the 10k to 1M
triple range.
o The application and database were running on the same machine (requests still
go through the TCP stack but not out onto the LAN itself).
o The main test machine was a 700MHz, single-CPU, 512MB box running Red Hat
Linux 7.2.
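For anyone wanting to reproduce the shape of the test, the structure above can be sketched as follows. This is not the actual Jena application - it is a minimal, self-contained illustration of the walk-and-time loop, with an in-memory computation standing in for the per-step database query, and the names (RandomWalkBench, microQuery, runWalks) are my own invention:

```java
import java.util.Random;

public class RandomWalkBench {
    static final int BRANCH = 10;

    // Stand-in for one micro-query: in the real test this is a small
    // SQL-backed Jena lookup; here it just computes the child's node id
    // in an implicit complete 10-ary tree.
    static long microQuery(long node, int child) {
        return node * BRANCH + child + 1;
    }

    // Perform `walks` random root-to-leaf walks of the given depth and
    // return the total number of micro-queries issued.
    static long runWalks(int depth, int walks) {
        Random rng = new Random(42);
        long steps = 0;
        for (int w = 0; w < walks; w++) {
            long node = 0; // start each walk at the root
            for (int d = 0; d < depth; d++) {
                node = microQuery(node, rng.nextInt(BRANCH));
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int depth = 4;          // the real test used depths 4 to 6
        int walks = 10_000;
        long t0 = System.nanoTime();
        long steps = runWalks(depth, walks);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(steps + " micro-queries in " + elapsedMs + "ms");
    }
}
```

In the real configuration each microQuery call is replaced by a round trip through JDBC to the database, which is where essentially all of the measured time goes.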
The average time for one micro-query (one step in the random walk) was:
Config    #statements    time
MySQL     11k            2.8ms
MySQL     111k           3.1ms
MySQL     1,111k         3.8ms
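The statement counts in the table are just the sizes of complete 10-ary trees. Assuming one parent-to-child arc per non-root node (my reading of the setup, not stated explicitly above), the count for depth d is 10 + 10^2 + ... + 10^d:

```java
public class TreeSize {
    // Statements in a complete tree with the given branching factor and
    // depth, counting one parent->child arc per non-root node:
    // b + b^2 + ... + b^d.
    static long statements(int branching, int depth) {
        long total = 0, level = 1;
        for (int i = 1; i <= depth; i++) {
            level *= branching;
            total += level;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int d = 4; d <= 6; d++)
            System.out.printf("depth %d: %,d statements%n", d, statements(10, d));
    }
}
```

This gives 11,110, 111,110 and 1,111,110 statements for depths 4, 5 and 6, matching the 11k / 111k / 1,111k rows.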
This is partly CPU bound; preliminary tests on a similarly configured 2GHz
machine were about twice as fast.
Preliminary figures using PostgreSQL are 2-3 times slower than this.
If these trivial query patterns are indeed representative of Haystack's
requirements, then this suggests that 300-600 accesses per second can be
achieved on sub-$1k PCs (ignoring networking issues).
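The 300-600/s figure follows from simply inverting the measured per-query times and allowing for the roughly 2x speedup seen on the faster machine; a quick check:

```java
public class Throughput {
    public static void main(String[] args) {
        // Measured average micro-query times from the table (700MHz box).
        double[] msPerQuery = {2.8, 3.1, 3.8};
        for (double ms : msPerQuery) {
            double perSec700 = 1000.0 / ms;      // queries/s on the 700MHz box
            double perSec2g  = 2.0 * perSec700;  // ~2x on the 2GHz box
            System.out.printf("%.1fms/query -> ~%.0f/s (700MHz), ~%.0f/s (2GHz)%n",
                              ms, perSec700, perSec2g);
        }
    }
}
```

The slowest case (3.8ms at 700MHz) gives ~263/s and the fastest (2.8ms doubled) ~714/s, so 300-600 is a reasonable ballpark for the middle of that range.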
Loading 1M statements into the database is another matter, however!
Dave
Received on Friday, 12 July 2002 12:56:23 UTC