- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Fri, 12 Jul 2002 17:55:55 +0100
- To: karger@lcs.mit.edu
- CC: www-rdf-dspace@w3.org, "Nick Wainwright (E-mail)" <Nick_Wainwright@hplb.hpl.hp.com>
At yesterday's DSpace telecon we discussed the question of whether RDF databases as they currently exist could support the "several hundred" small queries per second needed for Haystack implementations. To give a ball-park test of this I set up the following test configuration:

o A Jena test application that creates a tree-shaped set of RDF assertions with variable depth and branching factor, and then does a set of timed, repeated random walks from root to leaf of the tree. Each step on the walk requires a separate (very small) database query - no query batching. The randomization of repeated walks hopefully stresses the caching mechanisms sufficiently to make the test somewhat realistic.

o I used a branching factor of 10 and depths from 4-6 to test the 10k - 1m triple range.

o The application and database were running on the same machine (requests still go through the TCP stack but not out onto the LAN itself).

o The main test machine was a 700MHz, single-CPU, 512MB machine running Red Hat Linux 7.2.

The average time for one micro-query (one step in the random walk) was:

  Config    #statements    time
  MySQL          11k       2.8ms
  MySQL         111k       3.1ms
  MySQL       1,111k       3.8ms

This is partially CPU bound; preliminary tests on a similarly configured 2GHz machine were about twice as fast. Preliminary figures using PostgreSQL are 2-3x slower than this.

If these trivial query patterns are indeed representative of Haystack's requirements then this suggests that 300-600 accesses per second can be achieved on sub-$1k PCs (ignoring networking issues). Loading up 1m statements into a database is another matter, however!

Dave
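The benchmark shape described above can be sketched in a few lines. This is a minimal stand-in, not the original Jena/MySQL harness: it stores the tree as a (parent, child) edge table in an in-memory SQLite database, and the names `BRANCHING`, `DEPTH`, `build_tree`, and `random_walk` are illustrative choices, not from the post. With branching factor 10 and depth 4 the tree has 10 + 100 + 1,000 + 10,000 = 11,110 edges, matching the "11k" row of the table; each step of a root-to-leaf walk issues one tiny SELECT, with no batching.

```python
import random
import sqlite3
import time

BRANCHING = 10  # branching factor used in the original test
DEPTH = 4       # depth 4 gives ~11k edges; depths 4-6 cover 10k - 1m

def build_tree(conn, branching, depth):
    """Populate a (parent, child) edge table for a complete tree."""
    conn.execute("CREATE TABLE edges (parent INTEGER, child INTEGER)")
    conn.execute("CREATE INDEX idx_parent ON edges(parent)")
    next_id = 1        # node 0 is the root
    frontier = [0]
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            for _ in range(branching):
                conn.execute("INSERT INTO edges VALUES (?, ?)",
                             (node, next_id))
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    conn.commit()

def random_walk(conn, depth):
    """Walk root to leaf; each step is one separate micro-query."""
    node = 0
    for _ in range(depth):
        children = conn.execute(
            "SELECT child FROM edges WHERE parent = ?", (node,)).fetchall()
        node = random.choice(children)[0]
    return node

conn = sqlite3.connect(":memory:")
build_tree(conn, BRANCHING, DEPTH)

# Time repeated randomized walks and report the per-micro-query average.
n_walks = 200
start = time.perf_counter()
for _ in range(n_walks):
    random_walk(conn, DEPTH)
elapsed = time.perf_counter() - start
per_query_ms = elapsed / (n_walks * DEPTH) * 1000
print(f"avg micro-query time: {per_query_ms:.3f} ms")
```

The throughput claim follows directly from the table: at 2.8ms per micro-query, 1000 / 2.8 is roughly 357 queries per second, so the 11k-1,111k MySQL figures bracket the "300-600 accesses per second" estimate.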
Received on Friday, 12 July 2002 12:56:23 UTC