- From: David R. Karger <karger@theory.lcs.mit.edu>
- Date: Fri, 2 Aug 2002 18:20:59 -0400
- To: der@hplb.hpl.hp.com
- CC: www-rdf-dspace@w3.org, Nick_Wainwright@hplb.hpl.hp.com
Dennis sat down and profiled, and found that rendering the Ozone homepage required 45,000 queries to the RDF store. Cholesterol handles this in a few seconds; sleepycat is about 60 times slower. As I mentioned in my last email, this doesn't mean cholesterol is the answer to our problems: as an in-memory system, I don't think it will scale beyond the tiny corpora we are working with now. I can imagine the kind of system we need: basically, something like cholesterol acting as an in-memory cache in front of something like sleepycat as the persistent store (a rough sketch of this layering appears after Dave's message below). But I don't think we have the manpower to build it. Any thoughts?

d

Date: Fri, 12 Jul 2002 17:55:55 +0100
From: Dave Reynolds <der@hplb.hpl.hp.com>
CC: www-rdf-dspace@w3.org, "Nick Wainwright (E-mail)" <Nick_Wainwright@hplb.hpl.hp.com>

At yesterday's DSpace telecon we discussed the question of whether RDF databases as they currently exist could support the "several hundred" small queries per second needed for Haystack implementations. To get a ballpark answer I set up the following test configuration:

o A Jena test application that creates a tree-shaped set of RDF assertions with variable depth and branching factor, then does a set of timed, repeated random walks from root to leaf of the tree (a sketch of such a harness follows at the end of this message). Each step on the walk requires a separate (very small) database query; there is no query batching. The randomization of repeated walks hopefully stresses the caching mechanisms enough to make the test somewhat realistic.

o I used a branching factor of 10 and depths from 4 to 6 to cover the 10k - 1m triple range.

o The application and database ran on the same machine (requests still go through the TCP stack, but not out onto the LAN itself).

o The main test machine was a 700MHz, single-CPU, 512MB box running Red Hat Linux 7.2.

The average time for one micro-query (one step in the random walk) was:

   Config   #statements   time
   MySQL           11k    2.8ms
   MySQL          111k    3.1ms
   MySQL        1,111k    3.8ms

This is partially CPU bound; preliminary tests on a similarly configured 2GHz machine were about twice as fast. Preliminary figures using PostgreSQL are 2-3x slower than this.

If these trivial query patterns are indeed representative of Haystack's requirements, this suggests that 300-600 accesses per second can be achieved on sub-$1k PCs (ignoring networking issues). Loading 1m statements into the database is another matter, however!

Dave
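A minimal sketch, in modern Java, of the layering David describes: an in-memory triple index (standing in for cholesterol) is consulted first and falls back to a persistent store (standing in for sleepycat) on a miss, writing through on updates. The Triple record and TripleStore interface are invented for illustration; neither corresponds to any real Cholesterol, Sleepycat, or Haystack API.

    import java.util.*;

    // Hypothetical triple and store types, invented for this sketch.
    record Triple(String subject, String predicate, String object) {}

    interface TripleStore {
        List<Triple> find(String subject, String predicate); // one small point query
        void add(Triple t);
    }

    class CachingTripleStore implements TripleStore {
        private final TripleStore persistent;                        // e.g. a sleepycat-backed store
        private final Map<String, List<Triple>> cache = new HashMap<>(); // in-memory index

        CachingTripleStore(TripleStore persistent) { this.persistent = persistent; }

        public List<Triple> find(String subject, String predicate) {
            String key = subject + "|" + predicate;
            // Serve repeated micro-queries from memory; only misses hit the disk store.
            return cache.computeIfAbsent(key, k -> persistent.find(subject, predicate));
        }

        public void add(Triple t) {
            persistent.add(t);                                       // write through to disk
            cache.remove(t.subject() + "|" + t.predicate());         // invalidate the stale entry
        }
    }

The point of the design is that a workload like the 45,000-query homepage render would hit the in-memory index almost every time, while the persistent store remains the authority for data that exceeds RAM.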
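And a sketch of a random-walk micro-query harness in the spirit of Dave's test. It is written against the modern Apache Jena API and an in-memory model (the 2002 test would have used the original HP Labs Jena with an RDB-backed model); the namespace, the "child" property, and the branching/depth/walk-count values are assumptions chosen to match the ~11k-triple configuration in the table.

    import org.apache.jena.rdf.model.*;
    import java.util.List;
    import java.util.Random;

    public class RandomWalkBench {
        static final String NS = "http://example.org/bench#";   // assumed namespace

        public static void main(String[] args) {
            int branching = 10, depth = 4;                       // ~11k triples, as in the first row
            Model model = ModelFactory.createDefaultModel();     // in-memory stand-in for the RDB model
            Property child = model.createProperty(NS, "child");

            Resource root = model.createResource(NS + "n");
            buildTree(model, child, root, branching, depth);

            Random rnd = new Random(42);
            int walks = 10_000;
            long start = System.nanoTime();
            for (int w = 0; w < walks; w++) {
                Resource node = root;
                for (int level = 0; level < depth; level++) {
                    // One separate micro-query per step, no batching.
                    List<RDFNode> kids = model.listObjectsOfProperty(node, child).toList();
                    node = kids.get(rnd.nextInt(kids.size())).asResource();
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%d steps, %.3f ms per micro-query%n",
                    walks * depth, (double) elapsedMs / (walks * depth));
        }

        // Recursively attach `branching` children to `parent`, `depth` levels deep.
        static void buildTree(Model m, Property child, Resource parent, int branching, int depth) {
            if (depth == 0) return;
            for (int i = 0; i < branching; i++) {
                Resource c = m.createResource(parent.getURI() + "-" + i);
                m.add(parent, child, c);
                buildTree(m, child, c, branching, depth - 1);
            }
        }
    }

Pointing the same walk loop at a database-backed model instead of the in-memory one is what turns this into the MySQL/PostgreSQL measurement described above.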
Received on Friday, 2 August 2002 18:22:37 UTC