- From: Yrjänä Rankka <ghard@openlinksw.com>
- Date: Fri, 06 Mar 2009 11:06:33 +0100
- To: Dan Brickley <danbri@danbri.org>
- CC: Georgi Kobilarov <georgi.kobilarov@gmx.de>, Kingsley Idehen <kidehen@openlinksw.com>, Linked Data community <public-lod@w3.org>
Dan Brickley wrote:
> On 6/3/09 10:21, Yrjänä Rankka wrote:
>> Georgi Kobilarov wrote:
>>> Hi Kingsley,
>>>
>>> DESCRIBE <http://dbpedia.org/resource/London> takes 3 minutes to
>>> execute on lod.openlinksw.com ...
>>>
>> It took only a few seconds when I tried it. Takes time to warm up a pan
>> of this size, as is the case with any DBMS. As the working set
>> stabilizes in memory, results will come faster.
>
> What's the granularity of the warmup? If eg /resource/Paris hasn't
> been directly viewed, will it benefit much from general warmup of
> related resources that are mentioned in the queries for that entity?
>

Very likely so. Also, in the case of DESCRIBE
<http://dbpedia.org/resource/London>, the result of ~13MB takes a while
to transfer as well. Though not quite 3 minutes - at least not through
the pipe I'm connected to.

Here's the explanation of how the read-ahead works, straight from the
horse's mouth:

In general, looking for resources in a data set improves the working set
for that data set. There is some locality based on load order etc.

The disk format is 8K pages, 256 pages per extent of 2MB. It is 8 disks
and 16 server processes, so disk is too narrow. Disk reads are in general
in parallel on all disks.

The random access transfer unit is 8K, but if you get two reads hitting
the same extent within a second of each other, the whole extent is read
sequentially instead of the 2nd single page request. So frequency of
access drives bulk prefetching.

Then there are cache maintenance policies that differ between just
prefetched and actually requested pages. This is a tunable tradeoff
between disk throughput and cache pollution.

Virtuoso IO is clever enough. But the fact is that running from memory
is 1000+ times faster than from disk on a random access workload, and
RDF is the very essence of random access.

> cheers
>
> Dan
>

Yrjänä

-- 
Yrjana Rankka | ghard@openlinksw.com
Developer, Virtuoso Team | http://www.openlinksw.com
| Making Technology Work For You
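
For readers who want to see the read-ahead rule from the message in concrete terms, here is a minimal Python sketch. It is not Virtuoso code: the class name `ReadAheadSketch`, its `read` method, and the in-memory bookkeeping are all hypothetical, and only the figures (8K pages, 256 pages per extent, the one-second window) come from the description above. Cache eviction and the separate treatment of prefetched versus requested pages are left out.

```python
# Illustrative model only; names and structure are assumptions, not Virtuoso internals.
PAGE_SIZE = 8 * 1024                        # 8K random-access transfer unit
PAGES_PER_EXTENT = 256                      # pages per extent
EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT  # 256 * 8K = 2MB
PREFETCH_WINDOW_S = 1.0                     # "within a second of each other"


class ReadAheadSketch:
    def __init__(self):
        self.last_read = {}  # extent number -> time of the last single-page read
        self.cache = set()   # page numbers currently held in memory

    def read(self, page, now):
        """Request one page at time `now`; return the bytes fetched from disk."""
        if page in self.cache:
            return 0  # served from memory, no disk access
        extent = page // PAGES_PER_EXTENT
        last = self.last_read.get(extent)
        if last is not None and now - last <= PREFETCH_WINDOW_S:
            # Second read on the same extent inside the window:
            # fetch the whole extent sequentially instead of one page.
            first = extent * PAGES_PER_EXTENT
            self.cache.update(range(first, first + PAGES_PER_EXTENT))
            del self.last_read[extent]
            return EXTENT_SIZE
        # Isolated read: fetch just the one page and remember when.
        self.last_read[extent] = now
        self.cache.add(page)
        return PAGE_SIZE
```

In this model, two reads of pages 10 and 20 half a second apart cost one 8K read followed by one 2MB extent read, after which any other page in that extent is served from memory.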
Received on Friday, 6 March 2009 10:07:21 UTC