Re: resources for network-based/hierarchical RDF store

On 30 Apr 2007, at 23:08, Steve Harris wrote:

>
> On 30 Apr 2007, at 20:34, Bradley Allen wrote:
>>
>> The benchmark we did with Elsevier was performed on a
>> hierarchically-clustered grid of 32 commodity Linux boxes, each
>> running an instance of Seamark Navigator. The RDF represented the
>> bibliographical information describing 40 million articles plus
>> 10 million descriptions of authors. The application was an end-user
>> relational navigation interface over the collection of articles and
>> authors.
> ...
>> A secondary difference, in contrast to the Garlik store, is that
>> this is a commercially-supported software product as opposed to a
>> hosted Web service, although we do provide hosting for applications
>> like the Oracle Technology Network Semantic Web
>> (http://otnsemanticweb.oracle.com).
>
> For comparison the Garlik store (JXT) in our production system  
> stores just over 2 gigaquads on 8 commodity Linux boxes. Typical  
> query response time for our application is 2-3ms per query - using  
> the SPARQL language, but not the protocol. However, one chunk of  
> RDF is not like another, so I don't want to draw direct  
> comparisons. The queries are fairly unexciting SPARQL queries, 8-9  
> triple patterns with 2 or 3 OPTIONAL clauses, some have simple  
> FILTER expressions. Each report generated for a user runs a few  
> hundred SPARQL queries of that type, and it happens in around a  
> second.

In case anyone cares, "commodity" in this case means Dell
SC1430s: single processor, 4GB of RAM, generic 300GB 7200rpm SATA
drives, gigabit ethernet. We're moving to higher-spec machines and
working on the software to increase the density, though.
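
For a rough sense of what those numbers imply, here is a back-of-envelope sketch in Python. The inputs are rounded from the figures quoted in this thread; the 300-query report size, the 2.5ms midpoint, and the assumption that queries run one after another are illustrative guesses, not measurements, and the function names are made up for this example.

```python
# Back-of-envelope figures; inputs are approximations taken from the thread.
TOTAL_QUADS = 2_000_000_000         # "just over 2 gigaquads", rounded down
BOXES = 8                           # commodity Linux boxes in production
RAM_BYTES_PER_BOX = 4 * 1024**3     # 4GB of RAM per box
DISK_BYTES_PER_BOX = 300 * 10**9    # 300GB drive (decimal gigabytes assumed)


def quads_per_box(total_quads, boxes):
    """Average number of quads held per machine, assuming an even spread."""
    return total_quads / boxes


def report_latency_ms(queries, ms_per_query):
    """Total report time if queries run strictly one after another."""
    return queries * ms_per_query


density = quads_per_box(TOTAL_QUADS, BOXES)
ram_bytes_per_quad = RAM_BYTES_PER_BOX / density
disk_bytes_per_quad = DISK_BYTES_PER_BOX / density

# Assumed: "a few hundred" queries taken as 300, "2-3ms" taken as 2.5ms.
latency_ms = report_latency_ms(300, 2.5)

print(f"quads per box:        {density:,.0f}")
print(f"RAM bytes per quad:   {ram_bytes_per_quad:.1f}")
print(f"disk bytes per quad:  {disk_bytes_per_quad:.0f} (upper bound)")
print(f"report latency:       {latency_ms:.0f} ms")
```

Under those assumptions the per-report total comes out a little under a second, which is consistent with the "around a second" figure quoted above. The RAM figure only shows that the full data set cannot sit uncompressed in memory on these machines; it says nothing about how the store actually lays out its data.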

As the business grows, we're obviously looking to scale the data
storage in the most economical way. Each user adds a significant
number of triples.

- Steve

Received on Monday, 30 April 2007 22:39:34 UTC