- From: Bradley Allen <ballen@siderean.com>
- Date: Mon, 30 Apr 2007 12:34:38 -0700
- To: Danny Ayers <danny.ayers@gmail.com>
- CC: <semantic-web@w3.org>
I thought I'd chime in on this discussion in light of our announcement today about breaking the billion-quad barrier in a pilot with Elsevier (http://www.siderean.com/newsitem.aspx?pid=24) and add some gloss to the information in that press release.

The benchmark we did with Elsevier was performed on a hierarchically-clustered grid of 32 commodity Linux boxes, each running an instance of Seamark Navigator. The RDF represented the bibliographical information describing 40 million articles plus 10 million descriptions of authors. The application was an end-user relational navigation interface over the collection of articles and authors.

The principal difference between this approach and those in some of the other large RDF stores discussed in this thread is the design emphasis on sub-second query response under load for a relational navigation application. In this type of application, a query is effectively equivalent to several tens of SPARQL queries together with aggregate operators that return facet value counts for selected attributes of matching resources, along with additional queries to retrieve human-readable labels. That being said, the cluster also accepts SPARQL queries against the RDF graph in the manner of most other stores.

The RDF quads are automatically partitioned across the cluster on the basis of the rdf:type of a given resource and its related resources, as necessary to answer relational navigation queries without doing joins across cluster nodes. Updates to the store are handled concurrently and incrementally. The architecture today can be scaled to provide storage on the order of 10 gigaquads simply by adding more nodes to the cluster. Additional improvements in our development pipeline will add a further 10x on top of that.
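To make the partitioning idea a bit more concrete, here is a minimal Python sketch of one way type-based placement and per-node facet counting could fit together. It is an illustration only, not Seamark Navigator's actual implementation: the `PLACEMENT` table, the `ex:` names, the CRC32 hashing, and both function names are assumptions made up for this example. Each rdf:type is assigned a group of nodes, and all quads about a given subject land on one node in that group, so facet counts can be computed locally on each node and then summed without any cross-node join.

```python
import zlib
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical placement table: which cluster nodes hold which rdf:type.
PLACEMENT = {"ex:Article": [0, 1, 2], "ex:Author": [3]}

# Toy quads as (subject, predicate, object, graph) tuples.
QUADS = [
    ("ex:a1", RDF_TYPE, "ex:Article", "ex:g"),
    ("ex:a1", "ex:year", "2006", "ex:g"),
    ("ex:a2", RDF_TYPE, "ex:Article", "ex:g"),
    ("ex:a2", "ex:year", "2006", "ex:g"),
    ("ex:a3", RDF_TYPE, "ex:Article", "ex:g"),
    ("ex:a3", "ex:year", "2007", "ex:g"),
    ("ex:p1", RDF_TYPE, "ex:Author", "ex:g"),
    ("ex:p1", "ex:name", "A. Author", "ex:g"),
]


def partition_by_type(quads, placement, default_group=(0,)):
    """Route every quad to one node, chosen from its subject's rdf:type."""
    # First pass: record the rdf:type asserted for each subject.
    subject_type = {s: o for (s, p, o, g) in quads if p == RDF_TYPE}
    nodes = defaultdict(list)
    for quad in quads:
        subject = quad[0]
        group = placement.get(subject_type.get(subject), list(default_group))
        # Hash the subject so all of its quads stay together on one node.
        node_id = group[zlib.crc32(subject.encode("utf-8")) % len(group)]
        nodes[node_id].append(quad)
    return nodes


def merged_facet_counts(nodes, facet_predicate):
    """Count facet values on each node locally, then sum the partial counts."""
    total = Counter()
    for node_quads in nodes.values():
        total.update(o for (s, p, o, g) in node_quads if p == facet_predicate)
    return total


nodes = partition_by_type(QUADS, PLACEMENT)
year_facet = merged_facet_counts(nodes, "ex:year")
```

Because each quad lives on exactly one node, summing the per-node counters gives the same facet counts as counting over the whole collection. A real deployment would also need to co-locate related resources (for example, the authors of an article), which this sketch does not attempt.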
A secondary difference, in contrast to the Garlik store, is that this is a commercially-supported software product as opposed to a hosted Web service, although we do provide hosting for applications like the Oracle Technology Network Semantic Web (http://otnsemanticweb.oracle.com).

So, yes, Danny: not only is it doable, it's shipping. ;-)

- regards, BPA

--
Bradley P. Allen
Founder and CTO
Siderean Software, Inc.
work: +1 310 647 5610
cell: +1 310 951 4300
skype: bpa777
YIM: bpallen777

On 4/27/07 3:58 AM, "Danny Ayers" <danny.ayers@gmail.com> wrote:

> On 26/04/07, Andreas Langegger <andreas.langegger@gmx.at> wrote:
>
>> We are working on a distributed query processor for SPARQL. Any pointers
>> are appreciated.
>
> You probably have this already:
>
> DARQ - Federated Queries with SPARQL
> http://darq.sourceforge.net/
>
> A while ago Steve Harris suggested there might be a chance the *big*
> store they've developed for garlik.com could be open sourced.
> Listening to this podcast -
>
> http://talk.talis.com/archives/2007/04/tom_ilube_talks.html
>
> - apparently it can run on a cluster of generic Linux boxes. So at
> least we know it's doable ;-)
>
> Cheers,
> Danny.
Received on Monday, 30 April 2007 19:34:45 UTC