Re: resources for network-based/hierarchical RDF store from Bradley Allen on 2007-05-01 (semantic-web@w3.org from May 2007)

From: Bradley Allen <ballen@siderean.com>
Date: Mon, 30 Apr 2007 17:47:21 -0700
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: Danny Ayers <danny.ayers@gmail.com>, <semantic-web@w3.org>
Message-ID: <C25BDC29.7812%ballen@siderean.com>

Steve, to clarify: by "hosted Web service" I was referring to the application
you have built; I didn't mean to suggest that direct access to the store was
available as a service.

As you say, drawing direct comparisons is a bit apples and oranges. We
consume a lot of RAM for the kind of indexing needed to support fast
aggregate operators. When you add up the SPARQL queries and the queries that
use aggregate operators for the facet counts (which don't yet have SPARQL
equivalents), we're looking at roughly the same number of queries as you: on
the order of several hundred per rendered navigation page, which is returned
to the browser in less than a second. That time includes the overhead of
inference to support transitive closure of hierarchical facets, and of
user-role-based filtering of results and result metadata for security and
entitlement purposes. - BPA
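To make "facet counts over hierarchical facets with transitive closure, filtered by user role" concrete, here is a minimal sketch in plain Python. It only illustrates what such a count computes; the message does not describe how Seamark Navigator implements it, and the facet hierarchy, item IDs, role names and function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical facet hierarchy (child -> parent); assumed to be a tree, no cycles.
PARENT = {
    "Neuroscience": "Biology",
    "Genetics": "Biology",
    "Biology": "Science",
    "Physics": "Science",
}

# Hypothetical items: id -> (direct facet values, roles allowed to see the item).
ITEMS = {
    "a1": ({"Neuroscience"}, {"subscriber"}),
    "a2": ({"Genetics", "Neuroscience"}, {"subscriber"}),
    "a3": ({"Physics"}, {"admin"}),
}

def ancestors(value, parent=PARENT):
    """Return the value plus all of its ancestors (reflexive-transitive closure)."""
    chain = []
    while value is not None:
        chain.append(value)
        value = parent.get(value)
    return chain

def facet_counts(items, user_roles):
    """Count the items visible to the user under each facet value.

    An item counts toward a value if it is tagged with that value or with
    any descendant of it; each item is counted at most once per value.
    """
    counts = Counter()
    for _item_id, (values, allowed) in items.items():
        if not (allowed & user_roles):
            continue  # entitlement filter: skip items the user may not see
        closure = set()
        for v in values:
            closure.update(ancestors(v))
        counts.update(closure)  # +1 per distinct value in the closure
    return counts

print(sorted(facet_counts(ITEMS, {"subscriber"}).items()))
# [('Biology', 2), ('Genetics', 1), ('Neuroscience', 2), ('Science', 2)]
```

Collecting each item's closure into a set before counting keeps an item tagged with both "Genetics" and "Neuroscience" from being counted twice under "Biology".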

On 4/30/07 3:08 PM, "Steve Harris" <S.W.Harris@ecs.soton.ac.uk> wrote:

> On 30 Apr 2007, at 20:34, Bradley Allen wrote:
>> 
>> The benchmark we did with Elsevier was performed on a
>> hierarchically-clustered grid of 32 commodity Linux boxes, each running
>> an instance of Seamark Navigator. The RDF represented the bibliographical
>> information describing 40 million articles plus 10 million descriptions
>> of authors. The application was an end-user relational navigation
>> interface over the collection of articles and authors.
> ...
>> A secondary difference, in contrast to the Garlik store, is that this
>> is a commercially-supported software product as opposed to a hosted
>> Web service, although we do provide hosting for applications like the
>> Oracle Technology Network Semantic Web (http://otnsemanticweb.oracle.com).
> 
> For comparison the Garlik store (JXT) in our production system stores
> just over 2 gigaquads on 8 commodity Linux boxes. Typical query
> response time for our application is 2-3ms per query - using the
> SPARQL language, but not the protocol. However, one chunk of RDF is
> not like another, so I don't want to draw direct comparisons. The
> queries are fairly unexciting SPARQL queries, 8-9 triple patterns
> with 2 or 3 OPTIONAL clauses, some have simple FILTER expressions.
> Each report generated for a user runs a few hundred SPARQL queries of
> that type, and it happens in around a second.
> 
> It has ACID transactions and N-way failover redundancy to support the
> high uptime needed to run a sizeable business off an RDF store.
> 
> I'm not sure what you mean by "hosted Web service", but the Garlik
> store is currently only for internal use. It supports our
> commercial data management service, but access to the store is not
> available directly to customers.
> 
> - Steve
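Steve's figures can be sanity-checked with simple arithmetic: at 2-3 ms per query, a few hundred queries add up to roughly a second. A minimal sketch, assuming the queries run one after another and ignoring any other per-report overhead; the function name and the reading of "a few hundred" as 200-400 are assumptions for illustration.

```python
def report_time_ms(num_queries: int, ms_per_query: float) -> float:
    """Wall-clock estimate assuming queries run sequentially, with no
    client, network or rendering overhead (a simplifying assumption)."""
    return num_queries * ms_per_query

# "A few hundred" is read here as 200-400 queries (an assumption).
for n in (200, 300, 400):
    fastest = report_time_ms(n, 2)
    slowest = report_time_ms(n, 3)
    print(f"{n} queries at 2-3 ms each: {fastest}-{slowest} ms")
```

At 300 queries this gives 600-900 ms, consistent with "around a second". If some queries were issued in parallel, the total would be lower, so the sequential estimate is the conservative one.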

Received on Tuesday, 1 May 2007 00:47:28 UTC