Re: resources for network-based/hierarchical RDF store from Steve Harris on 2007-05-01 (semantic-web@w3.org from May 2007)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Tue, 1 May 2007 08:01:17 +0100
To: Bradley Allen <ballen@siderean.com>
Cc: Danny Ayers <danny.ayers@gmail.com>, <semantic-web@w3.org>
Message-Id: <4ACCE129-0B1F-4AAF-98E3-C8D2C1AE586C@ecs.soton.ac.uk>

On 1 May 2007, at 01:47, Bradley Allen wrote:

> Steve- To clarify, by "hosted Web service," I was addressing the application
> you have built, and didn't mean to suggest that store access was available
> as a service directly.

I see. I thought you meant something along the lines of the Amazon  
storage services.

> As you say, it's a bit apples and oranges to draw direct comparisons; we're
> consuming a lot of RAM for the kind of indexing necessary to support fast
> aggregate operators, and when you add up the SPARQL queries with the queries
> using aggregate operators for the facet counts (which don't yet have SPARQL
> equivalents) we're looking at roughly the same number of queries, on the
> order of several hundred per rendered navigation page, which is returned to
> the browser in less than a second. This time includes the overhead of
> inference to support transitive closure of hierarchical facets and
> user-role-based filtering of results and result metadata for security and
> entitlement purposes. - BPA

Yes, the lack of aggregate operators is one of the reasons we're
getting through so many SPARQL queries. I'm working on a
SPARQL-inspired language that has support for aggregates at the moment.
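
To illustrate why missing aggregates multiply the query count, here is a
toy sketch in Python. It is not the Garlik code or any real store's API:
the rows, the run_query stand-in and both function names are made up for
the example. Without a grouped count, the client issues one query per
facet value and counts the rows itself; with one, a single pass does it.

```python
from collections import Counter

# Hypothetical toy data: (article, facet_property, facet_value) rows.
ROWS = [
    ("a1", "subject", "chemistry"),
    ("a2", "subject", "physics"),
    ("a3", "subject", "chemistry"),
    ("a4", "subject", "biology"),
]


def run_query(rows, prop, value):
    """Stand-in for one SELECT query matching a single facet value."""
    return [s for (s, p, o) in rows if p == prop and o == value]


def facet_counts_without_aggregates(rows, prop, values):
    """One query per facet value; the client counts the returned rows."""
    counts = {}
    queries = 0
    for value in values:
        counts[value] = len(run_query(rows, prop, value))
        queries += 1
    return counts, queries


def facet_counts_with_aggregate(rows, prop):
    """Single grouped count, as a COUNT/GROUP BY style operator would do."""
    counts = Counter(o for (_, p, o) in rows if p == prop)
    return dict(counts), 1
```

Both functions return the same counts; the first needs as many queries as
there are facet values, the second needs one.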

We're also doing entitlement-based filtering (essential given the
problem domain), but just using the GRAPH operator, so I suspect it's
coarser than yours.
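
As a rough picture of what graph-level filtering amounts to, here is a
minimal Python sketch. The quads, graph names, users and the
visible_quads helper are all hypothetical, not taken from the production
system. Each statement carries a graph name, and a user sees a statement
only if that graph is in their allowed set, which is why the granularity
is per graph rather than per triple.

```python
# Hypothetical quads: (subject, predicate, object, graph).
QUADS = [
    ("ex:alice", "ex:email", "alice@example.org", "g:alice"),
    ("ex:bob", "ex:email", "bob@example.org", "g:bob"),
    ("ex:site", "ex:title", "Example", "g:public"),
]

# Hypothetical entitlements: user -> set of graphs they may read.
ENTITLEMENTS = {
    "alice": {"g:alice", "g:public"},
    "bob": {"g:bob", "g:public"},
}


def visible_quads(quads, user, entitlements):
    """Keep only quads whose graph the user is entitled to read."""
    allowed = entitlements.get(user, set())
    return [q for q in quads if q[3] in allowed]
```

An unknown user gets an empty allowed set and therefore sees nothing.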

- Steve


> On 4/30/07 3:08 PM, "Steve Harris" <S.W.Harris@ecs.soton.ac.uk> wrote:
>
>> On 30 Apr 2007, at 20:34, Bradley Allen wrote:
>>>
>>> The benchmark we did with Elsevier was performed on a
>>> hierarchically-clustered grid of 32 commodity Linux boxes, each running an
>>> instance of Seamark Navigator. The RDF represented the bibliographical
>>> information describing 40 million articles plus 10 million descriptions of
>>> authors. The application was an end-user relational navigation interface
>>> over the collection of articles and authors.
>> ...
>>> A secondary difference, in contrast to the Garlik store, is that this is a
>>> commercially-supported software product as opposed to a hosted Web service,
>>> although we do provide hosting for applications like the Oracle
>>> Technology Network Semantic Web (http://otnsemanticweb.oracle.com).
>>
>> For comparison the Garlik store (JXT) in our production system stores
>> just over 2 gigaquads on 8 commodity Linux boxes. Typical query
>> response time for our application is 2-3ms per query - using the
>> SPARQL language, but not the protocol. However, one chunk of RDF is
>> not like another, so I don't want to draw direct comparisons. The
>> queries are fairly unexciting SPARQL queries, 8-9 triple patterns
>> with 2 or 3 OPTIONAL clauses, some have simple FILTER expressions.
>> Each report generated for a user runs a few hundred SPARQL queries of
>> that type, and it happens in around a second.
>>
>> It has ACID transactions and N-way failover redundancy to support the
>> high uptime needed to run a sizeable business off an RDF store.
>>
>> I'm not sure what you mean by "hosted Web service", but the Garlik
>> store is currently only for internal use. It supports our
>> commercial data management service, but access to the store is not
>> available directly to customers.
>>
>> - Steve
>
>

Received on Tuesday, 1 May 2007 07:01:24 UTC