Re: resources for network-based/hierarchical RDF store

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Tue, 1 May 2007 08:01:17 +0100
Message-Id: <4ACCE129-0B1F-4AAF-98E3-C8D2C1AE586C@ecs.soton.ac.uk>
Cc: Danny Ayers <danny.ayers@gmail.com>, <semantic-web@w3.org>
To: Bradley Allen <ballen@siderean.com>

On 1 May 2007, at 01:47, Bradley Allen wrote:

> Steve- To clarify, by "hosted Web service," I was addressing the application
> you have built, and didn't mean to suggest that store access was available
> as a service directly.

I see. I thought you meant something along the lines of the Amazon  
storage services.

> As you say, it's a bit apples and oranges to draw direct comparisons; we're
> consuming a lot of RAM for the kind of indexing necessary to support fast
> aggregate operators, and when you add up the SPARQL queries with the queries
> using aggregate operators for the facet counts (which don't yet have SPARQL
> equivalents) we're looking at roughly the same number of queries, on the
> order of several hundred per rendered navigation page, which is returned to
> the browser in less than a second. This time includes the overhead of
> inference to support transitive closure of hierarchical facets and
> user-role-based filtering of results and result metadata for security and
> entitlement purposes. - BPA

Yes, the lack of aggregate operators is one of the reasons we're
getting through so many SPARQL queries. I'm working on a SPARQL-inspired
language that has support for aggregates at the moment.
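
To illustrate why missing aggregates inflate the query count, here is a
minimal Python sketch over a toy in-memory triple list. It is not code from
either store: the data, the select() stand-in for a single SELECT query, and
both function names are hypothetical, chosen only to contrast one query per
facet value with a single COUNT/GROUP BY style pass.

```python
from collections import Counter

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("article1", "subject", "chemistry"),
    ("article2", "subject", "chemistry"),
    ("article3", "subject", "physics"),
    ("article1", "year", "2006"),
    ("article2", "year", "2007"),
    ("article3", "year", "2007"),
]


def select(predicate, obj):
    """Stand-in for one SELECT query: subjects matching (?s predicate obj)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def facet_counts_without_aggregates(predicate, values):
    """Issue one query per facet value and count the rows on the client."""
    counts = {}
    queries = 0
    for value in values:
        counts[value] = len(select(predicate, value))
        queries += 1
    return counts, queries


def facet_counts_with_aggregates(predicate):
    """Single pass, roughly what a COUNT with GROUP BY would return."""
    return dict(Counter(o for _, p, o in TRIPLES if p == predicate))
```

Without aggregates the number of queries grows with the number of facet
values shown on the page; with them it is one query per facet.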

We're also doing entitlement based filtering (essential given the  
problem domain), but just using the GRAPH operator, so I suspect it's  
coarser than yours.
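
As a rough sketch of what graph-level entitlement filtering amounts to, the
Python below keeps only quads whose named graph the user may read. The quads,
the entitlement table, and visible_triples() are all hypothetical and are not
taken from the Garlik store; the point is only that the unit of access
control is the whole graph, which is why this approach is coarse.

```python
# Hypothetical quads: (subject, predicate, object, graph).
QUADS = [
    ("alice", "email", "alice@example.org", "graph:alice"),
    ("alice", "phone", "555-0100", "graph:alice"),
    ("bob", "email", "bob@example.org", "graph:bob"),
    ("site", "name", "Example", "graph:public"),
]

# Hypothetical entitlement table: named graphs each user may read.
ENTITLEMENTS = {
    "alice": {"graph:alice", "graph:public"},
    "bob": {"graph:bob", "graph:public"},
}


def visible_triples(user, predicate=None):
    """Return triples from graphs the user is entitled to read.

    Filtering happens per graph, so a user sees either every triple in a
    graph or none of them.
    """
    allowed = ENTITLEMENTS.get(user, set())
    return [
        (s, p, o)
        for s, p, o, g in QUADS
        if g in allowed and (predicate is None or p == predicate)
    ]
```

Finer-grained schemes, such as filtering individual results or result
metadata by user role, would need a check per triple rather than per graph.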

- Steve

> On 4/30/07 3:08 PM, "Steve Harris" <S.W.Harris@ecs.soton.ac.uk> wrote:
>> On 30 Apr 2007, at 20:34, Bradley Allen wrote:
>>> The benchmark we did with Elsevier was performed on a
>>> hierarchically-clustered grid of 32 commodity Linux boxes, each running an
>>> instance of Seamark Navigator. The RDF represented the bibliographical
>>> information describing 40 million articles plus 10 million descriptions of
>>> authors. The application was an end-user relational navigation interface
>>> over the collection of articles and authors.
>> ...
>>> A secondary difference, in contrast to the Garlik store, is that this is a
>>> commercially-supported software product as opposed to a hosted Web service,
>>> although we do provide hosting for applications like the Oracle
>>> Technology Network Semantic Web (http://otnsemanticweb.oracle.com).
>> For comparison the Garlik store (JXT) in our production system stores
>> just over 2 gigaquads on 8 commodity Linux boxes. Typical query
>> response time for our application is 2-3ms per query - using the
>> SPARQL language, but not the protocol. However, one chunk of RDF is
>> not like another, so I don't want to draw direct comparisons. The
>> queries are fairly unexciting SPARQL queries, 8-9 triple patterns
>> with 2 or 3 OPTIONAL clauses, some have simple FILTER expressions.
>> Each report generated for a user runs a few hundred SPARQL queries of
>> that type, and it happens in around a second.
>> It has ACID transactions and N-way failover redundancy to support the
>> high uptime needed to run a sizeable business off an RDF store.
>> I'm not sure what you mean by "hosted Web service", but the Garlik
>> store is currently only for internal use. It supports our
>> commercial data management service, but access to the store is not
>> available directly to customers.
>> - Steve
Received on Tuesday, 1 May 2007 07:01:24 UTC