Re: [semanticweb] ANN: DBpedia 3.5 released from Kingsley Idehen on 2010-04-14 (public-lod@w3.org from April 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 14 Apr 2010 09:01:09 -0400
To: Leigh Dodds <leigh.dodds@talis.com>
CC: Ivan Mikhailov <imikhailov@openlinksw.com>, baran <baran@goldmail.de>, semanticweb <semanticweb@yahoogroups.com>, public-lod <public-lod@w3.org>, SW-forum <semantic-web@w3.org>, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>, dbpedia-announcements <dbpedia-announcements@lists.sourceforge.net>, Chris Bizer <chris@bizer.de>
Message-ID: <4BC5BC95.8040005@openlinksw.com>

Leigh Dodds wrote:
> Hi,
>
> 2010/4/14 Ivan Mikhailov <imikhailov@openlinksw.com>:
>   
>> Similarly, growing database size and growing hit rate and growing
>> complexity of queries are not obviously visible from outside, but turn
>> the hosting into a race. We're improving the underlying RDBMS as fast
>> as we can just to keep the service from coming to a total halt. One
>> might wish to provide a better service on their own RDBMS and thus make
>> a good advertisement, but nobody else wants to do that _and_ can do
>> that, so we're alone under this load.
>>     
>
> Out of interest, do you actually share any metrics on usage levels,
> common sparql queries, etc?
>
> We have a copy of the dbpedia data loaded into the Talis Platform, but
> it's not yet up to date with 3.5. So there's more than one option
> already, although the service characteristics/features are different
> (different software).
>
> Cheers,
>
> L.
>
>   
Leigh,

When we refer to an "option" we are talking about a mirror rather than 
an alternative place where DBpedia data sets have been loaded.

As for usage levels, the issues have very little to do with sane SPARQL 
queries and everything to do with crawlers that attempt to perform 
wholesale imports of the entire data set (many attempt this, as we can 
see from the HTTP logs and the payload sizes). In addition, remember 
that we are serving up actual RDF-based descriptor resources, and these 
too are crawled wholesale with the intent of populating other data 
spaces (these are also crawled aggressively by LOD and non-LOD crawlers).

We are not just providing a SPARQL endpoint; we are also serving RDF 
descriptor resources in a variety of representation formats. As I've 
stated above, the dominant usage pattern is crawling these RDF descriptor 
resources, which (without protection) simply obliterates "across the 
wire" bandwidth, as it would for any document server on a public 
network such as the World Wide Web.

If you want to offer a mirror (i.e., one that mirrors what we are 
offering), simply let us know, and we can then spell out what that 
entails.

-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Wednesday, 14 April 2010 13:01:52 UTC