Re: LOD Cloud Cache Stats

On 4/5/11 7:45 PM, Juan Sequeda wrote:
>
> On Sat, Apr 2, 2011 at 2:55 PM, Kingsley Idehen 
> <kidehen@openlinksw.com> wrote:
>
>     All,
>
>     I've knocked up a Google spreadsheet that contains stats about our
>     21 Billion+ triple LOD cloud cache.
>
>     On the issue of Triple Counts, you can't make sense of Data if you
>     can't count it. We can't depend on SPARQL-FED for distributed
>     queries, and we absolutely cannot depend on a Web crawl via the
>     follow-your-nose pattern when seeking insights or answers to
>     queries across massive volumes of data.
>
>     The whole BigData game is a huge opportunity for Linked Data and
>     Semantics to finally shine. By shine I mean: show what was
>     erstwhile impossible.
>
>     Exhibit #1 -- how do we Find the proverbial needle in a haystack
>     via ad-hoc queries at Web Scale?
>
>     Exhibit #2 -- how do we leverage faceted exploration and
>     navigation of massive data sets at Web Scale?
>
>     Exhibit #3 -- how do we perform ad-hoc declarative queries (Join
>     and Aggregates variety) that used to be confined to a local
>     Oracle, SQL Server, DB2, Informix, MySQL, etc., at Web Scale,
>     especially if the Web is now a Global Linked Data Space?
>
>     I've issued a challenge to all BigData players to show me a public
>     endpoint that allows me to perform any of the tasks above. Thus
>     far, the silence has been predictably deafening :-)
>
>
> I guess you are the Google of the Semantic Web... assuming that you 
> can stick hundreds of billions of triples in your cache by the end of 
> the year... trillions of triples next year... etc.
>
> This sounds plausible to me. Google did it :)

Scaling to hundreds of billions of triples boils down to the total 
amount of memory we can cobble together across a cluster, bottom line.

Google has airport-sized data centers. We are at 21 Billion+ triples 
with an 8-node cluster endowed with 48GB RAM per node. Basically, our 
data center setup is less than a rounding error compared to theirs.
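
To put that in perspective, here is a quick back-of-envelope (it 
assumes the hot working set roughly fits in aggregate RAM, which is a 
simplification):

    8 nodes x 48GB RAM         = 384GB aggregate RAM
    384GB / 21 billion triples = roughly 18 bytes of RAM per triple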

By the end of this year, you'll simply have more triples squeezed into 
the same cluster config. As far as DBMS tech goes, our focus is on 
maintaining and exceeding the current scale while reducing 
infrastructure costs. Note that the current LOD cloud cache is still 
based on the Virtuoso 6.x cluster engine (row-based storage) rather 
than the 7.x engine (column-based storage) :-)
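
Re. the Triple Counts point and exhibit #3 in my note above: the stats 
in the linked spreadsheet are just SPARQL queries against the public 
endpoint, and counting comes down to a single aggregate. A minimal 
sketch, assuming an endpoint that supports SPARQL 1.1 style aggregates 
(Virtuoso's SPARQL does); the queries below are illustrative, not 
necessarily the exact ones behind the spreadsheet:

    # total triples in the store
    SELECT (COUNT(*) AS ?triples)
    WHERE { ?s ?p ?o }

    # triples per named graph, largest first (LIMIT is illustrative)
    SELECT ?g (COUNT(*) AS ?triples)
    WHERE { GRAPH ?g { ?s ?p ?o } }
    GROUP BY ?g
    ORDER BY DESC(?triples)
    LIMIT 10

The exhibit #3 join + aggregate case is the same pattern with more 
triple patterns in the WHERE clause, e.g. (rdfs:label and the English 
language filter are purely for illustration):

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # join + aggregate: per-class counts of instances that also carry
    # an English label
    SELECT ?type (COUNT(DISTINCT ?s) AS ?instances)
    WHERE {
      ?s a ?type .
      ?s rdfs:label ?label .
      FILTER (langMatches(lang(?label), "en"))
    }
    GROUP BY ?type
    ORDER BY DESC(?instances)
    LIMIT 20

That is the kind of ad-hoc query the challenge above refers to.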

Kingsley
>
>
>     Links:
>
>     1.
>     https://spreadsheets.google.com/ccc?key=0AihbIyhlsQSxdHViMFdIYWZxWE85enNkRHJwZXV4cXc&hl=en
>     -- LOD Cloud Cache SPARQL stats queries and results
>
>     -- 
>
>     Regards,
>
>     Kingsley Idehen
>     President & CEO
>     OpenLink Software
>     Web: http://www.openlinksw.com
>     Weblog: http://www.openlinksw.com/blog/~kidehen
>     Twitter/Identi.ca: kidehen
>


-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Wednesday, 6 April 2011 00:05:18 UTC