
Re: Size matters -- How big is the danged thing

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Thu, 20 Nov 2008 00:57:49 +0000
Message-ID: <210271540811191657m5e63cda2r595c8ea3e05b95b@mail.gmail.com>
To: "Jim Hendler" <hendler@cs.rpi.edu>
Cc: public-lod@w3.org

Hi Jim,

Honestly, a count job we launched some time ago gave us something less
than a billion triples on Sindice (but we currently don't index
UniProt, which is a big one). We'll be publishing live stats soon. But
what about wrappers (e.g. Flickr wrappers of keyword searches)? Those
are a virtually unlimited source of triples.
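A rough back-of-envelope sketch for the memory question in Jim's message, using the two counts mentioned in this thread: Sindice's "less than a billion" and the ESW wiki's 2B triples as of 2007. The bytes-per-triple values are hypothetical compressed sizes, not figures from the thread, and the 16 GiB budget treats the "16G" iPhone as if it were usable memory.

```python
# Back-of-envelope memory estimate for holding an RDF triple set in RAM.
# ASSUMPTION: the bytes_per_triple values below are hypothetical compressed
# sizes; nobody in this thread gives such a figure.

GIB = 1024 ** 3  # bytes in one GiB


def memory_gib(num_triples: int, bytes_per_triple: float) -> float:
    """Memory in GiB needed to store num_triples at bytes_per_triple each."""
    return num_triples * bytes_per_triple / GIB


def fits(num_triples: int, bytes_per_triple: float, budget_gib: float) -> bool:
    """True if the estimated size fits within budget_gib."""
    return memory_gib(num_triples, bytes_per_triple) <= budget_gib


if __name__ == "__main__":
    # Counts from the thread: ~1 billion (Sindice, an upper bound) and
    # 2 billion (ESW wiki, 2007).
    for triples in (1_000_000_000, 2_000_000_000):
        for bpt in (4, 8, 12):  # hypothetical compressed bytes per triple
            print(f"{triples:>13,} triples x {bpt:>2} B = "
                  f"{memory_gib(triples, bpt):6.1f} GiB, "
                  f"fits 16 GiB: {fits(triples, bpt, 16)}")
```

Under these assumptions, 2 billion triples at 8 bytes each come to about 14.9 GiB, while 12 bytes per triple would already overflow a 16 GiB budget; the answer is dominated by the per-triple size, which is exactly the number the compression work would supply.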

Reminder: anyone who has a LOD dataset and would like it to be
indexed/counted can simply submit a semantic sitemap here:

http://sindice.com/main/submit      (see the sitemap box)

Processing is usually pretty quick (it can take a day or two; you get an email back when it's done).

Giovanni




On Thu, Nov 20, 2008 at 12:07 AM, Jim Hendler <hendler@cs.rpi.edu> wrote:
>
> So I've been to a number of talks lately where the size of the current
> (Sept 08 diagram) Linked Open Data cloud, in triples, has been stated, with
> numbers that vary quite widely. The ESW wiki says 2B triples as of 2007,
> which isn't very useful given the growth we've seen in the past year. I've
> also seen the various blog posts and mail threads saying why we shouldn't
> cite meaningless numbers and such, but frankly, I've recently been on a
> bunch of panels with DB guys, and I'd love to have a reasonable number to
> quote. Does anyone have a good estimate of the size of the danged thing
> (the number of triples in the whole cloud as an RDF graph would be nice)?
> It would also be nice for general audiences, where big numbers tend to
> impress, and for research purposes: for example, we know how far we can
> compress the triples for an in-memory approach we are playing with, but we
> want to figure out how much memory we need for the whole cloud (we want to
> know if we need to shell out for the 16GB iPhone).
> Anyway, if anyone has a decent estimate, or even a smart educated guess,
> I'd love to hear it.
> JH
>
> "If we knew what we were doing, it wouldn't be called research, would it?"
> - Albert Einstein
>
> Prof James Hendler
>  http://www.cs.rpi.edu/~hendler
> Tetherless World Constellation Chair
> Computer Science Dept
> Rensselaer Polytechnic Institute, Troy NY 12180
>
Received on Thursday, 20 November 2008 00:58:24 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:20:43 UTC