Re: Size matters -- How big is the danged thing

From: Jim Hendler <hendler@cs.rpi.edu>
Date: Thu, 20 Nov 2008 06:03:25 -0500
Cc: public-lod@w3.org
Message-Id: <AA2E36DC-6F82-449F-815A-8F4F3A70D1D0@cs.rpi.edu>
To: Michael Hausenblas <michael.hausenblas@deri.org>

I guess I asked the question wrong - the linked open data project  
currently identifies a specific set of data resources that are linked  
together - so this "entity" is definable - I didn't mean to ask how  
big the whole Semantic Web is - I meant how many triples are in this  
particular group - the set that are described on http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
I've been able to download pictures of this graph every few months or  
so, and you can see the number of datasets growing, but the last  
published number of triples for the thing (as stated on that page) is  
from over a year ago, and a whole bunch of stuff has been added and  
some of these have grown a lot - so we have a publicly shared,  
large-scale RDF data resource that can be used for benchmarking, trying  
different interfaces and new technologies, etc.
So it would be really nice to get a number every now and then so we  
could plot growth, explain to people what is in it better, etc.
I know, I know, I know all the technical reasons this is relatively  
meaningless, but I gotta tell you, when I hear someone say "20 billion  
triples," I can tell you it causes people to pay attention --  
problem is I would like to use a number that has some validity before  
I start quoting it....

On Nov 20, 2008, at 5:12 AM, Michael Hausenblas wrote:

> My 2c in order to capture this for others as well:
>
> http://community.linkeddata.org/MediaWiki/index.php?HowBigIsTheDangedThing
>
> Cheers,
> 	Michael
>
> ----------------------------------------------------------
> Dr. Michael Hausenblas
> DERI - Digital Enterprise Research Institute
> National University of Ireland, Lower Dangan,
> Galway, Ireland
> ----------------------------------------------------------
>
> Jim Hendler wrote:
>> So I've been to a number of talks lately where the size of the  
>> current (Sept 08 diagram) Linked Open Data cloud, in triples, has  
>> been stated - with numbers that vary quite widely.  The esw wiki  
>> says 2B triples as of 2007, which isn't very useful given the  
>> growth we've seen in the past year -- I've also seen the various  
>> blog posts and mail threads saying why we shouldn't cite meaningless  
>> numbers and such - but frankly, I've recently been on a bunch of  
>> panels with DB guys, and I'd love to have a reasonable number to  
>> quote -- anyone have a good estimate of the size of the danged  
>> thing (number of triples in the whole as an RDF graph would be  
>> nice) -- would also be nice for general audiences where big numbers  
>> tend to impress and for research purposes (for example, we know how  
>> far we can compress the triples for an in-memory approach we are  
>> playing with, but we want to figure out how much memory we need for  
>> the whole cloud - we want to know if we need to shell out for the  
>> 16G iphone)
>> anyway, if anyone has a decent estimate, or even a smart educated  
>> guess, I'd love to hear it
>> JH
>> "If we knew what we were doing, it wouldn't be called research,  
>> would it?." - Albert Einstein
>> Prof James Hendler                http://www.cs.rpi.edu/~hendler
>> Tetherless World Constellation Chair
>> Computer Science Dept
>> Rensselaer Polytechnic Institute, Troy NY 12180

"If we knew what we were doing, it wouldn't be called research, would  
it?." - Albert Einstein

Prof James Hendler				http://www.cs.rpi.edu/~hendler
Tetherless World Constellation Chair
Computer Science Dept
Rensselaer Polytechnic Institute, Troy NY 12180
Received on Thursday, 20 November 2008 23:10:37 UTC
