
Re: Size matters -- How big is the danged thing

From: Yves Raimond <yves.raimond@gmail.com>
Date: Fri, 21 Nov 2008 17:01:11 +0000
Message-ID: <82593ac00811210901q56bdd6dch672122cf46746587@mail.gmail.com>
To: "Jim Hendler" <hendler@cs.rpi.edu>
Cc: "Michael Hausenblas" <michael.hausenblas@deri.org>, public-lod@w3.org

Hello!

> I guess I asked the question wrong - the linked open data project currently
> identifies a specific set of data resources that are linked together - so
> this "entity" is definable - I didn't mean to ask how big the whole
> Semantic Web is - I meant how many triples are in this particular group -
> the set that are described on
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Here are some stats, updated from a paper we wrote with Tom, Michael
and Wolfgang [1]. It doesn't include all of the datasets added in the
last revision of the diagram though (it lacks LinkedMDB, for example).
http://moustaki.org/resources/lod-stats.png

(Sorry for the PNG, I'll upload it in a handier format soonish.)

\mu is just the size of the dataset in triples.
\nu is |L| * 100 / \mu, where L is the set of triples linking to
an external dataset (i.e. the percentage of triples that link out).

Overall, that's about 17 billion triples.
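
As a rough illustration of the two measures defined above, here is a minimal
Python sketch. It is not the code used to produce the stats in the image. It
assumes a dataset is available as (subject, predicate, object) string tuples,
and it assumes a triple "links to an external dataset" when its object is an
http(s) URI outside the dataset's own namespace. The function name
dataset_stats, the local_prefix parameter and the example.org URIs are all
hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def dataset_stats(triples: Iterable[Triple], local_prefix: str) -> Tuple[int, float]:
    """Return (mu, nu) for one dataset.

    mu: number of triples in the dataset.
    nu: |L| * 100 / mu, where L is the set of triples whose object
        points outside the dataset.
    """
    mu = 0
    external = 0
    for _subject, _predicate, obj in triples:
        mu += 1
        # Assumption: an object counts as an external link when it is an
        # http(s) URI that does not start with the dataset's own prefix.
        # Literals that happen to start with "http" would be miscounted.
        if obj.startswith("http") and not obj.startswith(local_prefix):
            external += 1
    nu = external * 100 / mu if mu else 0.0
    return mu, nu


# Hypothetical four-triple dataset with one outgoing link.
example = [
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs",
     "http://other.example/alice"),
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/b"),
    ("http://example.org/b", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

mu, nu = dataset_stats(example, "http://example.org/")
print(mu, nu)  # 4 25.0
```

Summing mu over all datasets gives an overall figure of the kind quoted above.
Deciding which triples count as external is the judgment call here, and a
namespace-prefix test is only one possible choice.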

Cheers!
y

[1] http://sw-app.org/pub/isemantics08-sotsw.pdf
> I've been able to download pictures of this graph every few months or so,
> and you can see the number of datasets growing, but the last published
> number of triples for the thing (as stated on that page) is from over a year
> ago, and a whole bunch of stuff has been added and some of these have grown
> a lot - so we have a publicly shared, large-scale, RDF data resource that
> can be used for benchmarking, trying different interfaces and new
> technologies, etc.
> So it would be really nice to get a number every now and then so we could
> plot growth, explain to people what is in it better, etc.
> I know, I know, I know all the technical reasons this is relatively
> meaningless, but I gotta tell you, when I hear someone say "20 billion
> triples," I can tell you it causes people to pay attention -- problem is
> I would like to use a number that has some validity before I start quoting
> it....
>
> On Nov 20, 2008, at 5:12 AM, Michael Hausenblas wrote:
>
>> My 2c in order to capture this for others as well:
>>
>> http://community.linkeddata.org/MediaWiki/index.php?HowBigIsTheDangedThing
>>
>> Cheers,
>>        Michael
>>
>> ----------------------------------------------------------
>> Dr. Michael Hausenblas
>> DERI - Digital Enterprise Research Institute
>> National University of Ireland, Lower Dangan,
>> Galway, Ireland
>> ----------------------------------------------------------
>>
>> Jim Hendler wrote:
>>>
>>> So I've been to a number of talks lately where the size of the current
>>> (Sept 08 diagram) Linked Open Data cloud, in triples, has been stated - with
>>> numbers that vary quite widely.  The esw wiki says 2B triples as of 2007,
>>> which isn't very useful given the growth we've seen in the past year -- I've
>>> also seen the various blog posts and mail threads saying why we shouldn't
>>> cite meaningless numbers and such - but frankly, I've recently been on a
>>> bunch of panels with DB guys, and I'd love to have a reasonable number to
>>> quote -- anyone have a good estimate of the size of the danged thing (number
>>> of triples in the whole as an RDF graph would be nice) -- would also be nice
>>> for general audiences where big numbers tend to impress and for research
>>> purposes (for example, we know how far we can compress the triples for an in
>>> memory approach we are playing with, but we want to figure out how much
>>> memory we need for the whole cloud - we want to know if we need to shell out
>>> for the 16G iphone)
>>> anyway, if anyone has a decent estimate, or even a smart educated guess,
>>> I'd love to hear it
>>> JH
>>> "If we knew what we were doing, it wouldn't be called research, would
>>> it?." - Albert Einstein
>>> Prof James Hendler                http://www.cs.rpi.edu/~hendler
>>> Tetherless World Constellation Chair
>>> Computer Science Dept
>>> Rensselaer Polytechnic Institute, Troy NY 12180
>
> "If we knew what we were doing, it wouldn't be called research, would it?."
> - Albert Einstein
>
> Prof James Hendler
>  http://www.cs.rpi.edu/~hendler
> Tetherless World Constellation Chair
> Computer Science Dept
> Rensselaer Polytechnic Institute, Troy NY 12180
Received on Friday, 21 November 2008 17:01:47 UTC
