Re: Size matters -- How big is the danged thing

From: Tom Heath <tom.heath@talis.com>
Date: Fri, 21 Nov 2008 16:47:42 +0000
Message-ID: <89f622f10811210847o1449402di4c6d17eff1bf594b@mail.gmail.com>
To: "Jim Hendler" <hendler@cs.rpi.edu>
Cc: "Michael Hausenblas" <michael.hausenblas@deri.org>, public-lod@w3.org

Hi Jim, all,

At WWW2008 ChrisB and I approached R Guha to ask if Google could apply
some of their considerable resources to answering this question. The
response went something like "sure, we can do that, email me", but
since then we've been unable to get any further responses. Perhaps you
have a stronger connection there and could nudge that?

Alternatively, perhaps Yahoo or the Falcon-S guys could help out, as
they seem to have a pretty comprehensive crawl, or maybe SWSE could.
Surely there's some kudos to be had in being the de facto authority on
the size of the Web of Data, at least for a few months/years yet.

I agree, size does matter. Time for another single function web site
at howbigisthewebofdata.com? ;)

Tom.


2008/11/20 Jim Hendler <hendler@cs.rpi.edu>:
> I guess I asked the question wrong - the linked open data project currently
> identifies a specific set of data resources that are linked together - so
> this "entity" is definable - I didn't mean to ask how big the whole
> Semantic Web is - I meant how many triples are in this particular group -
> the set that are described on
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
> I've been able to download pictures of this graph every few months or so,
> and you can see the number of datasets growing, but the last published
> number of triples for the thing (as stated on that page) is from over a year
> ago, and a whole bunch of stuff has been added and some of these have grown
> a lot - so we have a publicly shared, large-scale, RDF data resource that
> can be used for benchmarking, trying different interfaces and new
> technologies, etc
> So it would be really nice to get a number every now and then so we could
> plot growth, explain to people what is in it better, etc.
> I know, I know, I know all the technical reasons this is relatively
> meaningless, but I gotta tell you, when I hear someone say "20 billion
> triples," I can tell you it causes people to pay attention -- problem is
> I would like to use a number that has some validity before I start quoting
> it....
>
> On Nov 20, 2008, at 5:12 AM, Michael Hausenblas wrote:
>
>> My 2c in order to capture this for others as well:
>>
>> http://community.linkeddata.org/MediaWiki/index.php?HowBigIsTheDangedThing
>>
>> Cheers,
>>        Michael
>>
>> ----------------------------------------------------------
>> Dr. Michael Hausenblas
>> DERI - Digital Enterprise Research Institute
>> National University of Ireland, Lower Dangan,
>> Galway, Ireland
>> ----------------------------------------------------------
>>
>> Jim Hendler wrote:
>>>
>>> So I've been to a number of talks lately where the size of the current
>>> (Sept 08 diagram) Linked Open Data cloud, in triples, has been stated - with
>>> numbers that vary quite widely.  The esw wiki says 2B triples as of 2007,
>>> which isn't very useful given the growth we've seen in the past year -- I've
>>> also seen the various blog posts and mail threads saying why we shouldn't
>>> cite meaningless numbers and such - but frankly, I've recently been on a
>>> bunch of panels with DB guys, and I'd love to have a reasonable number to
>>> quote -- anyone have a good estimate of the size of the danged thing (number
>>> of triples in the whole as an RDF graph would be nice) -- would also be nice
>>> for general audiences where big numbers tend to impress and for research
>>> purposes (for example, we know how far we can compress the triples for an in
>>> memory approach we are playing with, but we want to figure out how much
>>> memory we need for the whole cloud - we want to know if we need to shell out
>>> for the 16G iphone)
>>> anyway, if anyone has a decent estimate, or even a smart educated guess,
>>> I'd love to hear it
>>> JH
>>> "If we knew what we were doing, it wouldn't be called research, would
>>> it?." - Albert Einstein
>>> Prof James Hendler                http://www.cs.rpi.edu/~hendler
>>> Tetherless World Constellation Chair
>>> Computer Science Dept
>>> Rensselaer Polytechnic Institute, Troy NY 12180
>



-- 
Dr Tom Heath
Researcher
Platform Division
Talis Information Ltd
T: 0870 400 5000
W: http://www.talis.com/
Received on Friday, 21 November 2008 16:48:19 UTC