- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Sat, 22 Nov 2008 01:13:37 +0000
- To: "Tom Heath" <tom.heath@talis.com>
- Cc: "Jim Hendler" <hendler@cs.rpi.edu>, "Michael Hausenblas" <michael.hausenblas@deri.org>, public-lod@w3.org
Well, when a sitemap is submitted, the dataset is usually counted right away and with no crawling uncertainty; e.g. Cycorp submitted theirs yesterday, and 130k linked data documents were indexed today: http://sindice.com/search?q=opencyc&qt=term

I'll try to get some daily calculated stats out next week.

We had this prototypical idea of a map of data built live; see the sketch we had at http://sindice.com/map . Ideally the goal was to have a dynamic LOD map, actually useful for crafting queries (with stats on the side). But the project will require more time.

Giovanni

On Fri, Nov 21, 2008 at 4:47 PM, Tom Heath <tom.heath@talis.com> wrote:
>
> Hi Jim, all,
>
> At WWW2008 ChrisB and I approached R Guha to ask if Google could apply
> some of their considerable resources to answering this question. The
> response went something like "sure, we can do that, email me", but
> since then we've been unable to get any further responses. Perhaps you
> have a stronger connection there and could nudge that?
>
> Alternatively, perhaps Yahoo or the Falcon-S guys could help out, as
> they seem to have a pretty comprehensive crawl, or maybe SWSE could.
> Surely there's some kudos to be had in being the de facto authority on
> the size of the Web of Data, at least for a few months/years yet.
>
> I agree, size does matter. Time for another single-function web site
> at howbigisthewebofdata.com? ;)
>
> Tom.
>
>
> 2008/11/20 Jim Hendler <hendler@cs.rpi.edu>:
>> I guess I asked the question wrong - the linked open data project currently
>> identifies a specific set of data resources that are linked together - so
>> this "entity" is definable. I didn't mean to ask how big the whole
>> Semantic Web is; I meant how many triples are in this particular group,
>> the set that is described on
>> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>> I've been able to download pictures of this graph every few months or so,
>> and you can see the number of datasets growing, but the last published
>> number of triples for the thing (as stated on that page) is from over a year
>> ago, and a whole bunch of stuff has been added and some of these have grown
>> a lot - so we have a publicly shared, large-scale RDF data resource that
>> can be used for benchmarking, trying different interfaces and new
>> technologies, etc.
>> So it would be really nice to get a number every now and then so we could
>> plot growth, explain to people what is in it better, etc.
>> I know, I know, I know all the technical reasons this is relatively
>> meaningless, but I gotta tell you, when I hear someone say "20 billion
>> triples," I can tell you it causes people to pay attention -- problem is
>> I would like to use a number that has some validity before I start quoting
>> it....
>>
>> On Nov 20, 2008, at 5:12 AM, Michael Hausenblas wrote:
>>
>>> My 2c in order to capture this for others as well:
>>>
>>> http://community.linkeddata.org/MediaWiki/index.php?HowBigIsTheDangedThing
>>>
>>> Cheers,
>>> Michael
>>>
>>> ----------------------------------------------------------
>>> Dr. Michael Hausenblas
>>> DERI - Digital Enterprise Research Institute
>>> National University of Ireland, Lower Dangan,
>>> Galway, Ireland
>>> ----------------------------------------------------------
>>>
>>> Jim Hendler wrote:
>>>>
>>>> So I've been to a number of talks lately where the size of the current
>>>> (Sept 08 diagram) Linked Open Data cloud, in triples, has been stated - with
>>>> numbers that vary quite widely. The esw wiki says 2B triples as of 2007,
>>>> which isn't very useful given the growth we've seen in the past year -- I've
>>>> also seen the various blog posts and mail threads saying why we shouldn't
>>>> cite meaningless numbers and such - but frankly, I've recently been on a
>>>> bunch of panels with DB guys, and I'd love to have a reasonable number to
>>>> quote -- anyone have a good estimate of the size of the danged thing (number
>>>> of triples in the whole as an RDF graph would be nice) -- would also be nice
>>>> for general audiences where big numbers tend to impress and for research
>>>> purposes (for example, we know how far we can compress the triples for an
>>>> in-memory approach we are playing with, but we want to figure out how much
>>>> memory we need for the whole cloud - we want to know if we need to shell out
>>>> for the 16GB iPhone)
>>>> anyway, if anyone has a decent estimate, or even a smart educated guess,
>>>> I'd love to hear it
>>>> JH
>>>> "If we knew what we were doing, it wouldn't be called research, would
>>>> it?" - Albert Einstein
>>>> Prof James Hendler http://www.cs.rpi.edu/~hendler
>>>> Tetherless World Constellation Chair
>>>> Computer Science Dept
>>>> Rensselaer Polytechnic Institute, Troy NY 12180
>>
>
>
>
> --
> Dr Tom Heath
> Researcher
> Platform Division
> Talis Information Ltd
> T: 0870 400 5000
> W: http://www.talis.com/
>
Received on Saturday, 22 November 2008 01:14:22 UTC