Re: 13.1 billion triples from Yves Raimond on 2008-04-04 (public-lod@w3.org from April 2008)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Fri, 4 Apr 2008 15:43:34 +0100
To: "Chris Bizer" <chris@bizer.de>
Cc: public-lod@w3.org
Message-ID: <82593ac00804040743n706f3b7bsc587aea3fc8199a@mail.gmail.com>

Hello!

>  Currently we are having an estimate of 2 billion INTERLINKED triples.
>
>  The question is now, how do we count sparsely connected data sources like
> the MySpace or AudioScrobbler wrappers which could potentially provide vast
> amounts of RDF but where most of it can currently not be found by RDF
> crawlers and browsers as it is not interlinked from other sources?
>
>  The same question applies to our RDF Book Mashup that wraps the Amazon book
> database.
>
>  I guess an OKish heuristic could be: Count all triples that describe
> resources that have at least one RDF link pointing at them.
>
>  Yves: Any idea how your figures change when this rule is applied?

Yes, this is indeed a tricky question - whereas we provide links from
Jamendo/Magnatune/etc. to other data sources (I would count
AudioScrobbler as a non-sparsely interlinked dataset though, as it
provides links to Musicbrainz), it is difficult to guess how many
links point to the MySpace data, especially as the "target" audience
is also sparse (mainly FOAF users).
I know a couple of people who put foaf:interest, owl:sameAs or
foaf:knows towards identifiers in MySpace, but I really don't have a
clue how to quantify that :(
I also know a couple of people who solely use the MySpace RDF service
to get access to the audio without having to put up with the
horrible-looking MySpace web pages (me, for example) :)
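
A minimal sketch of the heuristic quoted above (count all triples that
describe resources with at least one RDF link pointing at them), assuming
the triples are already loaded in memory and each one is tagged with the
dataset it came from. The Triple type, the function name and the sample
URIs are all hypothetical, and "RDF link" is taken here to mean a triple
whose object is described by a different dataset than the one stating it:

```python
from collections import namedtuple

# Hypothetical in-memory representation: one record per triple, tagged
# with the dataset that published it.
Triple = namedtuple("Triple", ["subject", "predicate", "obj", "dataset"])


def count_interlinked_triples(triples):
    """Count triples whose subject has at least one incoming RDF link."""
    # Assumption: a resource "belongs" to the first dataset seen
    # using it as a subject.
    home = {}
    for t in triples:
        home.setdefault(t.subject, t.dataset)

    # A resource is linked if a triple from another dataset has it as object.
    linked = {
        t.obj
        for t in triples
        if t.obj in home and home[t.obj] != t.dataset
    }

    # Every triple describing a linked resource counts.
    return sum(1 for t in triples if t.subject in linked)


# Made-up sample: one FOAF profile pointing at one MySpace artist.
sample = [
    Triple("foaf:me", "foaf:interest", "myspace:artistA", "foaf"),
    Triple("myspace:artistA", "foaf:name", "Artist A", "myspace"),
    Triple("myspace:artistA", "foaf:made", "myspace:track1", "myspace"),
    Triple("myspace:artistB", "foaf:name", "Artist B", "myspace"),
]

print(count_interlinked_triples(sample))
```

In this sample only the two triples about myspace:artistA are counted,
since myspace:artistB has no link pointing at it. The hard part remains
collecting the incoming links in the first place, which this sketch does
not address.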

Cheers!
y

Received on Friday, 4 April 2008 14:44:13 UTC