- From: Yves Raimond <yves.raimond@gmail.com>
- Date: Fri, 4 Apr 2008 15:43:34 +0100
- To: "Chris Bizer" <chris@bizer.de>
- Cc: public-lod@w3.org
Hello!

> Currently we are having an estimate of 2 billion INTERLINKED triples.
>
> The question is now, how do we count sparsely connected data sources like
> the MySpace or AudioScrobbler wrappers which could potentially provide vast
> amounts of RDF but where most of it can currently not be found by RDF
> crawlers and browsers as it is not interlinked from other sources?
>
> The same question applies to our RDF Book Mashup that wraps the Amazon book
> database.
>
> I guess an OKish heuristic could be: Count all triples that describe
> resources that have at least one RDF link pointing at them.
>
> Yves: Any idea how your figures change when this rule is applied?

Yes, this is indeed a tricky question. We do provide links from Jamendo/Magnatune/etc. to other data sources (though I would count AudioScrobbler as a non-sparsely interlinked dataset, as it provides links to Musicbrainz). But it is difficult to guess how many links point to the MySpace data, especially as the "target" audience is also sparse (mainly FOAF users).

I know a couple of people who put foaf:interest, owl:sameAs or foaf:knows towards identifiers in MySpace, but I really don't have a clue how to quantify that :(

I also know a couple of people who solely use the MySpace RDF service to get access to the audio without having to stand horrible-looking MySpace web pages (me, for example) :)

Cheers!
y
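For illustration only, the heuristic quoted above (count all triples that describe resources with at least one RDF link pointing at them) could be sketched roughly as follows. This is a minimal sketch, not how any of the figures in this thread were produced. It assumes triples are plain (subject, predicate, object) tuples of strings, that a "link" means a triple whose subject and object belong to different datasets, and that dataset membership can be decided by URI prefix. The names PREFIXES, TRIPLES, dataset_of and count_interlinked_triples, and the example.org URIs, are all hypothetical.

```python
from collections import defaultdict

# Hypothetical URI prefixes used to decide which dataset a resource belongs to.
PREFIXES = {
    "http://myspace.example.org/": "myspace",
    "http://foaf.example.org/": "foaf",
}

# Hypothetical sample data: one FOAF profile links to one MySpace resource.
TRIPLES = [
    ("http://foaf.example.org/alice", "foaf:interest", "http://myspace.example.org/band1"),
    ("http://myspace.example.org/band1", "foaf:name", "Band One"),
    ("http://myspace.example.org/band1", "mo:genre", "rock"),
    ("http://myspace.example.org/band2", "foaf:name", "Band Two"),
]


def dataset_of(term):
    """Return the dataset name for a URI, or None for literals and unknown URIs."""
    for prefix, name in PREFIXES.items():
        if term.startswith(prefix):
            return name
    return None


def count_interlinked_triples(triples, dataset_of):
    """Count, per dataset, the triples describing externally linked resources."""
    triples = list(triples)

    # Pass 1: collect resources that are the object of a link from another dataset.
    linked = set()
    for s, _p, o in triples:
        target = dataset_of(o)
        if target is not None and dataset_of(s) != target:
            linked.add(o)

    # Pass 2: count triples whose subject is one of those linked resources.
    counts = defaultdict(int)
    for s, _p, _o in triples:
        if s in linked:
            counts[dataset_of(s)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_interlinked_triples(TRIPLES, dataset_of))
```

In this toy data only band1 has an incoming link, so only its two triples are counted; band2 is left out even though the wrapper could serve it. The hard part raised in the reply, finding out how many such incoming links exist in the wild, is exactly what this sketch takes as given.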
Received on Friday, 4 April 2008 14:44:13 UTC