
Re: New LOD Cloud - Please send us links to missing data sources

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sun, 1 Mar 2009 00:06:33 +0000
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, Anja Jentzsch <anja@anjeve.de>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C5CF8209.29CBA%hg@ecs.soton.ac.uk>
Hi Ted,
I think I agree.

On 28/02/2009 22:24, "Ted Thibodeau Jr" <tthibodeau@openlinksw.com> wrote:

> Hi, Anja --
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> we are currently updating the LOD cloud. Find the draft here:
>> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-02-27.png>
> It remains very pretty -- but it feels like a data silo of its own.
> I can't speak for anyone else, but I find it difficult to read this
> cloud -- I cannot see which data sets do well at out-linking, nor
> which sets are apparently good for being out-linked *to*.
> Where is the data behind the graph?
> What are the triple counts for each set?
> What are counts for each arrow?  (When the arrow is bidirectional,
> I would expect 2 counts, 1 for each arrowhead.)
So if you look at
You will find exactly what you ask for as tooltips (in fact more so than your
nice picture, perhaps? :-) Sorry, shouldn't have said that.)
And yes, of course it doesn't look as nice either.
The raw data (in fact the dot file that generates it) is at
(I have put the .txt so it doesn't download as a Word file).
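
For anyone who wants the per-set tallies Ted asks about (links out, links in,
and a separate count for each arrowhead), here is a minimal Python sketch.
It assumes the links are already available as (source, target, count) tuples;
the names A, B, C and the counts are made-up placeholders, not figures from
the real dot file, and link_totals is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical directed edges: (source data set, target data set, link count).
# A bidirectional arrow is simply two entries, one per arrowhead.
EDGES = [
    ("A", "B", 100),
    ("B", "A", 5),
    ("C", "B", 20),
]


def link_totals(edges):
    """Return (out_links, in_links) dicts summing link counts per data set."""
    out_links = defaultdict(int)
    in_links = defaultdict(int)
    for source, target, count in edges:
        out_links[source] += count   # links this set makes to others
        in_links[target] += count    # links other sets make to this one
    return dict(out_links), dict(in_links)


out_links, in_links = link_totals(EDGES)
```

A set with a large in_links total would be a candidate "nexus" in Ted's sense;
a set with a large out_links total is one that does well at out-linking.
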
> In October, when I asked about these, such numbers weren't used in
> drawing the cloud -- so the sizes of the nodes and the weights of
> the arcs are just artistic, and/or based on gut feel.  This use of
> commonly understood graphing techniques (larger nodes for larger
> data sets, thicker arcs for more links between) without any actual
> data behind it troubles me.
> Because of this, and because I wanted to easily see which data sets
> made many links out, and which data sets were linked *to* a lot,
> I made an alternative graphic -- which doesn't look so pretty (it's
> much less suggestive of an actual cloud), but which I think is rather
> more readable, and has no potential misinterpretation about data set
> sizes or inter-link intensity, as all nodes are the same size and all
> arcs are the same weight.
>     <http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html>
> This version makes plain that there are some clusters within the
> "cloud" -- and that DBpedia, MusicBrainz, and GeoNames are clearly
> nexuses for these clusters, being the targets of many inter-links.
> The new edition reveals several new nexuses, in ECS Southampton,
> GeneID, UniProt, DBLP RKB Explorer, CiteSeer, and others.
> At first glance, FOAF appears to be such a nexus, but it is an
> odd case.
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> Keep in mind: A data source qualifies for the cloud, if the
>> data is available via dereferenceable URIs and if the data
>> source is interlinked with at least one other source (meaning
>> it references URIs within the namespace of the other source).
> I don't think FOAF properly belongs on a cloud of *data sources*,
> because it's more of a vocabulary than a data-set -- more
> analogous to Dublin Core than to the others here.  In other words,
> it's more *data dictionary* than *instance data*.
> There is no FOAF SPARQL endpoint, no FOAF RDF dump, etc., in
> marked contrast to the others on this graphic.  The FOAF ontology
> is used by *many* data sets which aren't represented here -- which
> is why I used a cloud for FOAF (and for SIOC).  *Most* uses of
> FOAF are one-off pages, and I don't think those can really be
> considered "data sources", never mind "data sets", for purposes of
> this diagram.
> My intent (as time permits) is to build separate illustrations
> of the data sets that make use of these vocabularies (e.g.,
> LiveJournal, Facebook, Tribe [assuming revival], etc.).
> Obviously, there are many more data set inter-links in the updated
> cloud than the previous edition, so there will be more intersections
> in mine, too.  But if you compare mine to the edition it was based
> on, you'll see that there's a lot less inter-linking in reality
> than a casual glance at this original suggests --
> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18_blank.png>
> I think exposing where the gaps are is *much* more likely to lead to
> them being closed, than making them invisible.
> I'd be happy to update mine -- and happier still to make the node sizes
> and arc weights proportional to the data -- but I can no longer reverse
> engineer the connections simply by looking at the cloud image.
> Can we get the raw data, please?
> Be seeing you,
> Ted
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>                                   http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                                 http://www.openlinksw.com/blog/~kidehen/
>      Universal Data Access and Virtual Database Technology Providers
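
On Ted's wish to make node sizes and arc weights proportional to the data:
given triple counts per set and link counts per arc, one possible approach is
to emit a Graphviz dot file with the standard width and penwidth attributes.
The sketch below is only an illustration under those assumptions. The to_dot
helper, the log10 scaling, and the placeholder counts are my own choices, not
how either cloud image was actually produced.

```python
import math


def to_dot(triple_counts, link_counts):
    """Render a Graphviz digraph as text, scaling nodes and arcs by log10 of counts."""
    lines = ["digraph lod {", "  node [shape=circle, fixedsize=true];"]
    for name, triples in sorted(triple_counts.items()):
        # Node width (inches) grows with the order of magnitude of the triple count.
        width = 0.5 + 0.25 * math.log10(max(triples, 1))
        lines.append(f'  "{name}" [width={width:.2f}];')
    for (source, target), links in sorted(link_counts.items()):
        # Arc thickness grows with the order of magnitude of the link count.
        pen = 1.0 + math.log10(max(links, 1))
        lines.append(f'  "{source}" -> "{target}" [penwidth={pen:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical placeholder counts, not real LOD figures.
dot_text = to_dot({"A": 1000, "B": 10}, {("A", "B"): 100})
```

Log scaling is used here because data set sizes can differ by several orders
of magnitude; a linear scale would be the more literal reading of
"proportional" if the range of counts allows it.
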
Received on Sunday, 1 March 2009 00:07:26 UTC
