Re: New LOD Cloud - Please send us links to missing data sources from Hugh Glaser on 2009-03-01 (public-lod@w3.org from March 2009)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sun, 1 Mar 2009 00:06:33 +0000
To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, Anja Jentzsch <anja@anjeve.de>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C5CF8209.29CBA%hg@ecs.soton.ac.uk>
Hi Ted,
I think I agree.

On 28/02/2009 22:24, "Ted Thibodeau Jr" <tthibodeau@openlinksw.com> wrote:

> Hi, Anja --
>
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> we are currently updating the LOD cloud. Find the draft here:
>> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-02-27.png>
>
>
> It remains very pretty -- but it feels like a data silo of its own.
>
> I can't speak for anyone else, but I find it difficult to read this
> cloud -- I cannot see which data sets do well at out-linking, nor
> which sets are apparently good for being out-linked *to*.
>
> Where is the data behind the graph?
>
> What are the triple counts for each set?
>
> What are counts for each arrow?  (When the arrow is bidirectional,
> I would expect 2 counts, 1 for each arrowhead.)
So if you look at
http://www.rkbexplorer.com/linkage/crs-linkage-neat.svg
You will find exactly what you ask for as tooltips (in fact more so than your
nice picture, perhaps? :-) Sorry, shouldn't have said that.)
And yes, of course it doesn't look as nice either.
The raw data (in fact the dot file that generates it) is at
http://users.ecs.soton.ac.uk/hg/graphviz.dot.txt
(I have added the .txt extension so that it doesn't download as a Word file).
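
For anyone who would rather have the numbers than the picture, here is a
minimal Python sketch of how per-data-set out-link and in-link totals could
be pulled from a dot file of this kind. It rests on assumptions: that each
edge is written as "source" -> "target" with an optional numeric label giving
the link count, and that an edge without a numeric label counts as 1. The
sample data, node names, and the function name link_counts are made up for
illustration and are not taken from the real file, which may be laid out
differently.

```python
import re
from collections import defaultdict

# Hypothetical sample in DOT syntax; names and counts are invented.
SAMPLE_DOT = '''
digraph linkage {
    "dbpedia" -> "geonames" [label="120"];
    "geonames" -> "dbpedia" [label="45"];
    "dblp" -> "dbpedia" [label="10"];
    "dblp" -> "citeseer";
}
'''

# Matches: source -> target, optionally followed by [... label="N" ...].
EDGE_RE = re.compile(
    r'"?([\w.:-]+)"?\s*->\s*"?([\w.:-]+)"?\s*'
    r'(?:\[[^\]]*?label\s*=\s*"?(\d+)"?[^\]]*\])?'
)


def link_counts(dot_text):
    """Return (out_links, in_links): total link counts per data set."""
    out_links = defaultdict(int)
    in_links = defaultdict(int)
    for src, dst, count in EDGE_RE.findall(dot_text):
        # Assumption: an edge with no numeric label counts as one link.
        n = int(count) if count else 1
        out_links[src] += n
        in_links[dst] += n
    return dict(out_links), dict(in_links)


if __name__ == "__main__":
    out_links, in_links = link_counts(SAMPLE_DOT)
    for name in sorted(set(out_links) | set(in_links)):
        print(name, "out:", out_links.get(name, 0), "in:", in_links.get(name, 0))
```

A bidirectional arrow shows up here as two separate edges, so each direction
gets its own count.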
Best
Hugh
>
> In October, when I asked about these, such numbers weren't used in
> drawing the cloud -- so the sizes of the nodes and the weights of
> the arcs are just artistic, and/or based on gut feel.  This use of
> commonly understood graphing techniques (larger nodes for larger
> data sets, thicker arcs for more links between) without any actual
> data behind it troubles me.
>
> Because of this, and because I wanted to easily see which data sets
> made many links out, and which data sets were linked *to* a lot,
> I made an alternative graphic -- which doesn't look so pretty (it's
> much less suggestive of an actual cloud), but which I think is rather
> more readable, and has no potential misinterpretation about data set
> sizes or inter-link intensity, as all nodes are the same size and all
> arcs are the same weight.
>
>     <http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html>
>
> This version makes plain that there are some clusters within the
> "cloud" -- and that DBpedia, MusicBrainz, and GeoNames are clearly
> nexuses for these clusters, being the targets of many inter-links.
> The new edition reveals several new nexuses, in ECS Southampton,
> GeneID, UniProt, DBLP RKB Explorer, CiteSeer, and others.
>
> At first glance, FOAF appears to be such a nexus, but it is an
> odd case.
>
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> Keep in mind: A data source qualifies for the cloud, if the
>> data is available via dereferenceable URIs and if the data
>> source is interlinked with at least one other source (meaning
>> it references URIs within the namespace of the other source).
>
> I don't think FOAF properly belongs on a cloud of *data sources*,
> because it's more of a vocabulary than a data-set -- more
> analogous to Dublin Core than to the others here.  In other words,
> it's more *data dictionary* than *instance data*.
>
> There is no FOAF SPARQL endpoint, no FOAF RDF dump, etc., in
> marked contrast to the others on this graphic.  The FOAF ontology
> is used by *many* data sets which aren't represented here -- which
> is why I used a cloud for FOAF (and for SIOC).  *Most* uses of
> FOAF are one-off pages, and I don't think those can really be
> considered "data sources", never mind "data sets", for purposes of
> this diagram.
>
> My intent (as time permits) is to build separate illustrations
> of the data sets that make use of these vocabularies (e.g.,
> LiveJournal, Facebook, Tribe [assuming revival], etc.).
>
> Obviously, there are many more data set inter-links in the updated
> cloud than the previous edition, so there will be more intersections
> in mine, too.  But if you compare mine to the edition it was based
> on, you'll see that there's a lot less inter-linking in reality
> than a casual glance at this original suggests --
>
>
> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18_blank.png>
>
>
> I think exposing where the gaps are is *much* more likely to lead to
> them being closed, than making them invisible.
>
> I'd be happy to update mine -- and happier still to make the node sizes
> and arc weights proportional to the data -- but I can no longer reverse
> engineer the connections simply by looking at the cloud image.
>
> Can we get the raw data, please?
>
> Be seeing you,
>
> Ted
>
>
>
> --
> A: Yes.                      http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>                                   http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs              http://www.openlinksw.com/weblogs/virtuoso/
>                                 http://www.openlinksw.com/blog/~kidehen/
>      Universal Data Access and Virtual Database Technology Providers
Received on Sunday, 1 March 2009 00:07:26 UTC