- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Sun, 1 Mar 2009 00:06:33 +0000
- To: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, Anja Jentzsch <anja@anjeve.de>
- CC: "public-lod@w3.org" <public-lod@w3.org>
Hi Ted,
I think I agree.

On 28/02/2009 22:24, "Ted Thibodeau Jr" <tthibodeau@openlinksw.com> wrote:

> Hi, Anja --
>
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> we are currently updating the LOD cloud. Find the draft here:
>> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-02-27.png>
>
> It remains very pretty -- but it feels like a data silo of its own.
>
> I can't speak for anyone else, but I find it difficult to read this
> cloud -- I cannot see which data sets do well at out-linking, nor
> which sets are apparently good for being out-linked *to*.
>
> Where is the data behind the graph?
>
> What are the triple counts for each set?
>
> What are counts for each arrow? (When the arrow is bidirectional,
> I would expect 2 counts, 1 for each arrowhead.)

So if you look at
http://www.rkbexplorer.com/linkage/crs-linkage-neat.svg
you will find exactly what you ask for as tooltips (in fact more so than your nice picture, perhaps? :-) Sorry, shouldn't have said that.)
And yes, of course it doesn't look as nice either.
The raw data (in fact the dot file that generates it) is at
http://users.ecs.soton.ac.uk/hg/graphviz.dot.txt
(I have put the .txt so it doesn't download as a Word file).

Best
Hugh

>
> In October, when I asked about these, such numbers weren't used in
> drawing the cloud -- so the sizes of the nodes and the weights of
> the arcs are just artistic, and/or based on gut feel. This use of
> commonly understood graphing techniques (larger nodes for larger
> data sets, thicker arcs for more links between) without any actual
> data behind it troubles me.
>
> Because of this, and because I wanted to easily see which data sets
> made many links out, and which data sets were linked *to* a lot,
> I made an alternative graphic -- which doesn't look so pretty (it's
> much less suggestive of an actual cloud), but which I think is rather
> more readable, and has no potential misinterpretation about data set
> sizes or inter-link intensity, as all nodes are the same size and all
> arcs are the same weight.
>
> <http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html>
>
> This version makes plain that there are some clusters within the
> "cloud" -- and that DBpedia, MusicBrainz, and GeoNames are clearly
> nexuses for these clusters, being the targets of many inter-links.
> The new edition reveals several new nexuses, in ECS Southampton,
> GeneID, UniProt, DBLP RKB Explorer, CiteSeer, and others.
>
> At first glance, FOAF appears to be such a nexus, but it is an
> odd case.
>
> On Feb 27, 2009, at 05:58 PM, Anja Jentzsch wrote:
>> Keep in mind: A data source qualifies for the cloud, if the
>> data is available via dereferenceable URIs and if the data
>> source is interlinked with at least one other source (meaning
>> it references URIs within the namespace of the other source).
>
> I don't think FOAF properly belongs on a cloud of *data sources*,
> because it's more of a vocabulary than a data-set -- more
> analogous to Dublin Core than to the others here. In other words,
> it's more *data dictionary* than *instance data*.
>
> There is no FOAF SPARQL endpoint, no FOAF RDF dump, etc., in
> marked contrast to the others on this graphic. The FOAF ontology
> is used by *many* data sets which aren't represented here -- which
> is why I used a cloud for FOAF (and for SIOC). *Most* uses of
> FOAF are one-off pages, and I don't think those can really be
> considered "data sources" never mind "data sets" for purposes of
> this diagram.
>
> My intent (as time permits) is to build separate illustrations
> of the data sets that make use of these vocabularies (e.g.,
> LiveJournal, Facebook, Tribe [assuming revival], etc.).
>
> Obviously, there are many more data set inter-links in the updated
> cloud than the previous edition, so there will be more intersections
> in mine, too. But if you compare mine to the edition it was based
> on, you'll see that there's a lot less inter-linking in reality
> than a casual glance at this original suggests --
>
> <http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18_blank.png>
>
> I think exposing where the gaps are is *much* more likely to lead to
> them being closed, than making them invisible.
>
> I'd be happy to update mine -- and happier still to make the node sizes
> and arc weights proportional to the data -- but I can no longer reverse
> engineer the connections simply by looking at the cloud image.
>
> Can we get the raw data, please?
>
> Be seeing you,
>
> Ted
>
> --
> A: Yes. http://www.guckes.net/faq/attribution.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
> Evangelism & Support // mailto:tthibodeau@openlinksw.com
> OpenLink Software, Inc. // http://www.openlinksw.com/
> http://www.openlinksw.com/weblogs/uda/
> OpenLink Blogs http://www.openlinksw.com/weblogs/virtuoso/
> http://www.openlinksw.com/blog/~kidehen/
> Universal Data Access and Virtual Database Technology Providers
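
One way to put numbers behind the picture, as the quoted message asks, is to keep the link counts as plain data and derive both the per-set totals and the drawing from them. The sketch below is only an illustration in Python, not the script behind either diagram: the data set names `A`, `B`, `C` and every count are made-up placeholders, the log scaling is an arbitrary choice, and `link_totals` and `to_dot` are hypothetical helper names. It emits Graphviz DOT using the standard `width`, `penwidth`, and `tooltip` attributes.

```python
import math
from collections import defaultdict

# Placeholder data, NOT real statistics: (source, target) -> number of links.
# A bidirectional arrow is simply two entries, one count per arrowhead.
LINKS = {
    ("A", "B"): 1000,
    ("B", "A"): 10,
    ("C", "B"): 100,
}

# Placeholder triple counts per data set (also made up).
TRIPLES = {"A": 1_000_000, "B": 10_000, "C": 100}


def link_totals(links):
    """Return (out_totals, in_totals): links made by, and pointing at, each set."""
    out_totals = defaultdict(int)
    in_totals = defaultdict(int)
    for (source, target), count in links.items():
        out_totals[source] += count
        in_totals[target] += count
    return dict(out_totals), dict(in_totals)


def to_dot(links, triples):
    """Build a DOT digraph with log-scaled node widths and arc pen widths."""
    lines = ["digraph lod {", "  node [shape=circle, fixedsize=true];"]
    for name in sorted(triples):
        # Node width grows with the order of magnitude of the triple count.
        width = 0.5 + 0.25 * math.log10(triples[name])
        lines.append(
            f'  "{name}" [width={width:.2f}, tooltip="{triples[name]} triples"];'
        )
    for source, target in sorted(links):
        count = links[(source, target)]
        # Arc thickness grows with the order of magnitude of the link count.
        pen = 1.0 + math.log10(count)
        lines.append(
            f'  "{source}" -> "{target}" [penwidth={pen:.2f}, tooltip="{count} links"];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    out_totals, in_totals = link_totals(LINKS)
    print("out-links:", out_totals)
    print("in-links:", in_totals)
    print(to_dot(LINKS, TRIPLES))
```

Saving the printed graph to a file and rendering it with `dot -Tsvg` should give a picture whose sizes mean something, with the counts available as tooltips; the in-link totals show at a glance which sets are linked *to* the most.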
Received on Sunday, 1 March 2009 00:07:26 UTC