Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

Hi Alan,

> I have just spent some time evaluating one source and reported to you 
> the result. Perhaps you might act on this investment in time and thank 
> me for doing so. You might find that the result was myself and more 
> people doing such quality control.

Sorry that my reply yesterday might have been a bit too harsh.

I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
and added a reference to the description of the CAS dataset at

http://ckan.net/package/bio2rdf-cas

Please also note that CKAN provides a rating function for the datasets and
also provides for commenting and discussing the datasets.

Maybe people could use these features as a start to collect quality-related
meta-information about the datasets.

CKAN also provides a link to the http://www.isitopendata.org/ service, which
might be used for license inquiries.

I agree with you that the quality of Linked Data published on the Web is
crucial, but we also have to take into account that much of the data in the
LOD cloud is currently still published by research projects in order to
demonstrate the technologies.

As the Web of Data is evolving and more and more actual owners of the
datasets start to provide them as Linked Data, I hope that the quality will
also increase and the datasets will be kept current. Encouraging
developments in this direction are currently happening in the libraries,
eGovernment, and eCommerce domains.

On the other hand, the Web is an open system and we will thus always see
people publishing low-quality, wrong and misleading data. Google handles
this fact rather successfully using PageRank. As the Web of Data provides
more structure than the classic Web, I think we might even be able to apply
more sophisticated data-quality assessment heuristics to decide which data
we want to use in our applications and which to ignore. Some of these
methods are listed in [1].

Best, 

Chris 

[1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
using the WIQA policy framework. Journal of Web Semantics: Science, Services
and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
1-10.
http://dx.doi.org/10.1016/j.websem.2008.02.005


-----Original Message-----
From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
Sent: Saturday, 4 September 2010 22:20
To: Chris Bizer
Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
Subject: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer <chris@bizer.de> wrote:
> So rather than to criticize the work that other people do on collecting
> meta-information about the datasets in the LOD cloud

Did you read what I wrote? I made no comment on the adequacy of
metainformation. In fact I *used* that metainformation to point out
that the data source in question did not satisfy the "open" provision
of linked *open* data. In addition I criticized the *inclusion* of the
data set in the *lod cloud diagram* because of this lack of openness
and because the actual content of that resource didn't resemble any
data in the resource that it was derived from (a registry of
information about chemical compounds), suggesting that it would hurt
the LOD effort as inclusion would be a kind of "false advertising".

-Alan

Received on Sunday, 5 September 2010 09:09:13 UTC