Contd: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

  On 9/5/10 11:00 AM, Alan Ruttenberg wrote:
> On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizer <chris@bizer.de> wrote:
>> Hi Alan,
>>
>>> I have just spent some time evaluating one source and reported to you
>>> the result. Perhaps you might act on this investment in time and thank
>>> me for doing so. You might find that the result was myself and more
>>> people doing such quality control.
>> Sorry that my reply yesterday might have been a bit too harsh.
>>
>> I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
>> and added a reference to the description of the CAS dataset at
>>
>> http://ckan.net/package/bio2rdf-cas
>>
>> Please also note that CKAN provides a rating function for the datasets and
>> also provides for commenting and discussing the datasets.
>>
>> Maybe people could use these features as a start to collect quality-related
>> meta-information about the datasets.
>>
>> CKAN also provides a link to the http://www.isitopendata.org/ service, which
>> might be used for license inquiries.
> Dear Chris,
>
> As I said, the first line on the CKAN home page says: "CKAN is a
> registry of open data and content packages.". Therefore I think there
> is a reasonable expectation that the packages registered there are
> open. I maintain that CKAN should either change how it explains itself
> to make clear that it is a registry of packages that may or may not be
> open, or it should remove the packages that are not known to be open.
> I'm not taking a position one way or another which they should do
> (that's their business), but they should say what they do, and do what
> they say.
>
> Thank you for your pointers to further information on how to find
> licenses. I'm fairly familiar with this area given that I work for
> Creative Commons.
>
>> I agree with you that the quality of Linked Data published on the Web is
>> crucial, but we also have to take into account that much of the data in the
>> LOD cloud is currently still published by research projects in order to
>> demonstrate the technologies.
>>
>> As the Web of Data is evolving and more and more actual owners of the
>> datasets start to provide them as Linked Data, I hope that the quality will
>> also increase and the datasets will be kept current. Encouraging
>> developments in this direction are currently happening in the libraries,
>> eGovernment, and eCommerce domains.
> I agree that these are good examples. I would suggest that you focus
> on including the good examples in the LOD cloud, or at a minimum
> remove those, like CAS, that fall below the minimal standard of
> supplying *some* data and being *open*, so that "linked open data"
> means something coherent.
>
>> On the other hand, the Web is an open system and we will thus always see
>> people publishing low-quality, wrong and misleading data. Google handles
>> this fact rather successfully using PageRank. As the Web of Data provides
>> more structure than the classic Web, I think we might even be able to apply
>> more sophisticated data-quality assessment heuristics to decide which data
>> we want to use in our applications and which to ignore. Some of these
>> methods are listed in [1].
> Look, Chris, I just did a "manual page rank" on the CAS dataset. It is
> meaningless.  This is a high quality assessment. If the movement can't
> act on known good quality information I (and others) will doubt that
> automatic algorithms will be credible.
>
> Moreover, the LOD cloud diagram is an advertisement. There are enough
> data sets now that inclusion in the diagram can become a reward for
> good work. It's not good advertising for Google when junk sites come
> up at the top of search results and they do their best to minimize
> this occurrence. The LOD cloud is your front page, and to a certain
> extent mine as well as I invest all my time in doing work towards
> building the web of data in the Sciences.
>
> Regards,
> Alan
>
>> Best,
>>
>> Chris
>>
>> [1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
>> using the WIQA policy framework. Journal of Web Semantics: Science, Services
>> and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
>> 1-10.
>> http://dx.doi.org/10.1016/j.websem.2008.02.005
>>
>>
>> -----Original Message-----
>> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
>> Sent: Saturday, 4 September 2010 22:20
>> To: Chris Bizer
>> Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
>> Subject: Re: Next version of the LOD cloud diagram. Please provide input, so
>> that your dataset is included.
>>
>> On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer <chris@bizer.de> wrote:
>>> So rather than to criticize the work that other people do on collecting
>>> meta-information about the datasets in the LOD cloud
>> Did you read what I wrote? I made no comment on the adequacy of
>> metainformation. In fact I *used* that metainformation to point out
>> that the data source in question did not satisfy the "open" provision
>> of linked *open* data. In addition I criticized the *inclusion* of the
>> data set in the *lod cloud diagram* because of this lack of openness
>> and because the actual content of that resource didn't resemble any
>> data in the resource that it was derived from (a registry of
>> information about chemical compounds), suggesting that it would hurt
>> the LOD effort as inclusion would be a kind of "false advertising".
>>
>> -Alan
>>
>>
>
All,

See: http://www.ckan.net/group/lod

Why: Linking Open Data?

It should be: Linked Open Data. Or just: Linked Data (bearing in mind that
all the data sets might not be open).


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Sunday, 5 September 2010 16:25:38 UTC