- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sun, 05 Sep 2010 12:25:07 -0400
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- CC: Chris Bizer <chris@bizer.de>, Anja Jentzsch <anja@anjeve.de>, public-lod@w3.org, Leigh Dodds <leigh.dodds@talis.com>, Jonathan Gray <jonathan.gray@okfn.org>, info@okfn.org
- Message-ID: <4C83C463.60003@openlinksw.com>
On 9/5/10 11:00 AM, Alan Ruttenberg wrote:
> On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizer <chris@bizer.de> wrote:
>> Hi Alan,
>>
>>> I have just spent some time evaluating one source and reported to you
>>> the result. Perhaps you might act on this investment in time and thank
>>> me for doing so. You might find that the result was myself and more
>>> people doing such quality control.
>> Sorry that my reply yesterday might have been a bit too harsh.
>>
>> I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
>> and added a reference to the description of the CAS dataset at
>>
>> http://ckan.net/package/bio2rdf-cas
>>
>> Please also note that CKAN provides a rating function for the datasets and
>> also provides for commenting and discussing the datasets.
>>
>> Maybe people could use these features as a start to collect quality-related
>> meta-information about the datasets.
>>
>> CKAN also provides a link to the http://www.isitopendata.org/ service, which
>> might be used for license inquiries.
> Dear Chris,
>
> As I said, the first line on the CKAN home page says: "CKAN is a
> registry of open data and content packages.". Therefore I think there
> is a reasonable expectation that the packages registered there are
> open. I maintain that CKAN should either change how it explains itself
> to make clear that it is a registry of packages that may or may not be
> open, or it should remove the packages that are not known to be open.
> I'm not taking a position one way or another which they should do
> (that's their business), but they should say what they do, and do what
> they say.
>
> Thank you for your pointers to further information on how to find
> licenses. I'm fairly familiar with this area given that I work for
> Creative Commons.
>
>> I agree with you that the quality of Linked Data published on the Web is
>> crucial, but we also have to take into account that much of the data in the
>> LOD cloud is currently still published by research projects in order to
>> demonstrate the technologies.
>>
>> As the Web of Data is evolving and more and more actual owners of the
>> datasets start to provide them as Linked Data, I hope that the quality will
>> also increase and the datasets will be kept current. Encouraging
>> developments in this direction currently happen in the libraries,
>> eGovernment, and eCommerce domains.
> I agree that these are good examples. I would suggest that you focus
> on including the good examples in the LOD cloud, or at a minimum
> remove those, like CAS, that fall below the minimal standard of
> supplying *some* data and being *open*, so that "linked open data"
> means something coherent.
>
>> On the other hand, the Web is an open system and we will thus always see
>> people publishing low-quality, wrong and misleading data. Google handles
>> this fact rather successfully using PageRank. As the Web of Data provides
>> more structure than the classic Web, I think we might even be able to apply
>> more sophisticated data-quality assessment heuristics to decide which data
>> we want to use in our applications and which to ignore. Some of these
>> methods are listed in [1].
> Look, Chris, I just did a "manual page rank" on the CAS dataset. It is
> meaningless. This is a high quality assessment. If the movement can't
> act on known good quality information I (and others) will doubt that
> automatic algorithms will be credible.
>
> Moreover, the LOD cloud diagram is an advertisement. There are enough
> data sets now that inclusion in the diagram can become a reward for
> good work. It's not good advertising for Google when junk sites come
> up at the top of search results and they do their best to minimize
> this occurrence. The LOD cloud is your front page, and to a certain
> extent mine as well as I invest all my time in doing work towards
> building the web of data in the Sciences.
>
> Regards,
> Alan
>
>> Best,
>>
>> Chris
>>
>> [1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
>> using the WIQA policy framework. Journal of Web Semantics: Science, Services
>> and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
>> 1-10.
>> http://dx.doi.org/10.1016/j.websem.2008.02.005
>>
>>
>> -----Original Message-----
>> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
>> Sent: Saturday, 4 September 2010 22:20
>> To: Chris Bizer
>> Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
>> Subject: Re: Next version of the LOD cloud diagram. Please provide input, so
>> that your dataset is included.
>>
>> On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer <chris@bizer.de> wrote:
>>> So rather than to criticize the work that other people do on collecting
>>> meta-information about the datasets in the LOD cloud
>> Did you read what I wrote? I made no comment on the adequacy of
>> metainformation. In fact I *used* that metainformation to point out
>> that the data source in question did not satisfy the "open" provision
>> of linked *open* data. In addition I criticized the *inclusion* of the
>> data set in the *lod cloud diagram* because of this lack of openness
>> and because the actual content of that resource didn't resemble any
>> data in the resource that it was derived from (a registry of
>> information about chemical compounds), suggesting that it would hurt
>> the LOD effort as inclusion would be a kind of "false advertising".
>>
>> -Alan
>>
>>
>

All,

See: http://www.ckan.net/group/lod

Why: Linking Open Data?

It should be: Linked Open Data.

Or just: Linked Data (bearing in mind that all the data sets might not be open).
--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Sunday, 5 September 2010 16:25:38 UTC