Re: Proposal to assess the quality of Linked Data sources from Kingsley Idehen on 2011-02-26 (public-lod@w3.org from February 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 25 Feb 2011 21:22:06 -0500
To: Hugh Glaser <hg@ecs.soton.ac.uk>
CC: Bernard Vatant <bernard.vatant@mondeca.com>, Annika Flemming <annika.flemming@gmx.de>, "<public-lod@w3.org>" <public-lod@w3.org>, Bob Ferris <zazi@elbklang.net>
Message-ID: <4D6863CE.2030909@openlinksw.com>

> On 25 Feb 2011, at 23:00, Kingsley Idehen wrote:
>
>>> Hi Annika
>>>
>>> - "A vocabulary is said to be established, if it is one of the 100 most popular vocabularies stated on prefix.cc" - uhm, as the results from Richard's evaluation have shown, this is quite arguable
>>> It's a practical way to determine it (which I can use for the implementation of the formalism). Another way would be to compare many documents from many data sources and to find out which vocabularies are most popular.
>>>
>>> I'm particularly interested in this aspect of vocabulary selection. Regarding popularity, I fully go along with Bob regarding prefix.cc in which all sorts of biases can be introduced. I think the popularity is better measured by the use of vocabularies in CKAN datasets, as indicated by "format-*" tags. See http://ckan.net/tag/?page=F and for example http://ckan.net/tag/format-bibo or http://ckan.net/tag/format-foaf.
>> Why not the actual link coefficient from an LOD Cloud cache instance? That at least shows what's being used :-)
> There is no LOD Cloud cache instance as far as I can tell.

Okay, you might not see it as a LOD Cloud cache. How about a massive 13B 
strong live instance [1] with as much Linked Data as we can get our 
hands on? There is good sampling there, since you can use Entity Ranking 
to analyze usage.
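
To make "what's being used" concrete, here is a minimal sketch of one way 
to tally vocabulary usage from a sample of triples. It is a plain count of 
predicate namespaces, not Entity Ranking itself. The sample triples and 
the helper names (namespace_of, vocabulary_usage) are made up for 
illustration, and the namespace split (cut at the last '#' or '/') is a 
simplifying assumption.

```python
from collections import Counter


def namespace_of(uri: str) -> str:
    """Return the URI up to and including its last '#' or '/'.

    Assumption: the vocabulary namespace ends at that character.
    """
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri


def vocabulary_usage(triples):
    """Count how often each namespace occurs in predicate position."""
    return Counter(namespace_of(p) for _, p, _ in triples)


# Hypothetical sample data, not taken from any live instance.
SAMPLE_TRIPLES = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
    ("http://example.org/doc1", "http://purl.org/dc/terms/title",
     "A document"),
    ("http://example.org/doc1",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://purl.org/ontology/bibo/Document"),
]

if __name__ == "__main__":
    # Print namespaces from most to least used.
    for ns, count in vocabulary_usage(SAMPLE_TRIPLES).most_common():
        print(count, ns)
```

Run over a large enough sample, the same tally gives a usage-based 
ranking of vocabularies rather than one based on registrations or tags.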
> So any attempt to infer data from something that claimed to be would be misleading.

Not in my eyes, but we can agree to disagree, as we've done in the past 
re. this matter :-)

Links:

1. http://lod.openlinksw.com


Kingsley

> Cheers
> Hugh
>> Kingsley
>>> Another approach I'm currently working on is the one you can find at http://labs.mondeca.com/dataset/lov. The description of interlinked vocabularies (using the VOAF vocabulary) provides an indication of popularity at the vocabulary level itself. From this dataset (still far from exhaustive, of course) you can see which vocabularies are reused, extended, or used for annotation by other ones. I think the density of links to and from a vocabulary to other ones gives a good indicator of its "establishment", in combination with the number of datasets actually using it.
>>>
>>> Best
>>>
>>> Bernard
>>>
>>>
>>> -- 
>>> Bernard Vatant
>>> Senior Consultant
>>> Vocabulary & Data Engineering
>>> Tel:       +33 (0) 971 488 459
>>> Mail:     bernard.vatant@mondeca.com
>>> ----------------------------------------------------
>>> Mondeca
>>> 3, cité Nollez 75018 Paris France
>>> Web:    http://www.mondeca.com
>>> Blog:    http://mondeca.wordpress.com
>>> ----------------------------------------------------
>>
>> -- 
>>
>> Regards,
>>
>> Kingsley Idehen
>> President & CEO
>> OpenLink Software
>> Web:
>> http://www.openlinksw.com
>>
>> Weblog:
>> http://www.openlinksw.com/blog/~kidehen
>>
>> Twitter/Identi.ca: kidehen
>>
>>
>>
>>
>>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Saturday, 26 February 2011 02:22:39 UTC