Re: Proposal to assess the quality of Linked Data sources

From: Annika Flemming <annika.flemming@gmx.de>
Date: Sat, 26 Feb 2011 22:50:16 +0100
Message-ID: <4D697598.9060105@gmx.de>
To: Bernard Vatant <bernard.vatant@mondeca.com>
CC: public-lod@w3.org
Hi Bernard,

On 25.02.2011 23:53, Bernard Vatant wrote:
> Hi Annika
>         - "A vocabulary is said to be established, if it is one of the
>         100 most popular vocabularies stated on prefix.cc" - uhm, as
>         the results from Richard's evaluation have shown, this is quite arguable
>     It's a practical way to determine it (which I can use for the
>     implementation of the formalism). Another way would be to compare
>     many documents from many data sources and to find out, which
>     vocabularies are most popular.
> I'm particularly interested in this aspect of vocabulary selection. 
> Regarding popularity, I fully go along with Bob regarding prefix.cc in 
> which all sorts of biases can be introduced. I think the popularity is 
> better measured by the use of vocabularies in CKAN datasets, as 
> indicated by "format-*" tags. See http://ckan.net/tag/?page=F and for 
> example http://ckan.net/tag/format-bibo or 
> http://ckan.net/tag/format-foaf.
As I used the CKAN datasets to infer a lot about the quality aspects I 
mentioned, it seems very reasonable to use the format-tags. However, 
they don't seem to be very exhaustive, as for example DBpedia is not 
tagged with format-dbpprop. Therefore, analyzing the cache mentioned by 
Kingsley might be more revealing. I'll add that to my thesis.
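
A minimal sketch of the tag-based count discussed above, assuming the
dataset tags have already been collected into a dictionary. The dataset
names, the tag lists and the function names are hypothetical and only
illustrate the idea; they are not real CKAN output or a CKAN API.

```python
from collections import Counter

# Hypothetical sample: dataset name -> list of CKAN tags (illustrative only).
DATASET_TAGS = {
    "dataset-a": ["format-foaf", "format-dc", "lod"],
    "dataset-b": ["format-foaf", "format-bibo"],
    "dataset-c": ["format-foaf", "format-dc", "format-skos"],
    "dataset-d": ["lod"],  # no format-* tag, so its vocabulary use is missed
}


def vocabulary_popularity(dataset_tags, prefix="format-"):
    """Count, per vocabulary, how many datasets carry its format-* tag."""
    counts = Counter()
    for tags in dataset_tags.values():
        # A set ensures a tag repeated on one dataset is counted once.
        vocabularies = {t[len(prefix):] for t in tags if t.startswith(prefix)}
        counts.update(vocabularies)
    return counts


def established(dataset_tags, top_n=100):
    """Return the top_n vocabularies by dataset count, ties broken by name."""
    counts = vocabulary_popularity(dataset_tags)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(vocabulary_popularity(DATASET_TAGS))
    print(established(DATASET_TAGS, top_n=2))
```

The untagged "dataset-d" shows the limitation mentioned above: a dataset
without format-* tags contributes nothing to the count, whatever
vocabularies it actually uses.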
> Another approach I'm currently working on is the one you can find at 
> http://labs.mondeca.com/dataset/lov. The description of interlinked 
> vocabularies (using the VOAF vocabulary) provides an indication of popularity 
> at the vocabulary level itself. From this dataset (still far from 
> exhaustive of course) you can see which vocabularies are reused, 
> extended, used for annotation by other ones. I think the density of 
> links to and from a vocabulary to other ones gives a good indicator of 
> its "establishment", in combination with the number of datasets 
> actually using it.
Interesting work, I'll try to use this in my thesis!
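
A rough sketch of how such an indicator could be computed, assuming the
links between vocabularies are available as (source, target) pairs and
the number of datasets per vocabulary is known. The sample data and the
function names are hypothetical, and multiplying the two figures is just
one possible way to combine them, not something taken from the LOV
dataset.

```python
from collections import Counter

# Hypothetical vocabulary-to-vocabulary links: source reuses or extends target.
VOCAB_LINKS = [
    ("bibo", "foaf"),
    ("bibo", "dc"),
    ("sioc", "foaf"),
    ("sioc", "dc"),
    ("foaf", "dc"),
]

# Hypothetical number of datasets actually using each vocabulary.
DATASET_COUNTS = {"foaf": 3, "dc": 2, "bibo": 1, "sioc": 1}


def link_degree(links):
    """Count links to and from each vocabulary (in-degree plus out-degree)."""
    degree = Counter()
    for source, target in set(links):  # ignore duplicated link statements
        degree[source] += 1
        degree[target] += 1
    return degree


def establishment_scores(links, dataset_counts):
    """Combine link degree and dataset usage (assumption: simple product)."""
    degree = link_degree(links)
    names = set(degree) | set(dataset_counts)
    return {name: degree[name] * dataset_counts.get(name, 0) for name in names}


if __name__ == "__main__":
    scores = establishment_scores(VOCAB_LINKS, DATASET_COUNTS)
    for name, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        print(name, score)
```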
> Best
> Bernard
> -- 
> Bernard Vatant
> Senior Consultant
> Vocabulary & Data Engineering
> Tel:       +33 (0) 971 488 459
> Mail: bernard.vatant@mondeca.com <mailto:bernard.vatant@mondeca.com>
> ----------------------------------------------------
> Mondeca
> 3, cité Nollez 75018 Paris France
> Web: http://www.mondeca.com
> Blog: http://mondeca.wordpress.com
> ----------------------------------------------------
Received on Saturday, 26 February 2011 21:50:45 UTC
