Re: Proposal to assess the quality of Linked Data sources

Hi Annika

>> - "A vocabulary is said to be established, if it is one of the 100 most
>> popular vocabularies stated on prefix.cc" - uhm, as the results from
>> Richard's evaluation have shown, this is quite arguable
>>
> It's a practical way to determine it (which I can use for the
> implementation of the formalism). Another way would be to compare many
> documents from many data sources and to find out, which vocabularies are
> most popular.


I'm particularly interested in this aspect of vocabulary selection.
Regarding popularity, I fully agree with Bob about prefix.cc, where all
sorts of biases can be introduced. I think popularity is better measured
by the use of vocabularies in CKAN datasets, as indicated by "format-*"
tags. See http://ckan.net/tag/?page=F and, for example,
http://ckan.net/tag/format-bibo or http://ckan.net/tag/format-foaf.
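
As a rough illustration of this tag-based measure, here is a minimal Python
sketch that counts how many datasets carry each "format-*" tag. The dataset
names and tag lists are made up for the example, and the function name is
hypothetical; real figures would have to come from CKAN itself.

```python
from collections import Counter

PREFIX = "format-"

def vocabulary_usage(datasets):
    """Count, per vocabulary, the datasets tagged "format-<vocabulary>"."""
    counts = Counter()
    for tags in datasets.values():
        # Use a set so a tag repeated on one dataset is counted only once.
        vocabs = {
            t[len(PREFIX):]
            for t in tags
            if t.startswith(PREFIX) and len(t) > len(PREFIX)
        }
        counts.update(vocabs)
    return counts

# Made-up sample data, not actual CKAN content.
sample = {
    "dataset-a": ["format-foaf", "format-bibo", "lod"],
    "dataset-b": ["format-foaf", "format-skos"],
    "dataset-c": ["format-foaf", "published-by-producer"],
}

usage = vocabulary_usage(sample)
for vocab, n in usage.most_common():
    print(vocab, n)
```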

Another approach I'm currently working on is the one you can find at
http://labs.mondeca.com/dataset/lov. The description of interlinked
vocabularies (using the VOAF vocabulary) provides an indication of
popularity at the vocabulary level itself. From this dataset (still far
from exhaustive, of course) you can see which vocabularies are reused,
extended, or used for annotation by other ones. I think the density of
links between a vocabulary and other ones is a good indicator of its
"establishment", in combination with the number of datasets actually
using it.
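
To make the combination concrete, here is a minimal Python sketch, assuming
vocabulary links are available as (source, target) pairs and dataset counts
as a simple mapping. The link count per vocabulary is used as a stand-in for
link density, and multiplying it by the dataset count is only an illustrative
choice, not a formula taken from the LOV dataset or VOAF. All names and
numbers are made up.

```python
from collections import defaultdict

def link_degree(links):
    """Count links to and from each vocabulary (in-degree plus out-degree)."""
    degree = defaultdict(int)
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)

def establishment_score(links, dataset_counts):
    """Combine link degree with dataset usage.

    The product used here is an arbitrary, illustrative weighting.
    """
    degree = link_degree(links)
    vocabs = set(degree) | set(dataset_counts)
    return {v: degree.get(v, 0) * dataset_counts.get(v, 0) for v in vocabs}

# Made-up sample: (vocabulary, vocabulary it reuses or extends).
links = [
    ("bibo", "foaf"),
    ("bibo", "dcterms"),
    ("sioc", "foaf"),
    ("sioc", "dcterms"),
]
# Made-up number of datasets using each vocabulary.
dataset_counts = {"foaf": 3, "bibo": 1, "dcterms": 2, "sioc": 1}

scores = establishment_score(links, dataset_counts)
for vocab in sorted(scores, key=scores.get, reverse=True):
    print(vocab, scores[vocab])
```

A vocabulary that is heavily linked but used by no dataset scores zero under
this weighting, which may or may not be what one wants.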

Best

Bernard


-- 
Bernard Vatant
Senior Consultant
Vocabulary & Data Engineering
Tel:       +33 (0) 971 488 459
Mail:     bernard.vatant@mondeca.com
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web:    http://www.mondeca.com
Blog:    http://mondeca.wordpress.com
----------------------------------------------------

Received on Friday, 25 February 2011 22:53:47 UTC