- From: Annika Flemming <annika.flemming@gmx.de>
- Date: Sat, 26 Feb 2011 22:50:16 +0100
- To: Bernard Vatant <bernard.vatant@mondeca.com>
- CC: public-lod@w3.org
- Message-ID: <4D697598.9060105@gmx.de>
Hi Bernard,

On 25.02.2011 23:53, Bernard Vatant wrote:
> Hi Annika
>
> > > - "A vocabulary is said to be established, if it is one of the
> > > 100 most popular vocabularies stated on prefix.cc" - uhm, as
> > > the results from Richard's evaluation have shown, this is quite
> > > arguable
> >
> > It's a practical way to determine it (which I can use for the
> > implementation of the formalism). Another way would be to compare
> > many documents from many data sources and to find out which
> > vocabularies are most popular.
>
> I'm particularly interested in this aspect of vocabulary selection.
> Regarding popularity, I fully go along with Bob regarding prefix.cc, in
> which all sorts of biases can be introduced. I think the popularity is
> better measured by the use of vocabularies in CKAN datasets, as
> indicated by "format-*" tags. See http://ckan.net/tag/?page=F and for
> example http://ckan.net/tag/format-bibo or
> http://ckan.net/tag/format-foaf.

As I used the CKAN datasets to infer a lot about the quality aspects I
mentioned, it seems very reasonable to use the format tags. However, they
don't seem to be very exhaustive; for example, DBpedia is not tagged with
format-dbpprop. Therefore, analyzing the cache mentioned by Kingsley might
be more revealing. I'll add that to my thesis.

> Another approach I'm currently working on is the one you can find at
> http://labs.mondeca.com/dataset/lov. The description of interlinked
> vocabularies (using the VOAF vocabulary) provides an indication of
> popularity at the vocabulary level itself. From this dataset (still far
> from exhaustive, of course) you can see which vocabularies are reused,
> extended, or used for annotation by other ones. I think the density of
> links to and from a vocabulary to other ones gives a good indicator of
> its "establishment", in combination with the number of datasets
> actually using it.

Interesting work, I'll try to use this in my thesis!
>
> Best
>
> Bernard
>
> --
> Bernard Vatant
> Senior Consultant
> Vocabulary & Data Engineering
> Tel: +33 (0) 971 488 459
> Mail: bernard.vatant@mondeca.com
> ----------------------------------------------------
> Mondeca
> 3, cité Nollez 75018 Paris France
> Web: http://www.mondeca.com
> Blog: http://mondeca.wordpress.com
> ----------------------------------------------------
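
The tag-based popularity measure discussed in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the dataset names and tag lists are made-up sample data, the function names `rank_vocabularies` and `established` are hypothetical, and no CKAN API is called. The cut-off `top_n` mirrors the "100 most popular vocabularies" wording of the quoted definition.

```python
from collections import Counter


def rank_vocabularies(datasets, prefix="format-"):
    """Rank vocabularies by the number of datasets carrying their format-* tag.

    `datasets` maps a dataset name to its list of tags (hypothetical input shape).
    """
    counts = Counter()
    for tags in datasets.values():
        # Use a set so a vocabulary is counted at most once per dataset.
        vocabs = {t[len(prefix):] for t in tags if t.startswith(prefix)}
        counts.update(vocabs)
    # Sort by descending count, then by name, so ties are ordered deterministically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def established(datasets, top_n=100):
    """Return the names of the top_n vocabularies under this measure."""
    return {name for name, _ in rank_vocabularies(datasets)[:top_n]}


# Made-up sample data; "dataset-c" has no format-* tag at all,
# which is the kind of gap that makes the tags non-exhaustive.
sample = {
    "dataset-a": ["format-foaf", "format-bibo", "lod"],
    "dataset-b": ["format-foaf", "format-dc"],
    "dataset-c": ["lod"],
}

if __name__ == "__main__":
    for name, n in rank_vocabularies(sample):
        print(name, n)
```

A dataset without any format-* tag contributes nothing to the counts, so a ranking like this can only be as complete as the tagging itself.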
Received on Saturday, 26 February 2011 21:50:45 UTC