- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Fri, 3 Feb 2012 00:58:35 +0100
- To: Sören Auer <auer@informatik.uni-leipzig.de>
- Cc: Linking Open Data <public-lod@w3.org>
- Message-ID: <CAK4ZFVH9Za0vu_0W_4wLXFuwvGCVPvKzh+Jwd8vUUy-dbdTwRg@mail.gmail.com>
Hello all,

I've started comparing http://stats.lod2.eu/vocabularies with what we have in store in LOV. A few preliminary stats are available. Those who prefer raw data can go directly to the shared GDocs (waiting for better formats):

https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdEhvMlJjSmJELVhqVk9RUzBIWEhBMUE

Public access is read-only; if you want edit rights, just ask. This is pretty much a sandbox and a work in progress, with provisional but interesting figures nevertheless.

Three sheets are available:

1. LOV in LOD: vocabularies extracted by LODStats and already present in LOV: 54 so far
2. LOV w/o LOD: vocabularies in LOV not yet used in LOD (at least not extracted by LODStats): 137 (figures to be consolidated, since there are 189 vocs in LOV altogether; duplicates to double-check)
3. LOD w/o LOV: vocabularies extracted by LODStats and not (yet) present in LOV: 150

Figures 1 and 2 show that there is still a large majority of unused vocabularies in LOV. This is useful information. Does that mean they are useless? Time will tell ...

Figure 3 is more challenging. I've looked at each of those 150 URIs and, as of today, they can be distributed as follows:

Less than 50 are proper dereferenceable vocabularies, hence "LOV-able". That means a challenging to-do list for LOV curators, which should lead the figures in 1 and 3 to meet somewhere around 100 with a little effort, but be patient, this is human-checked. If you want some of those to be added in priority, use the suggest facility at http://labs.mondeca.com/dataset/lov/suggest/

More than 60 are either 404, time out or access denied, which does not come as a surprise, but is nevertheless a big issue. It means that data using those vocabularies are relying on semantics no one can check.

The rest are dereferenceable, but they resolve to various types of resources more or less close to one or several vocabularies and not published following good practices; in a word, they are not in a LOV-able state.
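For anyone who wants to reproduce the three-sheet comparison on their own lists, here is a minimal Python sketch of the set partition behind it. It is not the procedure actually used for the spreadsheet: the `normalize` and `partition` helpers, the trailing `#` or `/` trimming rule, and the `example.org` URIs are all assumptions made for illustration.

```python
def normalize(uri: str) -> str:
    """Crude namespace normalization (an assumption, not the LOV or LODStats rule):
    trim whitespace and any trailing '#' or '/' so near-duplicates collapse."""
    return uri.strip().rstrip("#/")


def partition(lov_uris, lod_uris):
    """Split two lists of vocabulary URIs into the three sheets."""
    lov = {normalize(u) for u in lov_uris}
    lod = {normalize(u) for u in lod_uris}
    return {
        "LOV in LOD": lov & lod,   # present in both lists
        "LOV w/o LOD": lov - lod,  # in LOV, not seen in LOD
        "LOD w/o LOV": lod - lov,  # seen in LOD, not in LOV
    }


# Toy input with hypothetical URIs, not real LOV or LODStats data.
lov_list = ["http://example.org/a#", "http://example.org/b/", "http://example.org/c#"]
lod_list = ["http://example.org/a", "http://example.org/d#"]

sheets = partition(lov_list, lod_list)
for name, uris in sheets.items():
    print(name, len(uris), sorted(uris))
```

Normalizing before comparing is one way to catch the kind of duplicates mentioned for sheet 2, but a real comparison would still need the human check described above.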
All in all, almost half of the vocabularies used in LOD do not meet a minimal quality requirement: being published at their namespace.

Conclusion: Quality, Quality, Quality please! Double-check the vocabularies you use, publish them properly if they are in your namespace, etc.

Bernard

2012/2/2 Bernard Vatant <bernard.vatant@mondeca.com>

> Hello Sören
>
> Great work! Of course, as you can imagine, I jumped right away to
> http://stats.lod2.eu/vocabularies.
> Interesting to see the broad figures (205 vocabularies) vs 189 harvested
> as of today at http://labs.mondeca.com/dataset/lov
> So I would like to compare, see the overlap ... and complete LOV as needed
> :)
>
> Do you have the vocabularies and datasets using them available in a single
> file? (preferably RDF of course!)
>
> Thanks
>
> Bernard
>
> 2012/2/2 Sören Auer <auer@informatik.uni-leipzig.de>
>
>> Dear all,
>>
>> We are happy to announce the first public *release of LODStats*.
>>
>> LODStats is a statement-stream-based approach for gathering
>> comprehensive statistics about datasets adhering to the Resource
>> Description Framework (RDF). LODStats was implemented in Python and
>> integrated into the CKAN dataset metadata registry [1]. Thus it helps to
>> obtain a comprehensive picture of the current state of the Data Web.
>>
>> More information about LODStats (including its open-source
>> implementation) is available from:
>>
>> http://aksw.org/projects/LODStats
>>
>> A demo installation collecting statistics from all LOD datasets
>> registered on CKAN is available from:
>>
>> http://stats.lod2.eu
>>
>> We would like to thank the AKSW research group [2] and LOD2 project [3]
>> members for their suggestions. The development of LODStats was supported by
>> the FP7 project LOD2 (GA no. 257943).
>>
>> On behalf of the LODStats team,
>>
>> Sören Auer, Jan Demter, Michael Martin, Jens Lehmann
>>
>> [1] http://ckan.net
>> [2] http://aksw.org
>> [3] http://lod2.eu

--
Bernard Vatant
Vocabularies & Data Engineering
Tel : + 33 (0)9 71 48 84 59
Skype : bernard.vatant
Linked Open Vocabularies <http://labs.mondeca.com/dataset/lov>
--------------------------------------------------------
Mondeca
3 cité Nollez 75018 Paris, France
www.mondeca.com
Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
Received on Thursday, 2 February 2012 23:59:37 UTC