Re: [Ann] LODStats - Real-time Data Web Statistics

Hello all

I've started comparing http://stats.lod2.eu/vocabularies with what we have
in store in LOV.

A few preliminary stats are available. Those who prefer raw data can go
directly to the shared GDocs (waiting for better formats)
https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdEhvMlJjSmJELVhqVk9RUzBIWEhBMUE
Access is public and read-only; if you want edit rights, just ask.
This is very much a sandbox and a work in progress, so the figures are
provisional, but interesting nevertheless. Three sheets are available :

1. LOV in LOD : vocabularies extracted by LODStats and already present in
LOV : 54 so far
2. LOV w/o LOD : vocabularies in LOV not yet used in LOD (at least not
extracted by LODStats) : 137
(figures to be consolidated : 54 + 137 = 191, whereas there are 189
vocabularies in LOV altogether, so duplicates need double-checking)
3. LOD w/o LOV : vocabularies extracted by LODStats and not (yet) present
in LOV : 150
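
For those who would rather recompute the three sheets than read them, the
split is plain set arithmetic. Here is a minimal sketch in Python, assuming
both lists have been exported as namespace URIs. The function name and the
sample URIs are illustrations only, not the actual LOV or LODStats data.

```python
def compare_vocabularies(lov, lodstats):
    """Split two collections of namespace URIs into the three sheets."""
    # Strip stray whitespace so trivial differences do not hide an overlap.
    lov = {uri.strip() for uri in lov}
    lodstats = {uri.strip() for uri in lodstats}
    return {
        "LOV in LOD": sorted(lov & lodstats),    # present in both
        "LOV w/o LOD": sorted(lov - lodstats),   # in LOV, not seen by LODStats
        "LOD w/o LOV": sorted(lodstats - lov),   # seen by LODStats, not in LOV
    }


# Illustrative sample only.
lov_sample = [
    "http://xmlns.com/foaf/0.1/",
    "http://purl.org/dc/terms/",
    "http://www.w3.org/2004/02/skos/core#",
]
lodstats_sample = [
    "http://xmlns.com/foaf/0.1/",
    "http://example.org/unpublished#",
]
sheets = compare_vocabularies(lov_sample, lodstats_sample)
for name, uris in sheets.items():
    print(name, ":", len(uris))
```

The sizes of the first two sets should add up to the number of distinct
vocabularies in LOV, which gives a quick way to spot duplicates.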

Figures 1 and 2 show that a large majority of the vocabularies in LOV are
still unused in LOD. This is useful information. Does that mean they are
useless? Time will tell ...

Figure 3 is more challenging. I've looked at each of those 150 URIs and, as
of today, they can be distributed as follows :

Fewer than 50 are proper dereferenceable vocabularies, hence "LOV-able".
That means a challenging to-do list for the LOV curators; with a little
effort, the figures in 1 and 3 should meet somewhere around 100. But be
patient, as this is human-checked. If you want some of those to be
added as a priority, use the suggest facility at
http://labs.mondeca.com/dataset/lov/suggest/

More than 60 return either a 404, a timeout or access denied, which does not
come as a surprise but is nevertheless a big issue. It means that data using
those vocabularies rely on semantics no one can check.

The rest are dereferenceable, but resolve to various types of resources that
are more or less close to one or several vocabularies yet are not published
following good practices; in a word, they are not in a LOV-able state.

All in all, almost half of the vocabularies used in LOD do not meet a
minimal quality requirement : being published at their namespace.
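
A first automatic pass over the three categories above could look like the
sketch below. It is not how the 150 URIs were checked (that was done by
hand), and the list of RDF media types, the labels and the function names
are my assumptions. A Content-Type test is only a crude proxy for
"LOV-able"; a human still has to look at what comes back.

```python
import socket
import urllib.error
import urllib.request

# Assumption: these media types are taken as a sign of an RDF vocabulary.
RDF_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "text/n3",
    "application/n-triples",
}


def http_fetch(uri, timeout=10):
    """Fetch a URI asking for RDF; return (status, Content-Type header)."""
    request = urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")


def classify(uri, fetch=http_fetch):
    """Return a rough label for what a namespace URI dereferences to."""
    try:
        _status, content_type = fetch(uri)
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        if err.code == 404:
            return "not found"
        if err.code in (401, 403):
            return "access denied"
        return "http error"
    except urllib.error.URLError as err:
        # Connection timeouts are usually wrapped in a URLError.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return "unreachable"
    except socket.timeout:
        # Timeouts while reading the response are raised directly.
        return "timeout"
    media_type = content_type.split(";")[0].strip().lower()
    return "rdf" if media_type in RDF_TYPES else "other resource"
```

Calling classify("http://xmlns.com/foaf/0.1/") goes over the network; passing
a different fetch function makes it possible to try the logic offline.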

Conclusion : Quality, Quality, Quality please !
Double-check the vocabularies you use, publish them properly if they are in
your namespace, etc.

Bernard


2012/2/2 Bernard Vatant <bernard.vatant@mondeca.com>

> Hello Sören
>
> Great work! Of course as you can imagine I jumped right away to
> http://stats.lod2.eu/vocabularies.
> Interesting to see the broad figures (205 vocabularies) vs 189 harvested
> as of today at http://labs.mondeca.com/dataset/lov
> So I would like to compare, see the overlap ... and complete LOV as needed
> :)
>
> Do you have the vocabularies and datasets using them available in a single
> file? (preferably RDF of course!)
>
> Thanks
>
> Bernard
>
>
>
> 2012/2/2 Sören Auer <auer@informatik.uni-leipzig.de>
>
>> Dear all,
>>
>> We are happy to announce the first public *release of LODStats*.
>>
>> LODStats is a statement-stream-based approach for gathering
>> comprehensive statistics about datasets adhering to the Resource
>> Description Framework (RDF). LODStats was implemented in Python and
>> integrated into the CKAN dataset metadata registry [1]. Thus it helps to
>> obtain a comprehensive picture of the current state of the Data Web.
>>
>> More information about LODStats (including its open-source
>> implementation) is available from:
>>
>> http://aksw.org/projects/LODStats
>>
>> A demo installation collecting statistics from all LOD datasets
>> registered on CKAN is available from:
>>
>> http://stats.lod2.eu
>>
>> We would like to thank the AKSW research group [2] and LOD2 project [3]
>> members for their suggestions. The development of LODStats was supported by
>> the FP7 project LOD2 (GA no. 257943).
>>
>> On behalf of the LODStats team,
>>
>> Sören Auer, Jan Demter, Michael Martin, Jens Lehmann
>>
>> [1] http://ckan.net
>> [2] http://aksw.org
>> [3] http://lod2.eu
>>
>>
>
>
> --
> *Bernard Vatant*
> Vocabularies & Data Engineering
> Tel :  + 33 (0)9 71 48 84 59
>  Skype : bernard.vatant
> Linked Open Vocabularies <http://labs.mondeca.com/dataset/lov>
>
> --------------------------------------------------------
> *Mondeca*
> 3 cité Nollez 75018 Paris, France
> www.mondeca.com
> Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
>
>



Received on Thursday, 2 February 2012 23:59:37 UTC