Re: [Ann] LODStats - Real-time Data Web Statistics

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 21 Jun 2012 15:08:34 +0000
To: Sören Auer <auer@informatik.uni-leipzig.de>
CC: Denny Vrandecic <denny.vrandecic@kit.edu>, Linking Open Data <public-lod@w3.org>, SW-forum <semantic-web@w3.org>
Message-ID: <EMEW3|59cb05bba405befe4e0414fa1e218660o5KG8Z02hg|ecs.soton.ac.uk|F15E3DF3-08C1-400C-A551-29FBA5FA1E0D@ecs.soton.ac.uk>
Hi.
On 21 Jun 2012, at 11:40, Sören Auer wrote:

> On 21.06.2012 12:03, Hugh Glaser wrote:
>> Interesting question from Denny.
>> I guess you don't do http://thedatahub.org/dataset/sameas-org
>> for the same reason.
>> And
>> http://thedatahub.org/dataset/dbpedia-lite
>> (Or at least I couldn't find them.)
>> 
>> I'm not sure you should claim "all LOD datasets registered on CKAN"
> 
> Depends on the definition of dataset - for me a dataset is something
> available in bulk and not a pointer to a large space of URLs containing
> some data fragments requiring extensive crawling.
I can't agree with this.
To rule out data that is published only as Linked Data, without a SPARQL endpoint or dump, and say it is not a "LOD Dataset" seems terribly restrictive.
For example, the eprints (eprints.org) Open Archives have upwards of 100M triples of pretty interesting (to some people) Linked Data.
It is mostly not in thedatahub, but even if it were, you would ignore it.
In fact, anything that is a wrapper around things like dbpedia, twitter, Facebook, or even Facebook itself is ignored, I am assuming from what you say.
To publish statistics that claim to cover "statistics from all LOD datasets" using a method that ignores such resources is to seriously underreport LOD activity (not a Good Thing), and also to publish what I can only call misleading statistical reports about LOD in general.
I leave aside that you also fail to collect statistics from more than half of the datasets you claim to be collecting.

I realise it may be hard to do anything different, but the badging really is a problem.
If people are to use your numbers then they should be told very clearly how they are derived.
If you want to use your definition of a dataset, then you should state very clearly on the web pages the criteria you are using.

Best
Hugh
> 
> I understand why Linked Open Numbers is not available as a dump - how
> would you package a countably infinite number of resources ;-)
> 
>> if you don't have dbpedialite, for example.
> 
> Does there exist a dump for dbpedialite - a link to the dump does not
> seem to be registered at thedatahub.
> 
> Sören

-- 
Hugh Glaser,  
             Web and Internet Science
             Electronics and Computer Science,
             University of Southampton,
             Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
Received on Thursday, 21 June 2012 15:09:20 GMT
