- From: Sören Auer <auer@informatik.uni-leipzig.de>
- Date: Fri, 22 Jun 2012 20:31:26 +0200
- To: Denny Vrandecic <denny.vrandecic@wikimedia.de>
- CC: Hugh Glaser <hg@ecs.soton.ac.uk>, Linking Open Data <public-lod@w3.org>, SW-forum <semantic-web@w3.org>
On 22.06.2012 11:30, Denny Vrandecic wrote:
> According to your definition, then LODStats is misnamed.
> It should be LOD Datasets Stats.
>
> Or am I misunderstanding something?

Maybe you are right, Denny, but there is never a perfect name.

Actually, LODStats is both a tool and a service. The open-source tool (https://github.com/AKSW/LODStats) can be used for analysing anything. If you are not happy with our selection criteria in the service, you can run your own LODStats installation, put a crawler in front and analyse all the datasets you want. Only our service at stats.lod2.eu is a little selective ;-)

Best,

Sören

> On 22 Jun 2012, at 01:30, Sören Auer wrote:
>
>> On 21.06.2012 17:08, Hugh Glaser wrote:
>>> Hi.
>>> On 21 Jun 2012, at 11:40, Sören Auer wrote:
>>>
>>>> On 21.06.2012 12:03, Hugh Glaser wrote:
>>>>> Interesting question from Denny.
>>>>> I guess you don't do http://thedatahub.org/dataset/sameas-org
>>>>> for the same reason.
>>>>> And
>>>>> http://thedatahub.org/dataset/dbpedia-lite
>>>>> (Or at least I couldn't find them.)
>>>>>
>>>>> I'm not sure you should claim "all LOD datasets registered on CKAN"
>>>>
>>>> Depends on the definition of dataset - for me a dataset is something
>>>> available in bulk and not a pointer to a large space of URLs containing
>>>> some data fragments requiring extensive crawling.
>>> I can't agree with this.
>>> To rule out Linked Data that only provides Linked Data without SPARQL or dump and say it is not a "LOD Dataset" seems to be terribly restrictive.
>>
>> I would distinguish between Linked Data and a LOD dataset:
>>
>> For me (and I would assume most people) /dataset/ means a set of data,
>> i.e. a downloadable dump or bulk data access (e.g. via SPARQL) to a data
>> repository.
>>
>> When the data adheres to the RDF data model and dereferenceable IRIs are
>> used, it's a /Linked Data dataset/.
>>
>> When licensed under an open license (according to the open definition),
>> it's a /Linked Open Data (LOD) dataset/.
>>
>> I agree that /Linked Data/ also comprises individual data resources
>> (either independently or integrated into HTML as RDFa), but I would not
>> call these datasets then, and also not open (if not licensed according to
>> the open definition). BTW: The open definition also requires bulk data
>> access! So we already have two reasons why the concept "LOD dataset"
>> should imply availability of bulk data. This is also what we mention
>> everywhere when describing LODStats.
>>
>> If you are interested in statistics about arbitrary Linked Data,
>> Sindice probably provides the better statistics.
>>
>>> For example, the eprints (eprints.org) Open Archives have upwards of 100M triples of pretty interesting (to some people) Linked Data.
>>
>> Maybe interesting, but if I have to crawl it in order to make use of it,
>> the burden is way too high for most users.
>>
>>> It is mostly not in thedatahub, but even if it was you would ignore it.
>>> In fact, anything that is a wrapper around things like dbpedia, twitter, Facebook, or even Facebook itself is ignored, I am assuming from what you say.
>>
>> For DBpedia you don't need a wrapper - the whole dataset is available in
>> bulk. All others are, from my point of view, neither datasets nor open.
>> Maybe you can call them data services, where you can obtain an
>> individual data item at a time. And why would you want to call a wrapper
>> a dataset? A fundamental requirement for datasets would be, from my point
>> of view, that you can apply set operations like merging, joining etc. You
>> cannot do that with wrappers, so why should we call them datasets?
>>
>>> To publish statistics that claims to collect "statistics from all LOD datasets" using a method that ignores such resources is to seriously underreport the LOD activity (not a Good Thing), and also is to publish what I can only say is misleading statistical reports about LOD in general.
>>> I leave aside that you also fail to collect statistics from more than half of the datasets you claim to be collecting.
>>
>> I agree that our figures are quite pessimistic, but in a way they
>> reflect what people really see: if there is no link to the dump in
>> thedatahub, the dataset is obviously difficult to find; if
>> confusing/non-standard file extensions or dataset package formats are
>> used, this also makes it very difficult for people to actually use this
>> data. So I think it's better to be a little more pessimistic in this
>> case instead of reporting skyrocketing numbers all the time.
>>
>> Sören
>>
>
>
Received on Friday, 22 June 2012 18:31:46 UTC