- From: Olaf Hartig <ohartig@uwaterloo.ca>
- Date: Mon, 8 Sep 2014 09:13:18 -0400
- To: <public-lod@w3.org>
- CC: Christian Bizer <chris@bizer.de>
Chris,

thanks for the explanation!

Olaf

On Monday 08 September 2014 10:03:20 Christian Bizer wrote:
> Hi Olaf,
>
> you asked about the number of Linked Datasets on the Web reported in our
> paper [1] and for the new LOD cloud diagram [2].
> As the numbers might also confuse other people, I have put the LOD mailing
> list in cc.
>
> We seeded our crawl with a large number of URIs from the BTC2012 crawl, the
> datahub.io catalog, plus some URIs from datasets mentioned on the LOD list.
>
> Our crawler retrieved RDF data from 1014 data sources [3]. It was blocked
> by 77 Linked Data sources via robots.txt. These two numbers together result
> in the 1091 Linked Datasets that we report as the overall number in our paper.
>
> Unfortunately, only 397 of the crawled datasets were linked to each other
> via RDF links (that our crawler discovered), and we thus included only these
> datasets [4] in the "Crawlable LOD Cloud 2014" [5].
>
> Please note that this does not mean that there are no other crawlable Linked
> Datasets, as we did not do an extensive crawl and our crawler might thus
> have missed some datasets. As our crawler only gathers a data sample from
> each source, it might also have missed some RDF links between datasets.
>
> We thus asked via the mailing list to point us to additional datasets that
> we had missed so far and to enter meta-information about these datasets
> into the datahub.io catalog. This call resulted in quite a lot of feedback,
> and we drew the LOD cloud 2014 [2] taking this feedback into account. The
> 570 datasets contained in the new version thus include
>
> 1. datasets that we crawled,
> 2. datasets that our crawler discovered but did not crawl due to robots.txt files,
> 3. additional datasets that resulted from our call for feedback, and
> 4. additional datasets that became linked by adding the datasets from bullets
>    2 and 3.
>
> As with the previous versions of the cloud, we only included datasets that
> are connected to other datasets in the cloud.
>
> After finishing the diagram, we checked for how many of the 570 datasets
> the datahub.io catalog contains meta-information, and it turned out that 374
> datasets are described in the catalog. 196 datasets were not yet described in
> the catalog. For these datasets, we added the meta-information that we
> extracted from the crawled data to the catalog using the lodcloud2014
> organization [5], in order to keep human- and machine-generated data separate
> [5].
>
> In parallel to our efforts, Tobias Käfer and Andreas Harth from KIT have
> conducted a much larger crawl of the Linked Data web and now offer the
> resulting dataset for download [6]. They are currently analyzing their data,
> and it will be interesting to see to what extent their results verify our
> findings and how many additional datasets their crawler discovered.
>
> Cheers,
>
> Chris
>
> [1] http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/
> [2] http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/
> [3] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset?tags=LinkedDataCrawl2014
> [3] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset?tags=crawledLinkedDataCloud2014
> [4] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/LODCloudDiagram.html
> [5] http://datahub.io/organization/lodcloud2014
> [6] http://km.aifb.kit.edu/projects/btc-2014/
>
> -----Original Message-----
> From: Olaf Hartig [mailto:ohartig@uwaterloo.ca]
> Sent: Saturday, 6 September 2014 15:43
> To: max@informatik.uni-mannheim.de; chris@informatik.uni-mannheim.de;
> heiko@informatik.uni-mannheim.de
> Subject: # of datasets in LOD cloud diagram
>
> Hi Max, Chris, Heiko,
>
> According to your web page for the current LOD cloud diagram
> (http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/), the diagram
> shows 570 datasets. However, the previous version, which you created as part
> of your ISWC'14 paper, contains 1091 datasets (according to your paper,
> Table 1). But the web page mentioned above suddenly refers to only 196
> datasets discovered by your crawl. How should these numbers be understood?
> Why are not all 1091 datasets included in the current diagram?
>
> Best regards,
> Olaf
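The counts discussed in this exchange can be cross-checked with a few lines of arithmetic. The following Python sketch is purely illustrative: the figures are copied from Chris's explanation, while the variable names are hypothetical and not drawn from the authors' crawler or catalog tooling.

```python
# Illustrative reconciliation of the dataset counts from the email.
# Figures come from the message; variable names are hypothetical.

crawled_sources = 1014      # sources from which the crawler retrieved RDF data
blocked_by_robots = 77      # Linked Data sources that blocked the crawler via robots.txt
reported_total = crawled_sources + blocked_by_robots  # overall number in the ISWC paper

interlinked_crawled = 397   # crawled datasets linked via RDF links the crawler discovered

described_in_datahub = 374  # cloud datasets already described in datahub.io
added_by_lodcloud2014 = 196 # datasets whose metadata was added via the lodcloud2014 organization
cloud_2014_total = described_in_datahub + added_by_lodcloud2014

print(reported_total)                         # 1091
print(cloud_2014_total)                       # 570
print(crawled_sources - interlinked_crawled)  # 617 crawled sources without discovered links
```

The last figure only reflects RDF links found in the crawler's sampled data; as the message notes, the crawler might have missed some links between datasets.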
Received on Monday, 8 September 2014 13:13:53 UTC