- From: Michael Brunnbauer <brunni@netestate.de>
- Date: Mon, 18 Aug 2014 13:35:21 +0200
- To: Christian Bizer <chris@bizer.de>
- Cc: 'Linking Open Data' <public-lod@w3.org>
- Message-ID: <20140818113521.GA10852@netestate.de>
Hello Chris,

On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
> Sorry, we are not Google and simply did not have the resources to crawl the
> whole Web and as for RDF/XML when dereferencing each URL.

See http://www.sengine.info/

We try to crawl 1000 URLs from every site that has less than 5000 other sites
on the same IP.

> Alternatively, one could of course search for HTML documents that contain
> links pointing at RDF/Linked Data documents (for instance using
> <link rel="alternate" type="application/rdf+xml" ...> in the header part of
> an HTML document).
[...]
> It would be great if somebody would investigate this deeper and produce a
> list with Linked Data URIs that could be used as seeds for further crawls.

mysql> select rel,count(*) as number from link_rdf group by rel order by number desc limit 10;
+--------------------+---------+
| rel                | number  |
+--------------------+---------+
| meta               | 4348067 |
| alternate          | 2080611 |
| alternate meta     |   61169 |
| meta FOAF.MAKER    |   38293 |
| ExportRDF          |   31366 |
| alternate nofollow |   19176 |
| media              |   11568 |
| Skos metadata      |    9364 |
| resourcemap        |    8727 |
|                    |    1708 |
+--------------------+---------+
10 rows in set (28.36 sec)

select title,count(*) as number from link_rdf group by title order by number desc limit 50;
+-------------------------------------------------+---------+
| title                                           | number  |
+-------------------------------------------------+---------+
| Creative Commons                                | 2074663 |
| FOAF                                            | 1805441 |
| RDF+XML                                         |  328872 |
| RSS 1.0                                         |  155568 |
| ICRA labels                                     |  152142 |
| RDF                                             |  151377 |
|                                                 |  144916 |
| Calais RDF                                      |   65550 |
| SIOC                                            |   50826 |
| RDF 1.0                                         |   48448 |
| Dublin Core                                     |   48047 |
| RDF 1.1                                         |   40227 |
| Meta Information                                |   25181 |
| RDF Version                                     |   24449 |
| RDF Version of this post                        |   24356 |
| Items in Collection                             |   23903 |
| RDF Representation                              |   22327 |
| This category listings in RDF                   |   19176 |
| Dublin                                          |   17796 |
| Structured Descriptor Document (RDF/XML format) |   14648 |
| RDF Metadata                                    |   11275 |
| Skos Core                                       |    9364 |
| Structured Description in RDF/XML format        |    8666 |
| Items in Community                              |    7358 |
| notice                                          |    5998 |
| RDF/XML version of this document                |    5677 |
| RDF/XML                                         |    5325 |
| RDF/XML Version                                 |    4531 |
| Get RDF 1.0 Feed                                |    4426 |
| RDF/XML data for this webshop                   |    4063 |
| RDF+XML (VOA3R)                                 |    3887 |
| LG RDF                                          |    3547 |
| Metadata                                        |    3047 |
| Packages involving this user                    |    2914 |
| Product RDF/XML data                            |    2905 |
| DOAP                                            |    2640 |
| Geo                                             |    2593 |
| XML                                             |    2409 |
| Main Page                                       |    1735 |
| This page in RDF (XML)                          |    1573 |
| Public Stream Feed (RSS 1.0)                    |    1541 |
| RDF Description                                 |    1506 |
| Get RDF                                         |    1161 |
| RDF Version of this categorie                   |    1124 |
| unprocessed RDF+XML metadata                    |     993 |
| Dane produktu w formacie RDF/XML                |     990 |
| Supplier RDF/XML data                           |     962 |
| Essay metadata                                  |     758 |
| Dublin Core Metadata                            |     737 |
| rdf:foaf                                        |     730 |
+-------------------------------------------------+---------+
50 rows in set (2 min 22.54 sec)

Contact me if you are interested.

Regards,

Michael Brunnbauer

--
++ Michael Brunnbauer
++ netEstate GmbH
++ Geisenhausener Straße 11a
++ 81379 München
++ Tel +49 89 32 19 77 80
++ Fax +49 89 32 19 77 89
++ E-Mail brunni@netestate.de
++ http://www.netestate.de/
++
++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
++ USt-IdNr. DE221033342
++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
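
The rel and title counts above are taken over <link> elements that advertise RDF/XML. As a rough illustration of how such links might be pulled out of a single HTML page and tallied by rel, here is a minimal sketch using Python's standard html.parser. The names RdfLinkParser, rdf_links and count_by_rel are hypothetical, the exact-match test on the type attribute is an assumption, and none of this is the sengine.info crawler code.

```python
from collections import Counter
from html.parser import HTMLParser


class RdfLinkParser(HTMLParser):
    """Collect (rel, title, href) for <link> elements typed application/rdf+xml."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        # Attributes without a value arrive as None; normalise to "".
        a = {name: (value or "") for name, value in attrs}
        # Assumption: only an exact application/rdf+xml media type counts.
        if a.get("type", "").strip().lower() == "application/rdf+xml":
            self.links.append((a.get("rel", ""), a.get("title", ""), a.get("href", "")))


def rdf_links(html):
    """Return all RDF/XML <link> triples found in one HTML document."""
    parser = RdfLinkParser()
    parser.feed(html)
    return parser.links


def count_by_rel(pages):
    """Tally rel values over many pages, most frequent first (like GROUP BY rel)."""
    counts = Counter()
    for html in pages:
        for rel, _title, _href in rdf_links(html):
            counts[rel] += 1
    return counts.most_common()
```

The href values returned by rdf_links could serve as candidate seed URLs, after resolving relative references against the page URL, which this sketch does not do.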
Received on Monday, 18 August 2014 11:35:45 UTC