Re: Updated LOD Cloud Diagram - Missed data sources.

Hello Chris,

On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
> Sorry, we are not Google and simply did not have the resources to crawl the whole Web and ask for RDF/XML when dereferencing each URL.

See http://www.sengine.info/

We try to crawl 1000 URLs from every site that has fewer than 5000 other sites
on the same IP.
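
The selection rule above could be sketched roughly as follows. This is only
an illustration, not the actual crawler code: the function name select_sites,
the constants MAX_NEIGHBOURS and URLS_PER_SITE, and the input format (a dict
mapping each site's hostname to its IP address) are all assumptions made for
the example.

```python
from collections import Counter

# Hypothetical names; the thresholds are the ones quoted in the text.
MAX_NEIGHBOURS = 5000  # a site qualifies if fewer than this many OTHER sites share its IP
URLS_PER_SITE = 1000   # crawl budget for each qualifying site


def select_sites(site_to_ip):
    """Return {site: url_budget} for sites hosted on sufficiently uncrowded IPs.

    site_to_ip is an assumed input: a dict mapping hostname -> IP address.
    """
    sites_per_ip = Counter(site_to_ip.values())
    budgets = {}
    for site, ip in site_to_ip.items():
        others = sites_per_ip[ip] - 1  # exclude the site itself
        if others < MAX_NEIGHBOURS:
            budgets[site] = URLS_PER_SITE
    return budgets
```

A site on an IP shared with exactly 5000 other sites would be skipped under
this reading of the rule; one sharing with 4999 would get the full budget.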

> Alternatively, one could of course search for HTML documents that contain links pointing at RDF/Linked Data documents (for instance using <link rel="alternate" type="application/rdf+xml" ...> in the header part of an HTML document).
[...]
> It would be great if somebody would investigate this deeper and produce a list with Linked Data URIs that could be used as seeds for further crawls.

mysql> select rel,count(*) as number from link_rdf group by rel order by number desc limit 10;
+--------------------+---------+
| rel                | number  |
+--------------------+---------+
| meta               | 4348067 |
| alternate          | 2080611 |
| alternate meta     |   61169 |
| meta FOAF.MAKER    |   38293 |
| ExportRDF          |   31366 |
| alternate nofollow |   19176 |
| media              |   11568 |
| Skos metadata      |    9364 |
| resourcemap        |    8727 |
|                    |    1708 |
+--------------------+---------+
10 rows in set (28.36 sec)

mysql> select title,count(*) as number from link_rdf group by title order by number desc limit 50;
+-------------------------------------------------+---------+
| title                                           | number  |
+-------------------------------------------------+---------+
| Creative Commons                                | 2074663 |
| FOAF                                            | 1805441 |
| RDF+XML                                         |  328872 |
| RSS 1.0                                         |  155568 |
| ICRA labels                                     |  152142 |
| RDF                                             |  151377 |
|                                                 |  144916 |
| Calais RDF                                      |   65550 |
| SIOC                                            |   50826 |
| RDF 1.0                                         |   48448 |
| Dublin Core                                     |   48047 |
| RDF 1.1                                         |   40227 |
| Meta Information                                |   25181 |
| RDF Version                                     |   24449 |
| RDF Version of this post                        |   24356 |
| Items in Collection                             |   23903 |
| RDF Representation                              |   22327 |
| This category listings in RDF                   |   19176 |
| Dublin                                          |   17796 |
| Structured Descriptor Document (RDF/XML format) |   14648 |
| RDF Metadata                                    |   11275 |
| Skos Core                                       |    9364 |
| Structured Description in RDF/XML format        |    8666 |
| Items in Community                              |    7358 |
| notice                                          |    5998 |
| RDF/XML version of this document                |    5677 |
| RDF/XML                                         |    5325 |
| RDF/XML Version                                 |    4531 |
| Get RDF 1.0 Feed                                |    4426 |
| RDF/XML data for this webshop                   |    4063 |
| RDF+XML (VOA3R)                                 |    3887 |
| LG RDF                                          |    3547 |
| Metadata                                        |    3047 |
| Packages involving this user                    |    2914 |
| Product RDF/XML data                            |    2905 |
| DOAP                                            |    2640 |
| Geo                                             |    2593 |
| XML                                             |    2409 |
| Main Page                                       |    1735 |
| This page in RDF (XML)                          |    1573 |
| Public Stream Feed (RSS 1.0)                    |    1541 |
| RDF Description                                 |    1506 |
| Get RDF                                         |    1161 |
| RDF Version of this categorie                   |    1124 |
| unprocessed RDF+XML metadata                    |     993 |
| Dane produktu w formacie RDF/XML                |     990 |
| Supplier RDF/XML data                           |     962 |
| Essay metadata                                  |     758 |
| Dublin Core Metadata                            |     737 |
| rdf:foaf                                        |     730 |
+-------------------------------------------------+---------+
50 rows in set (2 min 22.54 sec)
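
For anyone who wants to collect similar numbers from their own crawl, here is
a minimal Python sketch of how <link> elements pointing at RDF/XML might be
pulled out of a page and tallied by rel or title. It is an illustration under
assumptions, not the code behind the link_rdf table: the class and function
names are made up, and filtering on type="application/rdf+xml" is assumed
from the quoted suggestion rather than taken from the actual crawler.

```python
from collections import Counter
from html.parser import HTMLParser

RDF_TYPE = "application/rdf+xml"  # assumed filter, taken from the quoted <link> example


class RdfLinkParser(HTMLParser):
    """Collects (rel, title, href) for every <link> whose type is RDF/XML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        if a.get("type", "").strip().lower() == RDF_TYPE:
            self.links.append((a.get("rel", ""), a.get("title", ""), a.get("href", "")))


def rdf_links(html):
    """Return a list of (rel, title, href) tuples found in an HTML string."""
    parser = RdfLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def count_by(links, index):
    """Tally links by tuple position (0 = rel, 1 = title), most frequent first."""
    return Counter(link[index] for link in links).most_common()
```

Calling count_by(rdf_links(page), 0) on each fetched page and summing the
counters would give a per-rel tally comparable to the first query above;
index 1 gives the per-title tally. The href values are the candidate seed
URLs, which still need to be resolved against the page URL.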

Contact me if you are interested.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Received on Monday, 18 August 2014 11:35:45 UTC