Re: Updated LOD Cloud Diagram - Missed data sources.

On 8/18/14 12:35 PM, Michael Brunnbauer wrote:
> Hello Chris,
>
> On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
>> >Sorry, we are not Google and simply did not have the resources to crawl the whole Web and ask for RDF/XML when dereferencing each URL.
> See http://www.sengine.info/
>
> We try to crawl 1000 URLs from every site that has fewer than 5000 other sites
> on the same IP.
>
>> >Alternatively, one could of course search for HTML documents that contain links pointing at RDF/Linked Data documents (for instance using <link rel="alternate" type="application/rdf+xml" ...> in the header part of an HTML document).
> [...]
>> >It would be great if somebody would investigate this deeper and produce a list with Linked Data URIs that could be used as seeds for further crawls.
> mysql> select rel,count(*) as number from link_rdf group by rel order by number desc limit 10;
> +--------------------+---------+
> | rel                | number  |
> +--------------------+---------+
> | meta               | 4348067 |
> | alternate          | 2080611 |
> | alternate meta     |   61169 |
> | meta FOAF.MAKER    |   38293 |
> | ExportRDF          |   31366 |
> | alternate nofollow |   19176 |
> | media              |   11568 |
> | Skos metadata      |    9364 |
> | resourcemap        |    8727 |
> |                    |    1708 |
> +--------------------+---------+
> 10 rows in set (28.36 sec)
>
> select title,count(*) as number from link_rdf group by title order by number desc limit 50;
> +-------------------------------------------------+---------+
> | title                                           | number  |
> +-------------------------------------------------+---------+
> | Creative Commons                                | 2074663 |
> | FOAF                                            | 1805441 |
> | RDF+XML                                         |  328872 |
> | RSS 1.0                                         |  155568 |
> | ICRA labels                                     |  152142 |
> | RDF                                             |  151377 |
> |                                                 |  144916 |
> | Calais RDF                                      |   65550 |
> | SIOC                                            |   50826 |
> | RDF 1.0                                         |   48448 |
> | Dublin Core                                     |   48047 |
> | RDF 1.1                                         |   40227 |
> | Meta Information                                |   25181 |
> | RDF Version                                     |   24449 |
> | RDF Version of this post                        |   24356 |
> | Items in Collection                             |   23903 |
> | RDF Representation                              |   22327 |
> | This category listings in RDF                   |   19176 |
> | Dublin                                          |   17796 |
> | Structured Descriptor Document (RDF/XML format) |   14648 |
> | RDF Metadata                                    |   11275 |
> | Skos Core                                       |    9364 |
> | Structured Description in RDF/XML format        |    8666 |
> | Items in Community                              |    7358 |
> | notice                                          |    5998 |
> | RDF/XML version of this document                |    5677 |
> | RDF/XML                                         |    5325 |
> | RDF/XML Version                                 |    4531 |
> | Get RDF 1.0 Feed                                |    4426 |
> | RDF/XML data for this webshop                   |    4063 |
> | RDF+XML (VOA3R)                                 |    3887 |
> | LG RDF                                          |    3547 |
> | Metadata                                        |    3047 |
> | Packages involving this user                    |    2914 |
> | Product RDF/XML data                            |    2905 |
> | DOAP                                            |    2640 |
> | Geo                                             |    2593 |
> | XML                                             |    2409 |
> | Main Page                                       |    1735 |
> | This page in RDF (XML)                          |    1573 |
> | Public Stream Feed (RSS 1.0)                    |    1541 |
> | RDF Description                                 |    1506 |
> | Get RDF                                         |    1161 |
> | RDF Version of this categorie                   |    1124 |
> | unprocessed RDF+XML metadata                    |     993 |
> | Dane produktu w formacie RDF/XML                |     990 |
> | Supplier RDF/XML data                           |     962 |
> | Essay metadata                                  |     758 |
> | Dublin Core Metadata                            |     737 |
> | rdf:foaf                                        |     730 |
> +-------------------------------------------------+---------+
> 50 rows in set (2 min 22.54 sec)
>
> Contact me if you are interested.
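
The link-based discovery quoted above (finding HTML pages whose header carries a <link> element with type="application/rdf+xml") can be sketched as follows. This is only a minimal illustration using Python's standard html.parser module, not the code behind the crawl described in the quoted message; the class name RdfLinkParser and the function find_rdf_links are hypothetical.

```python
from html.parser import HTMLParser

RDF_XML_TYPE = "application/rdf+xml"


class RdfLinkParser(HTMLParser):
    """Collect (rel, title, href) for <link> elements that advertise RDF/XML.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attributes without a value arrive as None; normalise to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("type", "").strip().lower() == RDF_XML_TYPE:
            self.links.append(
                (attr.get("rel", ""), attr.get("title", ""), attr.get("href", ""))
            )


def find_rdf_links(html):
    """Return every RDF/XML <link> found in an HTML string."""
    parser = RdfLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

The (rel, title, href) tuples mirror the rel and title columns grouped in the quoted queries; the href values are what could serve as seed URIs for further crawls.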

Do you not have this data in RDF form? Ideally, you should publish this
data in a form that's accessible via HTTP lookups (and SPARQL queries). I
am sure you can see the irony in the SQL query results presented above :-)
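
To illustrate what that could look like, here is a minimal sketch that serializes one link record as N-Triples and gives a SPARQL 1.1 counterpart of the first SQL query quoted above. Everything about the schema is an assumption: the vocabulary namespace http://example.org/linkrdf# is hypothetical, and only the rel and title columns of link_rdf are known from the quoted output, so the page and target fields are guesses.

```python
# Hypothetical vocabulary; not an existing ontology.
EX = "http://example.org/linkrdf#"


def literal(text):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def link_to_ntriples(index, page, rel, title, target):
    """Describe one discovered link as four N-Triples lines.

    Assumes page and target are already valid absolute IRIs.
    """
    node = f"_:link{index}"  # one blank node per link record
    return [
        f"{node} <{EX}page> <{page}> .",
        f"{node} <{EX}target> <{target}> .",
        f"{node} <{EX}rel> {literal(rel)} .",
        f"{node} <{EX}title> {literal(title)} .",
    ]


# SPARQL 1.1 counterpart of:
#   select rel, count(*) as number from link_rdf
#   group by rel order by number desc limit 10;
REL_COUNTS_QUERY = """
PREFIX ex: <http://example.org/linkrdf#>
SELECT ?rel (COUNT(*) AS ?number)
WHERE { ?link ex:rel ?rel }
GROUP BY ?rel
ORDER BY DESC(?number)
LIMIT 10
"""
```

Loaded into any SPARQL 1.1 endpoint, triples of this shape would let the same aggregation run over HTTP instead of in a private MySQL session.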

-- 
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

Received on Monday, 18 August 2014 11:57:52 UTC