- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 18 Aug 2014 12:57:35 +0100
- To: public-lod@w3.org
- Message-ID: <53F1EA2F.1060801@openlinksw.com>
On 8/18/14 12:35 PM, Michael Brunnbauer wrote:
> Hello Chris,
>
> On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
>> Sorry, we are not Google and simply did not have the resources to crawl the whole Web and ask for RDF/XML when dereferencing each URL.
>
> See http://www.sengine.info/
>
> We try to crawl 1000 URLs from every site that has less than 5000 other sites
> on the same IP.
>
>> Alternatively, one could of course search for HTML documents that contain links pointing at RDF/Linked Data documents (for instance using <link rel="alternate" type="application/rdf+xml" ...> in the header part of an HTML document).
> [...]
>> It would be great if somebody would investigate this deeper and produce a list with Linked Data URIs that could be used as seeds for further crawls.
>
> mysql> select rel,count(*) as number from link_rdf group by rel order by number desc limit 10;
> +--------------------+---------+
> | rel                | number  |
> +--------------------+---------+
> | meta               | 4348067 |
> | alternate          | 2080611 |
> | alternate meta     |   61169 |
> | meta FOAF.MAKER    |   38293 |
> | ExportRDF          |   31366 |
> | alternate nofollow |   19176 |
> | media              |   11568 |
> | Skos metadata      |    9364 |
> | resourcemap        |    8727 |
> |                    |    1708 |
> +--------------------+---------+
> 10 rows in set (28.36 sec)
>
> select title,count(*) as number from link_rdf group by title order by number desc limit 50;
> +-------------------------------------------------+---------+
> | title                                           | number  |
> +-------------------------------------------------+---------+
> | Creative Commons                                | 2074663 |
> | FOAF                                            | 1805441 |
> | RDF+XML                                         |  328872 |
> | RSS 1.0                                         |  155568 |
> | ICRA labels                                     |  152142 |
> | RDF                                             |  151377 |
> |                                                 |  144916 |
> | Calais RDF                                      |   65550 |
> | SIOC                                            |   50826 |
> | RDF 1.0                                         |   48448 |
> | Dublin Core                                     |   48047 |
> | RDF 1.1                                         |   40227 |
> | Meta Information                                |   25181 |
> | RDF Version                                     |   24449 |
> | RDF Version of this post                        |   24356 |
> | Items in Collection                             |   23903 |
> | RDF Representation                              |   22327 |
> | This category listings in RDF                   |   19176 |
> | Dublin                                          |   17796 |
> | Structured Descriptor Document (RDF/XML format) |   14648 |
> | RDF Metadata                                    |   11275 |
> | Skos Core                                       |    9364 |
> | Structured Description in RDF/XML format        |    8666 |
> | Items in Community                              |    7358 |
> | notice                                          |    5998 |
> | RDF/XML version of this document                |    5677 |
> | RDF/XML                                         |    5325 |
> | RDF/XML Version                                 |    4531 |
> | Get RDF 1.0 Feed                                |    4426 |
> | RDF/XML data for this webshop                   |    4063 |
> | RDF+XML (VOA3R)                                 |    3887 |
> | LG RDF                                          |    3547 |
> | Metadata                                        |    3047 |
> | Packages involving this user                    |    2914 |
> | Product RDF/XML data                            |    2905 |
> | DOAP                                            |    2640 |
> | Geo                                             |    2593 |
> | XML                                             |    2409 |
> | Main Page                                       |    1735 |
> | This page in RDF (XML)                          |    1573 |
> | Public Stream Feed (RSS 1.0)                    |    1541 |
> | RDF Description                                 |    1506 |
> | Get RDF                                         |    1161 |
> | RDF Version of this categorie                   |    1124 |
> | unprocessed RDF+XML metadata                    |     993 |
> | Dane produktu w formacie RDF/XML                |     990 |
> | Supplier RDF/XML data                           |     962 |
> | Essay metadata                                  |     758 |
> | Dublin Core Metadata                            |     737 |
> | rdf:foaf                                        |     730 |
> +-------------------------------------------------+---------+
> 50 rows in set (2 min 22.54 sec)
>
> Contact me if you are interested.

Do you not have this data in RDF form? Ideally, you should publish this data in a form that's accessible via HTTP lookups (and SPARQL queries). I am sure you can see the irony in the SQL query results presented above :-)

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
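
As a rough illustration of the suggestion to publish these counts as RDF, the sketch below serializes a few rows of the rel table as Turtle using only the Python standard library. The http://example.org/ namespace, the ex:relValue and ex:occurrences property names, and the rows_to_turtle helper are all hypothetical placeholders; nothing here describes how the link_rdf data is actually stored or published.

```python
# Minimal sketch: turn (rel, count) rows into Turtle text.
# BASE and the ex: property names are made-up placeholders, not a real vocabulary.
BASE = "http://example.org/link-rdf/"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_turtle(rows):
    """Serialize an iterable of (rel, number) pairs as a Turtle document."""
    lines = [
        "@prefix ex: <%svocab#> ." % BASE,
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for index, (rel, number) in enumerate(rows, start=1):
        # One subject per result row, identified by its position in the result set.
        lines.append("<%srel/%d>" % (BASE, index))
        lines.append('    ex:relValue "%s" ;' % escape_literal(rel))
        lines.append('    ex:occurrences "%d"^^xsd:integer .' % number)
    return "\n".join(lines) + "\n"


# First three rows of the rel query result quoted in the message.
rows = [("meta", 4348067), ("alternate", 2080611), ("alternate meta", 61169)]
turtle = rows_to_turtle(rows)
print(turtle)
```

A document like this could be served over HTTP or loaded into a SPARQL endpoint; a real deployment would presumably pick an established vocabulary instead of the placeholder terms used here.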
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Monday, 18 August 2014 11:57:52 UTC