- From: William Waites <ww@styx.org>
- Date: Fri, 24 Dec 2010 11:34:48 +0100
- To: Michael Brunnbauer <brunni@netestate.de>
- Cc: semantic-web@w3.org
* [2010-12-23 17:51:41 +0100] Michael Brunnbauer <brunni@netestate.de> writes:

] On Thu, Dec 23, 2010 at 05:40:43PM +0100, William Waites wrote:
] > Hi Michael, this is good news. But I have a question: is it possible
] > to point your robot at a dump to prevent it mercilessly crawling large
] > datasets like bnb.bibliographica.org? If so, how?
]
] As we use named graphs for provenance tracking, I see no way to make use of
] a dump. Our crawler waits at least 10 secs between two requests to the same
] site. Of course I can block crawling of bnb.bibliographica.org if you want.
] How many RDFs and pages with RDFa does it have?

The HTML+RDFa pages are just a (slightly abbreviated) rendering of the
corresponding graph, made with Fresnel. For RDF-consuming robots it really
is better to fetch the native version (via content negotiation or by
requesting ${uri}.rdf).

In this case there are about 3 million distinct graphs, and if you crawl
blindly you will also pick up another several million CBDs (concise bounded
descriptions) for authors and publishers. At 10 seconds per request, the
3 million graphs alone come to nearly a year of continuous crawling, so the
whole crawl could take several years to finish...

Cheers,
-w

--
William Waites                http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
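A rough sketch of the client side of the suggestion above, assuming Python
and the standard library; the entry URI is a made-up example and
application/rdf+xml as the negotiated type is an assumption, not something
stated in the message:

    import urllib.request

    # Hypothetical entry URI on bnb.bibliographica.org (made-up identifier).
    uri = "http://bnb.bibliographica.org/entry/example"

    # Option 1: content negotiation -- ask for an RDF serialisation
    # instead of the HTML+RDFa rendering of the same graph.
    req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(req) as resp:
        rdf_data = resp.read()

    # Option 2: request the ${uri}.rdf variant directly.
    with urllib.request.urlopen(uri + ".rdf") as resp:
        rdf_data = resp.read()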
Received on Friday, 24 December 2010 10:35:18 UTC