Re: Think before you write Semantic Web crawlers from Martin Hepp on 2011-06-22 (semantic-web@w3.org from June 2011)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Wed, 22 Jun 2011 21:08:44 +0200
To: Andreas Harth <andreas@harth.org>
Cc: Christopher Gutteridge <cjg@ecs.soton.ac.uk>, Daniel Herzig <herzig@kit.edu>, semantic-web@w3.org, public-lod@w3.org
Message-Id: <EC158417-582E-4B0C-A5A0-ECEE56D9F600@ebusiness-unibw.org>

Hi Andreas:

Please survey typical Web site owners to find out how many of them have

1. access to this level of server configuration and
2. the skills necessary to implement these recommendations.

The WWW was anti-pedantic by design. This was the root of its success. The pedants were the traditional SGML/Hypertext communities. Why are we breeding new pedants?

Martin

On Jun 22, 2011, at 11:44 AM, Andreas Harth wrote:

> Hi Christopher,
> 
> On 06/22/2011 10:14 AM, Christopher Gutteridge wrote:
>> Right now queries to data.southampton.ac.uk (e.g.
>> http://data.southampton.ac.uk/products-and-services/CupCake.rdf ) are made live,
>> but this is not efficient. My colleague, Dave Challis, has prepared a SPARQL
>> endpoint that caches results, which we can turn on if the load gets too high;
>> this should at least mitigate the problem. Very few datasets change in a
>> 24-hour period.
> 
> Setting the Expires header and enabling mod_cache in Apache httpd (or adding
> a Squid proxy in front of the HTTP server) works quite well in these cases.
> 
> Best regards,
> Andreas.
>
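A minimal sketch of the header side of Andreas's suggestion, assuming a 24-hour freshness lifetime (taken from Christopher's remark that very few datasets change within a day). It uses Python's standard-library http.server rather than Apache httpd or Squid, purely to show which response headers a shared cache such as mod_cache or Squid relies on when deciding how long it may serve a stored copy; in Apache itself the header would normally be set with mod_expires. The handler name CachedRDFHandler, the helper caching_headers, the placeholder RDF body and port 8000 are hypothetical.

```python
import time
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the data changes at most once per day, so 24 hours of freshness is safe.
MAX_AGE_SECONDS = 24 * 60 * 60


def caching_headers(now: float, max_age: int = MAX_AGE_SECONDS) -> dict:
    """Return headers marking a response as fresh for max_age seconds after `now`."""
    return {
        # Expires takes an absolute HTTP-date expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        # HTTP/1.1 equivalent; caches that understand max-age prefer it over Expires.
        "Cache-Control": f"public, max-age={max_age}",
    }


class CachedRDFHandler(BaseHTTPRequestHandler):
    # Hypothetical placeholder standing in for the result of an expensive live query.
    BODY = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>\n'

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/rdf+xml")
        self.send_header("Content-Length", str(len(self.BODY)))
        # Attach the freshness headers so a cache in front can answer repeat requests.
        for name, value in caching_headers(time.time()).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(self.BODY)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    HTTPServer(("localhost", 8000), CachedRDFHandler).serve_forever()
```

Running it and fetching the root with curl -i http://localhost:8000/ shows the headers a proxy would see. Sending Cache-Control: max-age alongside Expires is a deliberate choice: HTTP/1.1 caches use max-age when it is present, while Expires still covers older HTTP/1.0 intermediaries.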

Received on Wednesday, 22 June 2011 19:09:17 UTC