
Re: Think before you write Semantic Web crawlers

From: Yves Raimond <yves.raimond@gmail.com>
Date: Wed, 22 Jun 2011 20:38:38 +0100
Message-ID: <BANLkTim5ezD1r2wSK=dOiNJt1cRk6hQHaQ@mail.gmail.com>
To: Andreas Harth <andreas@harth.org>
Cc: Martin Hepp <martin.hepp@ebusiness-unibw.org>, Christopher Gutteridge <cjg@ecs.soton.ac.uk>, Daniel Herzig <herzig@kit.edu>, semantic-web@w3.org, public-lod@w3.org
On Wed, Jun 22, 2011 at 8:29 PM, Andreas Harth <andreas@harth.org> wrote:
> Hi Martin,
>
> On 06/22/2011 09:08 PM, Martin Hepp wrote:
>>
>> Please make a survey among typical Web site owners on how many of them
>> have
>>
>> 1. access to this level of server configuration and
>> 2. the skills necessary to implement these recommendations.
>
> Agreed.
>
> But in the case we're discussing there's also:
>
> 3. a site that publishes millions of pages
>
> I am glad you brought up the issue, as there are several data providers
> out there (some with quite prominent names) with hundreds of millions of
> triples, but unable to sustain lookups every couple of seconds or so.

Very funny :-) At peak times, a single crawler was hitting us with 150
requests per second. Quite far from "every couple of seconds or so".
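
For what it's worth, throttling on the crawler side is cheap to
implement. Below is a minimal sketch in Python of a polite fetch loop
that honours robots.txt, including its Crawl-delay where one is given;
the user-agent string and the two-second fallback are placeholders,
not anything we actually run.

import time
import urllib.parse
import urllib.request
import urllib.robotparser

USER_AGENT = "ExampleBot/0.1"  # placeholder, not a real crawler name

def polite_fetch(urls, default_delay=2.0):
    """Fetch URLs from a single host at a polite rate."""
    # Read robots.txt once for the host of the first URL.
    root = urllib.parse.urlsplit(urls[0])
    rp = urllib.robotparser.RobotFileParser(
        "%s://%s/robots.txt" % (root.scheme, root.netloc))
    rp.read()
    # Honour an explicit Crawl-delay; otherwise wait default_delay
    # seconds, i.e. a lookup every couple of seconds, not 150 a second.
    delay = rp.crawl_delay(USER_AGENT) or default_delay
    for url in urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue  # the publisher opted this path out
        req = urllib.request.Request(
            url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            yield url, resp.read()
        time.sleep(delay)  # wait between requests instead of hammering

A real crawler would add per-host queues and back-off on 503
responses, but even a fixed delay like this avoids the kind of load we
saw.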

Best,
y

>
> I am very much in favour of amateur web enthusiasts (I would like to claim
> I've started as one).  Unfortunately, you get them on both ends, publishers
> and consumers.  Postel's law applies to both, I guess.
>
> Best regards,
> Andreas.
>
>
Received on Wednesday, 22 June 2011 19:39:15 UTC
