Re: Think before you write Semantic Web crawlers

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 23 Jun 2011 11:32:43 +0100
Message-ID: <4E03164B.5030307@openlinksw.com>
To: public-lod@w3.org
On 6/23/11 9:20 AM, Michael Brunnbauer wrote:
> re
>
> On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
>> Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do not use WebID for their crawlers, site-owners cannot block anonymous crawlers.
> Google, Bing and Yahoo authenticate themselves via DNS: do a reverse lookup for
> the IP, check for some well-known domains and then do a forward lookup of the
> hostname and check if it matches the IP. Much simpler to implement than WebID.
>
> config = {
> 'Googlebot':['googlebot.com'],
> 'Mediapartners-Google':['googlebot.com'],
> 'msnbot':['live.com','msn.com','bing.com'],
> 'bingbot':['live.com','msn.com','bing.com'],
> 'Yahoo! Slurp':['yahoo.com','yahoo.net']
> }
>
> Regards,
>
> Michael Brunnbauer
>
How does that deal with a DoS query inadvertently or deliberately 
generated by a SPARQL user agent?

Google and friends are not the real problem to come; it's the inadvertent
SPARQL query that kicks off a transitive crawl that's going to wreak
havoc. Basically, the trouble starts when FYN (Follow-Your-Nose) is
executed by bots: smart agents working on behalf of their time-challenged
masters.
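The failure mode described above, an agent that keeps following links transitively with no budget, can be contrasted with a traversal that stops at explicit limits. This is a hypothetical illustration, not a mechanism from any particular SPARQL engine: `bounded_fyn` and the `fetch_links` callback (standing in for an HTTP GET plus RDF parsing) are invented names, and the default limits are arbitrary.

```python
from collections import deque
from urllib.parse import urlsplit


def bounded_fyn(seed, fetch_links, max_depth=2, max_requests=100, max_per_host=10):
    """Breadth-first follow-your-nose traversal with explicit budgets.

    fetch_links(uri) -> iterable of URIs discovered by dereferencing uri
    (hypothetical callback). Returns the URIs actually dereferenced, in order.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    per_host = {}
    fetched = []
    while queue and len(fetched) < max_requests:   # global request budget
        uri, depth = queue.popleft()
        host = urlsplit(uri).netloc
        # Skip hosts that have already used up their share of requests.
        if per_host.get(host, 0) >= max_per_host:
            continue
        per_host[host] = per_host.get(host, 0) + 1
        fetched.append(uri)
        if depth == max_depth:
            continue  # dereference, but do not expand links past the depth limit
        for link in fetch_links(uri):
            if link not in seen:  # avoid revisiting (and looping on cycles)
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched
```

Without the three budget checks, the only bound on such a crawl is the size of the reachable link graph, which on the Linked Data web is effectively unbounded.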

As I said earlier, AWWW (the Architecture of the World Wide Web) is
"deceptively simple", meaning it has pleasant surprises built in that
surface as these issues arise. WebID is one such example, courtesy of
Linked Data.

"Simply Simple" != "Deceptively Simple".

Today, "Simply Simple" has become the norm, just as not solving real
problems has become the norm, leaving the wildness of the InterWeb to run
amok and compromise things like:

1. Email
2. Pingbacks
3. Comments
4. Federation.

We should strive to make technology flexible via architecture.

Basically, you can also look at it this way, in these exponential times:
if you want to get into races that you will lose woefully, "simply
simple" is perfect. If you want to get into races with a good chance of
winning, "deceptively simple" is perfect. To each his/her own sense of
perfection :-)


-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 23 June 2011 10:33:07 UTC