Re: Think before you write Semantic Web crawlers

On 6/23/11 9:20 AM, Michael Brunnbauer wrote:
> re
>
> On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
>> Yes, WebID is unquestionably a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site: if a few major consumers do not use WebID for their crawlers, site owners cannot block anonymous crawlers.
> Google, Bing and Yahoo authenticate themselves via DNS: do a reverse lookup for
> the IP, check for some well-known domains, and then do a forward lookup of the
> hostname and check that it matches the IP. This is much simpler to implement than WebID.
>
> config = {
> 'Googlebot':['googlebot.com'],
> 'Mediapartners-Google':['googlebot.com'],
> 'msnbot':['live.com','msn.com','bing.com'],
> 'bingbot':['live.com','msn.com','bing.com'],
> 'Yahoo! Slurp':['yahoo.com','yahoo.net']
> }
>
> Regards,
>
> Michael Brunnbauer
>
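The DNS check Michael describes is usually called forward-confirmed reverse DNS. Below is a minimal sketch in Python, assuming only the standard library socket module and reusing his config dict. The function names (verify_crawler, host_in_domains, claimed_domains) are hypothetical, the domain lists are copied from his message and may not match what those companies publish today, and socket.gethostbyname_ex only resolves IPv4 (IPv6 would need socket.getaddrinfo).

```python
import socket

# User-Agent token -> domains a genuine crawler's hostname must belong to
# (copied from the config quoted above; may be outdated).
CONFIG = {
    'Googlebot': ['googlebot.com'],
    'Mediapartners-Google': ['googlebot.com'],
    'msnbot': ['live.com', 'msn.com', 'bing.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
    'Yahoo! Slurp': ['yahoo.com', 'yahoo.net'],
}


def host_in_domains(hostname, domains):
    """True if hostname is one of the domains or a subdomain of one."""
    hostname = hostname.lower().rstrip('.')
    return any(hostname == d or hostname.endswith('.' + d) for d in domains)


def claimed_domains(user_agent, config=CONFIG):
    """Domains for the first known bot token in the User-Agent, else None."""
    for token, domains in config.items():
        if token in user_agent:
            return domains
    return None


def verify_crawler(ip, user_agent, config=CONFIG,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed search-engine bot."""
    domains = claimed_domains(user_agent, config)
    if domains is None:
        return False  # not claiming to be a known crawler
    try:
        hostname, _aliases, _addrs = reverse(ip)  # 1. reverse (PTR) lookup
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host_in_domains(hostname, domains):  # 2. well-known domain?
        return False
    try:
        _name, _aliases, addresses = forward(hostname)  # 3. forward lookup
    except OSError:
        return False
    return ip in addresses  # 4. hostname must resolve back to the same IP
```

The resolvers are passed as parameters so the check can be exercised without network access; a real deployment would also want to cache results, since doing two DNS lookups per request is itself a cost.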
How does that deal with a DoS query, inadvertently or deliberately
generated by a SPARQL user agent?

Google and friends aren't the real problem to come; it's the inadvertent
SPARQL query that kicks off a transitive crawl that's going to wreak
havoc. Basically, that happens when FYN (Follow-Your-Nose) is executed by
bots: smart agents working on behalf of their time-challenged masters.

As I said earlier, AWWW (the Architecture of the World Wide Web) is
"deceptively simple", meaning it has pleasant surprises built in for when
these issues arise. WebID is one such example, courtesy of Linked Data.

"Simply Simple" != "Deceptively Simple".

Today, "Simply Simple" has become the norm, just as not solving real
problems has become the norm, leaving the wildness of the InterWeb to run
amok and compromise things like:

1. Email
2. Pingbacks
3. Comments
4. Federation

We should strive to make technology flexible via architecture.

Basically, you can also look at it this way in these exponential times:
if you want to enter races you will lose woefully, "simply simple" is
perfect; if you want to enter races with a good chance of winning,
"deceptively simple" is perfect. To each his/her own sense of
perfection :-)


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Thursday, 23 June 2011 10:33:07 UTC