Re: Think before you write Semantic Web crawlers from Michael Brunnbauer on 2011-06-23 (public-lod@w3.org from June 2011)

From: Michael Brunnbauer <brunni@netestate.de>
Date: Thu, 23 Jun 2011 10:20:40 +0200
To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Cc: public-lod@w3.org
Message-ID: <20110623082040.GA16384@netestate.de>

re

On Thu, Jun 23, 2011 at 10:09:25AM +0200, Martin Hepp wrote:
> Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do not use WebID for their crawlers, site-owners cannot block anonymous crawlers.

Google, Bing and Yahoo authenticate themselves via DNS: do a reverse lookup for
the IP, check that the hostname falls under one of a few well-known domains, then
do a forward lookup of the hostname and check that it matches the IP. Much simpler
to implement than WebID.

# crawler user-agent token -> domains its reverse DNS hostname must fall under
config = {
    'Googlebot': ['googlebot.com'],
    'Mediapartners-Google': ['googlebot.com'],
    'msnbot': ['live.com', 'msn.com', 'bing.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
    'Yahoo! Slurp': ['yahoo.com', 'yahoo.net'],
}
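
The three steps described above (reverse lookup, domain check, forward lookup)
could be sketched in Python roughly as follows. This is only an illustration, not
the code actually in use: the function names, the substring match on the
User-Agent header and the injectable resolver arguments are all assumptions made
for the example. Addresses are compared as plain strings, which is fine for IPv4
but would need normalising for IPv6.

```python
import socket

# Same mapping as the config above: crawler token -> allowed domains.
CONFIG = {
    'Googlebot': ['googlebot.com'],
    'Mediapartners-Google': ['googlebot.com'],
    'msnbot': ['live.com', 'msn.com', 'bing.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
    'Yahoo! Slurp': ['yahoo.com', 'yahoo.net'],
}


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def is_genuine_crawler(ip, user_agent, config=CONFIG,
                       reverse=reverse_lookup, forward=forward_lookup):
    """Hypothetical helper: True if ip passes the DNS check for the crawler
    that user_agent claims to be."""
    # Assumption: a crawler is recognised by its token appearing in the UA.
    ua = user_agent.lower()
    domains = [d for token, ds in config.items() if token.lower() in ua
               for d in ds]
    if not domains:
        return False  # does not claim to be a known crawler

    # Step 1: reverse lookup of the client IP.
    hostname = reverse(ip)
    if hostname is None:
        return False
    hostname = hostname.rstrip('.').lower()

    # Step 2: the hostname must be a subdomain of a well-known domain.
    if not any(hostname.endswith('.' + d) for d in domains):
        return False

    # Step 3: forward lookup must lead back to the same IP, since anyone
    # who controls their own reverse zone can claim an arbitrary hostname.
    return ip in forward(hostname)
```

A request handler would call is_genuine_crawler(client_ip, user_agent_header)
and treat a False result for a self-declared Googlebot as a faked User-Agent.
The resolver arguments exist only so the check can be exercised without
network access.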

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel

Received on Thursday, 23 June 2011 08:21:11 UTC