- From: Andreas Harth <andreas@harth.org>
- Date: Tue, 21 Jun 2011 14:29:00 +0200
- To: public-lod@w3.org
Dear Martin,

I agree with you that software accessing large portions of the web should adhere to basic principles (such as robots.txt). However, I wonder why you publish large datasets and then complain when people actually use the data.

If you provide a site with millions of triples, your infrastructure should scale beyond "I have clicked on a few links and the server seems to be doing something". You should set the HTTP Expires header to leverage the widely deployed HTTP caches. You should have stable URIs. You should also configure your servers to shield them from both misbehaving crawlers and DoS attacks (see, e.g., [1]).

Publishing millions of triples is slightly more complex than publishing your personal homepage.

Best regards,
Andreas.

[1] http://code.google.com/p/ldspider/wiki/ServerConfig
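
As a rough illustration of the Expires/Cache-Control point only (not taken from the mail above or from the linked wiki page), here is a minimal Python sketch using just the standard library. The handler class, port, cache lifetime, and placeholder triple are all assumptions; a real Linked Data deployment would normally set these headers in the web server or a caching reverse proxy rather than in application code.

    # Minimal sketch (hypothetical, Python stdlib only): serve a small RDF
    # document with Expires and Cache-Control headers so downstream HTTP
    # caches can absorb repeated crawler requests.
    import time
    from email.utils import formatdate
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CACHE_SECONDS = 86400  # assumed cache lifetime: one day
    BODY = b'<http://example.org/s> <http://example.org/p> "o" .\n'  # placeholder triple

    class CachingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/turtle")
            self.send_header("Content-Length", str(len(BODY)))
            # Expires: absolute timestamp in HTTP date format
            self.send_header("Expires",
                             formatdate(time.time() + CACHE_SECONDS, usegmt=True))
            # Cache-Control: the equivalent directive honoured by shared caches
            self.send_header("Cache-Control", "public, max-age=%d" % CACHE_SECONDS)
            self.end_headers()
            self.wfile.write(BODY)

    if __name__ == "__main__":
        HTTPServer(("", 8000), CachingHandler).serve_forever()

With headers like these, an intermediate cache or the crawler itself can reuse a response for the stated lifetime instead of hitting the origin server for every request.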
Received on Tuesday, 21 June 2011 12:29:34 UTC