- From: Andreas Harth <andreas@harth.org>
- Date: Wed, 22 Jun 2011 14:37:02 +0200
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- CC: Yves Raimond <yves.raimond@gmail.com>, Christopher Gutteridge <cjg@ecs.soton.ac.uk>, Daniel Herzig <herzig@kit.edu>, semantic-web@w3.org, public-lod@w3.org
Hi Martin,

first let me say that I do think crawlers should follow basic politeness rules (contact info in User-Agent, adhere to the Robots Exclusion Protocol). However, I am delighted that people are actually starting to consume Linked Data, and we should encourage that.

On 06/22/2011 11:42 AM, Martin Hepp wrote:
> OpenEAN - a transcript of >1 Mio product models and their EAN/UPC code at
> http://openean.kaufkauf.net/id/ has been permanently shut down by the site
> operator because fighting with bad semweb crawlers is taking too much of his
> time.

I've put a wrapper online [1] that provides RDF based on their API (which, incidentally, currently does not seem to work either).

The wrapper does some caching and has a limit of one lookup every 8 seconds, which means (24*60*60)/8 = 10800 lookups per day. Data transfer is capped at 1 GB/day, which means a maximum cost of 0.15 Euro/day at Amazon AWS pricing.

At that rate, it would take about 93 days (1,000,000/10800) to collect descriptions of just one million products. Whether the ratio of data size to lookup limit is sensible in that case is open to debate.

If the OpenEAN guys can redirect requests to [1] there would even be some continuity for data consumers.

Best regards,
Andreas.

[1] http://openeanwrap.appspot.com/
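For anyone who wants to rerun the numbers with a different limit, the arithmetic above can be sketched roughly as follows. The function names are made up for illustration and are not part of the wrapper, and 1 GB is assumed to mean 10^9 bytes.

```python
# Rough sketch of the throughput arithmetic for a rate-limited wrapper.
# All names here are hypothetical; they are not taken from the wrapper's code.

SECONDS_PER_DAY = 24 * 60 * 60


def lookups_per_day(seconds_per_lookup):
    """Maximum number of lookups in one day at one lookup per interval."""
    return SECONDS_PER_DAY // seconds_per_lookup


def days_to_collect(n_items, seconds_per_lookup):
    """Days needed to fetch n_items, one item per lookup, at the given limit."""
    return n_items / lookups_per_day(seconds_per_lookup)


def bytes_per_lookup(daily_cap_bytes, seconds_per_lookup):
    """Average response size at which the daily transfer cap is exactly used up."""
    return daily_cap_bytes / lookups_per_day(seconds_per_lookup)


if __name__ == "__main__":
    interval = 8               # seconds between lookups, as stated above
    cap = 10**9                # 1 GB/day, assuming decimal gigabytes
    products = 1_000_000       # "one million products"

    print("lookups per day:", lookups_per_day(interval))
    print("days for all products: %.1f" % days_to_collect(products, interval))
    print("bytes per lookup before hitting the cap: %.0f"
          % bytes_per_lookup(cap, interval))
```

The last figure is one way to look at the ratio of data size to lookup limit: if typical responses are far smaller than that, the lookup limit is the binding constraint, not the transfer cap.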
Received on Wednesday, 22 June 2011 12:37:43 UTC