- From: Dominique Hazaël-Massieux <dom@w3.org>
- Date: Wed, 15 Sep 2004 11:40:19 +0200
- To: olivier Thereaux <ot@w3.org>
- Cc: QA Dev <public-qa-dev@w3.org>
- Message-Id: <1095241219.21224.43.camel@stratustier>
On Wed 15/09/2004 at 11:06, olivier Thereaux wrote:
> Also discussed was "speed" issue with the new implementation of
> checklink on top of RobotUA:
> - it cannot sleep() less than one second, that makes it "slow"
> compared to the previous (bullish and problematic) versions
> the former has neither obvious solution (see log for thoughts on the
> matter) nor is it formally accepted as an issue.

I read the IRC logs, which have some interesting comments; a few points:

- I really think the slowness of checklink is an issue for its users,
  and, as Olivier pointed out, especially for W3C users. I guess the
  "private server" option (appropriately set to *.w3.org on v.w.o) would
  thus help. I honestly don't think we should refrain from doing this
  based on the possibility of someone using it for a DoS; the capability
  is so easy to re-enable anyway that withholding it is hardly a
  protection. I guess a sane default in the provided configuration file,
  plus a human-readable warning beside it, would be good enough.

- For the greater plan (e.g. site validation with the new v.w.o), I
  think it would be cool to start thinking of a way for a site to
  indicate to a particular agent what kind of crawling it accepts. I
  agree that extending robots.txt doesn't seem very reasonable, so we
  should start thinking of another way of doing it... This unfortunately
  relates very strongly to one 18-month-old TAG issue:
  http://www.w3.org/2001/tag/issues.html?type=1#siteData-36
  Of course, we don't need to solve the whole issue if we want to
  explore this; but it would probably be wise to keep in mind the
  relevant points made in it when designing something for our own needs.

  Possible implementation ideas:
  * a new <meta name="foo" content="bar"> for HTML files; not very
    elegant, and limited to HTML, but reasonably practical for Web
    authors
  * a new HTTP header, using e.g. the HTTP extensions mechanism

I'm not sure whether this is something qa-dev would be interested in
working on; but I figured I'd better brain-dump my ideas here, in case
someone would indeed be interested...

Dom
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:dom@w3.org
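
To make the two implementation ideas in the message concrete, here is a minimal sketch of how a crawler might read a site's stated crawling preference from either an HTTP response header or a <meta> element. Everything specific in it is an assumption made for illustration, not something proposed in the thread: the meta name `crawl-policy`, the header name `X-Crawl-Policy`, and the `key=value; key=value` syntax are all hypothetical.

```python
from html.parser import HTMLParser

META_NAME = "crawl-policy"      # hypothetical <meta name="..."> value
HEADER_NAME = "X-Crawl-Policy"  # hypothetical HTTP response header


class _MetaPolicyParser(HTMLParser):
    """Collects the content of the first <meta name="crawl-policy">."""

    def __init__(self):
        super().__init__()
        self.policy = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.policy is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == META_NAME:
            self.policy = attr.get("content")


def parse_policy(value):
    """Parse 'key=value; key=value' into a dict of floats, skipping bad entries."""
    result = {}
    for part in value.split(";"):
        key, sep, raw = part.partition("=")
        if not sep:
            continue
        try:
            result[key.strip().lower()] = float(raw)
        except ValueError:
            continue
    return result


def crawl_policy(headers, html):
    """Return the policy from the header if present, else from the meta element."""
    for name, value in headers.items():
        if name.lower() == HEADER_NAME.lower():
            return parse_policy(value)
    parser = _MetaPolicyParser()
    parser.feed(html)
    return parse_policy(parser.policy or "")
```

In this sketch the header takes precedence over the meta element, since a header also works for non-HTML resources; that ordering is a design choice of the example only. A crawler could then use a value such as a hypothetical `min-delay` to decide how long to wait between requests to that site.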
Received on Wednesday, 15 September 2004 09:40:21 UTC