Re: [meeting] Notes and log 2004-09-14

On Wed, 15/09/2004 at 11:06, olivier Thereaux wrote:
> Also discussed was "speed" issue with the new implementation of 
> checklink on top of RobotUA:
>   - it cannot sleep() less than one second, that makes it "slow" 
> compared to the previous (bullish and problematic) versions

> the former has neither obvious solution (see log for thoughts on the 
> matter) nor is it formally accepted as an issue.

I read the IRC logs, which have some interesting comments; a few points:
- I really think the slowness of checklink is an issue for its users,
and as Olivier pointed out, esp. for W3C users; I guess the "private
server" option (appropriately set to *.w3.org on v.w.o) would thus help;
I honestly don't think we should refrain from doing this because someone
might use it for a DoS; the option is so easy to re-enable anyway that
leaving it out is hardly a protection; I guess a sane default in the
provided configuration file, plus a human-readable warning beside it,
would be good enough
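
To make the "private server" idea concrete, here is a rough sketch in
Python (not checklink's actual code or configuration). The setting names
and the helper delay_for() are hypothetical; the only assumption is that
hosts matching a configured pattern skip the usual one-second wait:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical settings; these are not checklink's real option names.
PRIVATE_HOST_PATTERNS = ["*.w3.org"]
DEFAULT_DELAY_SECONDS = 1.0


def delay_for(url, patterns=PRIVATE_HOST_PATTERNS, default=DEFAULT_DELAY_SECONDS):
    """Return the delay (in seconds) to wait before requesting `url`."""
    host = (urlsplit(url).hostname or "").lower()
    # Hosts matching a "private" pattern are fetched without waiting.
    if any(fnmatchcase(host, pattern.lower()) for pattern in patterns):
        return 0.0
    # Every other host keeps the polite default delay.
    return default
```

Note that a pattern such as *.w3.org matches www.w3.org but not a
look-alike host such as w3.org.evil.example, since the match is anchored
at the end of the host name.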

- for the greater plan (e.g. site validation with the new v.w.o), I
think it would be cool to start thinking about a way for a site to
indicate to a particular agent what kind of crawling it accepts; I agree
that extending robots.txt doesn't seem very reasonable, so we should
start thinking of another way of doing it... This unfortunately relates
very strongly to an 18-month-old TAG issue:
http://www.w3.org/2001/tag/issues.html?type=1#siteData-36
Of course, we don't need to solve the whole issue if we want to explore
this; but it would probably be wise to keep the relevant points made in
it when designing something for our own needs. Possible implementation
ideas:
* a new <meta name="foo" content="bar"> for HTML files; not very
elegant, and limited to HTML, but reasonably practical for Web authors
* a new HTTP header, using e.g. the HTTP extensions mechanism
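
As a rough illustration of how an agent might consume either mechanism,
here is a minimal sketch in Python. The header name "X-Crawl-Policy", the
meta name "crawl-policy" and the function crawl_policy() are all made up
for the example; nothing here is standardized, and letting the header win
over the meta element is just one possible choice:

```python
from html.parser import HTMLParser

# Hypothetical names: neither of these is a standardized header or meta name.
POLICY_HEADER = "X-Crawl-Policy"
POLICY_META_NAME = "crawl-policy"


class _MetaPolicyParser(HTMLParser):
    """Collect the content of the first <meta name="crawl-policy"> element."""

    def __init__(self):
        super().__init__()
        self.policy = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.policy is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == POLICY_META_NAME:
            self.policy = attributes.get("content")


def crawl_policy(headers, body=""):
    """Return the advertised policy string, or None if the site gives none."""
    # Assumed precedence: the HTTP header, if present, overrides the meta element.
    for name, value in headers.items():
        if name.lower() == POLICY_HEADER.lower():
            return value.strip()
    parser = _MetaPolicyParser()
    parser.feed(body)
    return parser.policy
```

The header variant works for any resource type, while the meta variant
only helps once an HTML body has already been fetched and parsed.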

I'm not sure whether this is something qa-dev would be interested in
working on; but I figured I'd better brain-dump my ideas here, in case
someone would indeed be interested...

Dom
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:dom@w3.org

Received on Wednesday, 15 September 2004 09:40:21 UTC