- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 15 Sep 2004 13:01:18 +0200
- To: Dominique Hazaël-Massieux <dom@w3.org>
- Cc: public-qa-dev@w3.org
* Dominique Hazaël-Massieux wrote:
>- for the greater plan (e.g. site validation with the new v.w.o), I
>think it would be cool to start thinking to a way for a site to indicate
>to a particular agent what kind of crawling it accepts; I agree that
>extending robots.txt doesn't seem very reasonable, so we should start
>thinking to another way of doing it...

We should first figure out what the actual problem is. Site Validation for the Markup Validator would likely be a subscription service: you log in, start a site validation, and come back later to see the results. This would enable us to sleep between requests as much as we like, which would solve the heavy load problem.

There are, however, other problems that this would not solve. For example, a webmaster might not want someone else to perform site validation on their site. If that is a problem we want to solve, we would need to figure out how to determine whether someone is authorized to request site validation for a site. We could, for example, let the webmaster@ address opt in to such a service. robots.txt and similar mechanisms would not really help here: while the real geocities.com webmaster might want to use the feature, he might not want to allow every ordinary geocities.com user to use it.

An alternative approach would be to limit how many times per week site validation validates an individual document. This is also difficult. For example, how would that work for example.org/?SID=1234567890ABCDEF, which might vary for every request? We could ignore query parts when determining URI equality, but that would make the service quite useless for people using things like example.org/page.php?page=1.

Or, instead of using email as the verification system, we could require users to encode their validator.w3.org user name in the robots.txt file, like

  User-Agent: W3C-Markup-Validator/bjoern
  Allow: /

or

  User-Agent: W3C-SiteValid/c7713a0f32cd6bfb57b6142d80f5d7a1c73e2402

(where the ID is my email address as sha1_hex). If there is no entry for the Markup Validator, it assumes Disallow: /. Of course, this would not work for all users either.
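
To make the robots.txt idea a bit more concrete, here is a rough sketch in Python of how such a check might look. It is only an illustration under stated assumptions: the function names site_valid_token and is_authorized are made up for this example, the email address is a placeholder, and the parser understands only a simplified subset of robots.txt (User-Agent, Allow and Disallow lines, with blank lines separating records). A missing entry is treated as Disallow: /, as described above.

```python
import hashlib


def site_valid_token(email: str) -> str:
    """Build the hypothetical User-Agent token from an email address.

    The ID part is the SHA-1 hex digest of the address, as in the
    W3C-SiteValid/... example above.
    """
    digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
    return f"W3C-SiteValid/{digest}"


def is_authorized(robots_txt: str, agent: str) -> bool:
    """Return True only if a record naming `agent` contains "Allow: /".

    Simplified on purpose: any other outcome, including a missing
    record, counts as "Disallow: /".
    """
    agents = []        # User-Agent values of the record being read
    seen_rule = False  # True once the record has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            # A blank line ends the current record.
            agents, seen_rule = [], False
            continue
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                # A User-Agent line after rules starts a new record.
                agents, seen_rule = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "allow" and value == "/" and agent.lower() in agents:
                return True
    return False


if __name__ == "__main__":
    # Placeholder address, not a real account.
    token = site_valid_token("webmaster@example.org")
    robots = f"User-Agent: {token}\nAllow: /\n"
    print(token, is_authorized(robots, token))
```

Note that the whole User-Agent value is compared, including the part after the slash, since that part is what identifies the user in this scheme.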
Received on Wednesday, 15 September 2004 11:02:09 UTC