Batch validator (Was: /

Hello Marc-Antoine,

On Jun 13, 2007, at 00:25 , Marc-Antoine Ross wrote:
>  Recently, the W3C blacklisted our service because it was using too  
> many resources on their servers, according to their policy.

To be fair, you should also mention that I had contacted you a number  
of times over the span of a year, urging you to install a local  
validator because your service, albeit indeed very nice, was sending  
a lot of requests to the validator in fast and large bursts. As I  
explained at the time, this was not correct behavior according to  
the usage policy for the API. When we switched on automatic  
mechanisms to protect the validation service from abuse,  
your site was logically blocked...

>  1/ Install the validator locally on our server and run it from  
> there. A good advantage would be that each page will be read only  
> once by the service compared to twice before (one by  
> and once by the W3C). Problem: I don't have anybody available to  
> install the validator.

More than a year ago, you said you'd talk to your admin about  
installing the validator locally, and I offered to get in touch to  
help. The validator's installation is not trivial, but it's not  
rocket science either, and many have managed to install it on their  
server in the past. Anyway, I reiterated my offer to help a few  
times, and reiterate it here again.

> 3/ Offer the code and my contribution to the W3C and make this  
> service widely available and supported.

As I told you a few times, I think your project is very cool, and if  
you want to release it as open source and contribute it to W3C, that  
is very welcome. That said, there are a number of reasons why I think  
we may not be able to integrate the code "as is" in the validator.

A minor reason is that it relies a lot on JavaScript/XHR. This makes  
for a cool UI, but without a fallback mechanism, we can't offer it to  
our wide user base. A more important reason is the cause of your  
blacklisting: your validator interface sends large and fast bursts of  
requests to the validator, without any consideration of current  
server load. If we were to open such an interface at, it would likely  
kill our servers, or make them unusable for the validator's millions  
of users. The indexing also sends lots of requests to the validated  
site, and indexes it without considering the preferences the  
webmaster expressed in robots.txt. We can't really do that.

I think, given these constraints, the "perfect" batch-validation  
service would:
* get users to register a batch job, ideally with a mail loop  
checking that the person who requests the batch job actually owns the  
site that is being validated.
* keep a queue of batch jobs, and process them (server-side) whenever  
the server load allows it. Indexing of the site would be done with a  
pause between requests, and would respect the robots.txt protocol.  
Alternatively, the user could request validation of a list of URIs.
* use the validator's API to get results.
* once the batch job is finished, send the requesting user a mail  
with a summary of the job, and a link (or the content in the mail?)  
to the full results.
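
To make the queue idea a bit more concrete, here is a rough sketch in  
Python of how one batch job could be processed. It is only an  
illustration under assumptions, not existing validator code: the names  
run_batch, allowed_uris, load_ok and the user agent string are made  
up, and validate stands in for whatever call to the validator's API  
would actually be used. The mail loop and the summary mail are left  
out.

```python
import time
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name, chosen for this sketch only.
USER_AGENT = "batch-validator-sketch"


def allowed_uris(uris, robots_txt, user_agent=USER_AGENT):
    """Keep only the URIs that the site's robots.txt lets us fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [uri for uri in uris if parser.can_fetch(user_agent, uri)]


def run_batch(uris, robots_txt, validate, pause=1.0,
              load_ok=lambda: True, sleep=time.sleep):
    """Process one batch job politely.

    validate: callable taking a URI and returning a result
              (placeholder for a call to the validator's API).
    pause:    seconds to wait between two requests.
    load_ok:  callable returning False while the server is too busy
              (how load is measured is left open here).
    """
    queue = deque(allowed_uris(uris, robots_txt))
    results = {}
    while queue:
        if not load_ok():
            # Server is busy: wait and check again, without dropping the job.
            sleep(pause)
            continue
        uri = queue.popleft()
        results[uri] = validate(uri)
        if queue:
            # Pause between requests instead of sending a burst.
            sleep(pause)
    return results
```

The point of the sketch is that requests go out one at a time, with a  
pause, and only while load_ok() says the server can take them, so a  
large site never turns into a burst of requests.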

Admittedly this is getting a bit far from your batch validator, but  
we do have some building blocks here and there that could be used  
(e.g. the logvalidator). If you think your code could be used,  
adapted, or if you're interested in participating in the development  
of the validator in that direction, the door is always open.


Received on Monday, 25 June 2007 05:55:43 UTC