Re: (Entire) Site validation from olivier Thereaux on 2004-04-23 (www-validator@w3.org from April 2004)

From: olivier Thereaux <ot@w3.org>
Date: Fri, 23 Apr 2004 11:06:57 +0900
To: Pete Prodoehl <pete@rasterweb.net>
Cc: www-validator <www-validator@w3.org>
Message-Id: <E5FBF9EE-94CA-11D8-9809-000393A63FC8@w3.org>

On Apr 22, 2004, at 22:13, Pete Prodoehl wrote:
>  I wrote some code to do whole site validation

Cool. Thanks for doing this, and thanks for putting it under the GPL.

> If anyone finds it useful, or has suggestions, please let me know.

Hmm, instead of:
[[
	# we implement screen scraping, which is just wrong, wrong, wrong...
	# this is likely to break in the future if the validator changes its output...
	
	$content =~ s/^(.*?)Source\sListing.*/$1/s;
	
	my $result;
	
	if ($content =~ /This\sPage\sIs\sValid/s) {
		$result = 'OK';
		$okct++;
	}
]]

You could try what the Log Validator does, i.e. use LWP::UserAgent
to send a HEAD request to the validator and read our specific HTTP
headers from the response, with code like:
[[
			$self->valid($response->header('X-W3C-Validator-Status'));
			$self->valid_err_num($response->header('X-W3C-Validator-Errors'));
]]
More at:
http://dev.w3.org/cvsweb/perl/modules/W3C/LogValidator/lib/W3C/LogValidator/HTMLValidator.pm

Note that X-W3C-Validator-Status and X-W3C-Validator-Errors aren't  
officially documented, but they're most likely to stay.
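
For a standalone script, the same idea could look roughly like the
sketch below. This is not the Log Validator's actual code: the
endpoint URL, the "uri" query parameter, and the helper names
validation_result and check_page are assumptions made for
illustration, so adjust them to your setup.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Assumption: the public validator endpoint; change for a local install.
my $VALIDATOR = 'http://validator.w3.org/check';

# Pull the validator's verdict out of an HTTP::Response object.
# scalar() keeps a missing header as undef instead of an empty list,
# which would otherwise corrupt the hash being built.
sub validation_result {
    my ($response) = @_;
    return {
        status => scalar $response->header('X-W3C-Validator-Status'),
        errors => scalar $response->header('X-W3C-Validator-Errors'),
    };
}

# Send a HEAD request asking the validator to check one page.
# Needs network access; the "uri" parameter name is an assumption.
sub check_page {
    my ($page_url) = @_;
    my $uri = URI->new($VALIDATOR);
    $uri->query_form(uri => $page_url);
    my $ua = LWP::UserAgent->new(timeout => 30);
    return validation_result($ua->head($uri));
}
```

In your loop you would then call check_page($url) for each page and
bump $okct when the returned status says the page is valid (I would
expect the string 'Valid', but check what the header actually
contains), rather than matching on the HTML of the result page.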

-- 
olivier

Received on Thursday, 22 April 2004 22:07:53 UTC