Re: Which output version is considered the "canonical" validator output?

Hi Brian,

On May 15, 2007, at 15:14 , Brian Wilson wrote:
> Most specifically, I want to implement messageid notations in the  
> SOAP output (I notice Miguel Gastelumendi requested the same thing  
> a few days ago), but the perl templating system is not letting me  
> get very far.

I just coded this up:
http://lists.w3.org/Archives/Public/www-validator-cvs/2007May/0054.html

> While I have a good familiarity with perl, the templating system is  
> resisting being cracked (so far).

The templating engine we are using is HTML::Template, documented at
http://html-template.sourceforge.net/html_template.html
if you're interested.

But mostly, the tricky part is not so much the templates as which  
variables are passed to the templates. For that, one has to dive into  
the validator code.


> I was going to "settle" for using the SOAP output as-is, but I've  
> run in to an issue tonight.
>
> In analyzing the following URL, the HTML, SOAP and XML outputs all  
> return different information using the default settings on:
>    http://ebni.com/byrds/
>
> - The HTML output has a single fatal warning(04):
>    No Character Encoding Found!

Which is correct.

> - The XML output has 70 warnings (it must be assuming a char encoding)

Odd, indeed. It seems to be assuming iso-8859-1 and succeeding.
The XML output is deprecated, however, as you noticed.

> - The SOAP output just appears to be broken

Looks broken indeed. I'll look into this, probably tomorrow.

> So, I'm left with treating the HTML output as being the "canonical"  
> output. I'm OK with that.

I would strongly recommend against scraping the HTML output,  
though, as it is likely to change now and then as we evolve the  
output layout and style.
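
For what it's worth, here is a rough sketch (in Python, only for
illustration) of reading errors from a SOAP response rather than from
the HTML page. The namespace URI, the element names (error, line, col,
message, messageid) and the sample document are assumptions written by
hand for this example, not captured from the validator, so check them
against a real response before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the validator's SOAP payload; verify against real output.
NS = {"m": "http://www.w3.org/2005/10/markup-validator"}

# Hand-written sample response (hypothetical content, for illustration only).
SAMPLE = """
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <m:markupvalidationresponse
        xmlns:m="http://www.w3.org/2005/10/markup-validator">
      <m:errors>
        <m:errorcount>1</m:errorcount>
        <m:errorlist>
          <m:error>
            <m:line>12</m:line>
            <m:col>4</m:col>
            <m:message>example message</m:message>
            <m:messageid>108</m:messageid>
          </m:error>
        </m:errorlist>
      </m:errors>
    </m:markupvalidationresponse>
  </env:Body>
</env:Envelope>
"""


def extract_errors(soap_text):
    """Return one dict per error element found in the SOAP text."""
    root = ET.fromstring(soap_text.strip())
    errors = []
    for err in root.iterfind(".//m:error", NS):
        errors.append({
            # findtext returns the default when a child element is absent,
            # so a response without messageid still parses.
            "line": err.findtext("m:line", default="", namespaces=NS),
            "col": err.findtext("m:col", default="", namespaces=NS),
            "message": err.findtext("m:message", default="", namespaces=NS),
            "messageid": err.findtext("m:messageid", default="", namespaces=NS),
        })
    return errors


if __name__ == "__main__":
    for error in extract_errors(SAMPLE):
        print(error)
```

Looking elements up by name like this should survive changes to the
HTML layout, which is the main reason to prefer it over scraping.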

Thanks!
-- 
olivier

Received on Wednesday, 16 May 2007 09:15:49 UTC