Re: Report page in validation service is not well-formed XML when the validation was successful

In his message of Thursday, 29 May 2008 19:04, olivier Thereaux
<mailto:ot@w3.org> wrote:

Hi Olivier,
> 
> On 28-May-08, at 7:25 AM, Svensson, Lars wrote:
>> The validation results page in the css-validator returns XML that is
>> not well-formed when the validation was successful. When the validation
>> wasn't successful, the returned document is well-formed. First I
>> tried this with file upload and then with validation by URL, always
>> with the same result. Interestingly enough, the page validates in
>> the HTML validator, but it's obvious that the XML is not well-formed.
>> The problem is in line 35, where there is a closing </a> tag within the
>> <p> element but no opening <a>.
> 
> This was fixed a few weeks ago in the development version of the
> validator.
> http://dev.w3.org/cvsweb/2002/css-validator/org/w3c/css/css/xhtml.properties.diff?r1=1.16&r2=1.17&f=h

Good news!
> 
> Yves, let's put this into production independently of (and before)
> the grammar changes?

Would be awesome.
> 
> 
>> The background is that I wanted to build a validation script, submit
>> the file(s) by URL and then check for the existence of a <div
>> id="errors">. To do this, I parse the result page with an XML parser
>> (it's supposed to be XHTML) and check for the div using an XPath
>> expression.
> 
> If you're going to parse validation results as XML, I would strongly
> recommend using the API instead of screen-scraping the HTML output:
> http://jigsaw.w3.org/css-validator/api.html

I always thought that one of the rationales for XHTML was that it should
be possible to scrape the documents using an XML parser, thus
eliminating the need for multiple response formats just to get an XML
version for machine processing. I'll check out the SOAP version, though.
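
For reference, the check described above could look roughly like the
following in Python. This is only a sketch under assumptions: the
function name and the sample markup are made up for illustration, and
the only detail taken from the real result page is the <div id="errors">.
It uses the limited XPath subset in the standard library's ElementTree
and treats a parse failure (such as the stray </a>) as a third outcome.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_errors_div(page_text):
    """Return True/False for a well-formed page, None if it does not parse.

    Hypothetical helper: looks for <div id="errors"> anywhere in the page.
    """
    try:
        root = ET.fromstring(page_text)
    except ET.ParseError:
        # Not well-formed XML, e.g. a closing </a> without an opening <a>.
        return None
    # ElementTree path syntax: namespace in braces, attribute predicate.
    match = root.find(".//{%s}div[@id='errors']" % XHTML_NS)
    return match is not None


# Made-up sample pages, not real validator output.
ERR_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="errors"><p>1 error</p></div></body></html>'
)
OK_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="congrats"><p>No errors</p></div></body></html>'
)
BROKEN_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>No errors</a></p></body></html>'
)

if __name__ == "__main__":
    for name, page in [("errors", ERR_PAGE), ("ok", OK_PAGE), ("broken", BROKEN_PAGE)]:
        print(name, has_errors_div(page))
```

Returning None for the unparseable case keeps "no errors found" apart
from "could not tell", which is exactly the situation with the current
success page.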

Thanks for your input,

Lars


-- 
Dr. Lars G. Svensson
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
60322 Frankfurt
http://www.d-nb.de/

Received on Tuesday, 3 June 2008 19:31:53 UTC