Re: Bug in Validator - more vivid example

Hello Artemy, all,

On Jul 25, 2007, at 15:10 , Artemy Lomov wrote:
> Even this page is 'not valid' XHTML:
>
> http://validator.w3.org/check?uri=http%3A%2F%2Fvalidator.w3.org%2F:
>
> ***
>
> Failed validation, 4 Errors
>
> Line 425, Column > 80: XML Parsing Error: Entity 'nbsp' not defined.
> Line 425, Column > 80: XML Parsing Error: Entity 'nbsp' not defined.
> Line 450, Column > 80: XML Parsing Error: Entity 'copy' not defined.
> Line 451, Column > 80: XML Parsing Error: Entity 'reg' not defined.

This was a hard bug to hunt down. It was also hard to reproduce, as
reloading the same validation page would sometimes give different
results.

We found that the libxml2-based parser would fetch a lot of schema
and entity files, which sometimes got the validator temporarily
banned by the www.w3.org servers. While the ban was in place, the
entities could not be dereferenced, and the parser would throw errors.

We fixed the issue by not letting the XML parser fetch remote
DTD/entity files, and by filtering out errors about undefined
entities. The fix is now in production.
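
For the curious, the filtering half of the fix can be sketched roughly
as below. This is only an illustration in Python, not the validator's
actual code: the function name is made up, the message pattern is
assumed from the errors quoted above, and the third sample message is
hypothetical.

```python
import re

# Assumed message format, taken from the errors quoted in this thread.
UNDEFINED_ENTITY = re.compile(r"Entity '[^']+' not defined")


def filter_undefined_entity_errors(errors):
    """Return only the errors that are not about undefined entities."""
    return [e for e in errors if not UNDEFINED_ENTITY.search(e)]


errors = [
    "Line 425, Column > 80: XML Parsing Error: Entity 'nbsp' not defined.",
    "Line 450, Column > 80: XML Parsing Error: Entity 'copy' not defined.",
    # Hypothetical unrelated error, included to show it is kept.
    "Line 12, Column 3: XML Parsing Error: Opening and ending tag mismatch.",
]

remaining = filter_undefined_entity_errors(errors)
print(remaining)
```

Only the undefined-entity messages are dropped; any other parsing
error is still reported.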

-- 
olivier

Received on Thursday, 26 July 2007 02:45:43 UTC