Re: Validator misbehavior on HTML sent as XHTML

Benjamin Niemann wrote:

> Nikita The Spider The Spider wrote:
> 
>> I've been running a few edge cases through the validator and I've come
>> across one that the validator doesn't like. The document in question
>> is a short, valid HTML 4.01 Strict document that gives the validator
>> fits when I send it with a media type of application/xhtml+xml.
>> Specifically, the validator reports "Validation Output: 6 Errors" and
>> then proceeds to report hundreds of errors on lines that don't exist
>> in the document.
>> 
>> The document in question is here:
>> http://NikitaTheSpider.com/boneyard/temp/070808/nonsense.xhtml
>> 
>> And here's the validation URL for it:
>> http://validator.w3.org/check?uri=http%3A%2F%2Fnikitathespider.com%2Fboneyard%2Ftemp%2F070808%2Fnonsense.xhtml&charset=%28detect+automatically%29&doctype=Inline&ss=1&group=0
>> 
>> I realize that sending HTML as application/xhtml+xml is a nonsensical
>> thing to do
> 
> I've seen worse things on the web ;)
> 
>> and the validator is right to tell me that ("Contradictory
>> Parse Modes Detected!") but the actual output is clearly the result of
>> parsing something other than my document.
> 
> These errors actually are real errors and come from the HTML DTD when it
> is parsed in XML mode. SGML and XML DTDs are similar, but, just as with
> document markup, XML allows only a subset of the SGML constructs.
> 
> As it looks, the validator is not (yet) able to correctly report errors
> from different entities. The line numbers look as if they point to the
> correct lines of the DTD, but column 0 is obviously bogus. It also only
> counts errors in the nonsense.xhtml entity. And finally it tries to link
> errors from the DTD to the document source, which is not the source of
> these errors.

First: s/errors/warnings/g - but aren't these errors? (I'm not a big fan of
XML and certainly no expert ;) )

A similar problem exists for pure SGML files as demonstrated by
<http://testbox.puppetmaster.homelinux.org/a.html>. That file references
another one with a markup error, but the error is reported for a.html.
That's obviously a useless construct in HTML (though not so uncommon with
other SGML document types), so it's questionable whether it is a
high-priority bug...
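
To make the attribution problem concrete, here is a minimal sketch of
grouping parser errors by the entity they were found in, so that errors
from a DTD or a referenced file are neither counted against nor linked to
the main document. Everything in it is an assumption for illustration:
the ParseError record, its fields and the split_errors function are
hypothetical names and are not taken from the validator's code.

```python
from collections import defaultdict
from typing import NamedTuple


class ParseError(NamedTuple):
    """Hypothetical error record; the field names are assumptions."""
    entity: str   # identifier of the entity the parser was reading
    line: int
    column: int
    message: str


def split_errors(errors, document):
    """Separate errors found in `document` from errors found elsewhere.

    Returns a pair: the list of errors located in the document entity,
    and a dict mapping every other entity to its own list of errors.
    """
    by_entity = defaultdict(list)
    for err in errors:
        by_entity[err.entity].append(err)

    in_document = by_entity.pop(document, [])
    return in_document, dict(by_entity)
```

With a split like this, only the first list would feed the error count
and the links into the document source, while the second could be shown
under a separate heading per entity.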

And I came across another rare edge case:
<http://testbox.puppetmaster.homelinux.org/c.html>
The description of the error is pretty confusing, as it is not the reference
to the DTD that is broken.

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/

Received on Thursday, 9 August 2007 21:35:59 UTC