Re: validator.nu from Henri Sivonen on 2008-02-05 (public-html-comments@w3.org from February 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 6 Feb 2008 00:58:00 +0200
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: <public-html-comments@w3.org>
Message-Id: <2C0DEB53-489E-41D1-A416-F7BD2A7AA0AD@iki.fi>
Disclaimer: Still not a WG response.

On Feb 5, 2008, at 23:28, Frank Ellermann wrote:
>
> Henri Sivonen wrote:
>
>> I suggest sending feedback to the ISO/IEC FDIS 19757 committee.
>
> If necessary that's up to somebody knowing what "FDIS 19757" is,
> I'm happy with getting the drift and limitations of the few DTDs
> I care about.

Validator.nu doesn't and won't do DTD-based validation. I prefer to  
say what Validator.nu is rather than what it isn't, but if I were to  
define it in terms of what it isn't, it most deliberately isn't a  
DTD-based validator.

>> I'm not interested in delivering a tool to authors who try to
>> make a point by authoring new HTML 2.0 or 3.2 content today.
>
> Right, and I'm not very interested in HTML at all where XHTML 1
> does what I want.  But you are obviously interested in reviving
> a kind of HTML, and SGML comments actually work in browsers I've
> tested.

http://ln.hixie.ch/?start=1137799947&count=1

> Truth in advertising - what you do is not really "HTML"
> or XHTML, it's a new class of its own.

I thought the About page was truthful. It says Validator.nu doesn't do  
DTD-based validation and has no SGML functionality whatsoever.

[...]
>>  [DTD subset]
>>> that's apparently not (yet) supported by http://validator.nu
>>> and FWIW also in no browser I know.
>
>> If it isn't supported in any browser, it would be less useful
>> if the validator didn't point out the problem, wouldn't it?
>
> The WDG validator showed a warning, good.  The W3C validator
> accepted <br /> as valid HTML for some time, that was ugly.
> But validator.nu drops the ball for valid DTD subsets, bad.

HTML5 parsing has no such thing as a valid DTD subset. The XML spec  
makes most DTD processing optional. For XML, Validator.nu can be  
configured to skip external entities (the prudent and more compatible  
default) or to process external entities (usually rendering the  
results irrelevant to the Web). By design, Validator.nu cannot be  
configured to perform DTD-based validation.

> It's not "pointing out the problem", as the WDG warning does,
> it's lost with some syntactically valid pre-HTML5 constructs.
>
> A clear error message would be "DTD subset not supported", or
> if that's simply not allowed in XHTML 1 (dunno) say "invalid".

The error conditions follow the HTML5 parsing spec without ascribing  
SGML meaning to syntax errors.

[...]
> | External encoding information specified a non-UTF-8/non-UTF-16
> | encoding (ISO-8859-1), but there was no matching internal
> | encoding declaration.
>
> PURL redirects to www.xyzzy.claranet, unfortunately that now
> redirects again to home.claranet.de/xyzzy, and there I get an
> (erroneous) Content-Type: text/html; charset=ISO-8859-1
>
> Everybody and his dog knows that many authors cannot fix weird
> ideas of HTTP servers.

The impression that I get is that the TAG and the HTTP WG aren't part  
of "everyone and his dog".

> IMO (X)HTML validators should completely ignore the HTTP layer
> and focus on the job at hand, report only issues *within* a
> document hoping to be valid (X)HTML.  Offering HTTP consistency
> checks as *option* (opt-in) is of course fine.

Validator.nu is a quality assurance tool. It would be silly for it not  
to point out potential actual problems and instead focus on the  
historic fiction that text/html was parsed as SGML. (If the HTTP  
layer and the internal encoding declaration disagree, chances are the  
HTTP layer, which is authoritative, is wrong because the declaration  
closer to content is more likely to be right. That's a potential  
actual problem. Hence, a warning.)
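A minimal Python sketch of the kind of consistency check described above, assuming the document has already been decoded to a string. The helper names (`http_charset`, `internal_encoding`, `encoding_warning`) are hypothetical and this is not Validator.nu's implementation; the 1024-character scan window for `<meta>` is an assumption loosely modeled on the HTML5 prescan, and the warning text is borrowed from the message quoted earlier in this thread.

```python
import codecs
import re
from typing import Optional

# Canonical codec names (as returned by codecs.lookup) for the UTF family.
# These are exempt, mirroring the "non-UTF-8/non-UTF-16" wording of the warning.
UTF_FAMILY = {"utf-8", "utf-16", "utf-16-le", "utf-16-be"}


def normalize(label: str) -> str:
    """Map an encoding label to Python's canonical codec name, if known."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.strip().lower()


def http_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from an HTTP Content-Type value."""
    m = re.search(r'charset\s*=\s*"?([^";\s]+)', content_type, re.IGNORECASE)
    return m.group(1) if m else None


def internal_encoding(doc: str) -> Optional[str]:
    """Find an XML declaration encoding or a <meta> charset near the start."""
    m = re.match(r'\s*<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']', doc)
    if m:
        return m.group(1)
    # Assumed scan window, loosely modeled on the HTML5 prescan.
    m = re.search(r'<meta[^>]*charset\s*=\s*["\']?([^"\'\s;/>]+)',
                  doc[:1024], re.IGNORECASE)
    return m.group(1) if m else None


def encoding_warning(content_type: str, doc: str) -> Optional[str]:
    """Warn when a non-UTF HTTP charset has no matching internal declaration."""
    external = http_charset(content_type)
    if external is None or normalize(external) in UTF_FAMILY:
        return None
    internal = internal_encoding(doc)
    if internal is not None and normalize(internal) == normalize(external):
        return None
    return ("External encoding information specified a non-UTF-8/non-UTF-16 "
            f"encoding ({external}), but there was no matching internal "
            "encoding declaration.")
```

Normalizing through `codecs.lookup` lets aliases such as `latin1` and `ISO-8859-1` count as matching; a real validator would presumably use whatever label registry its spec mandates rather than Python's codec aliases.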

> That's of course no special validator.nu issue, other validators
> also mix different layers (transport and content) into a rather
> confusing (for ordinary users) mess.

I might be persuaded to ignore Content-Type if you can get the TAG to  
repeal mime-respect and the IETF HTTP WG to endorse content sniffing  
and to deprecate Content-Type.

>> http://hixie.ch/advocacy/xhtml
>
> It won't surprise anybody that I disregard this text,
[...]

I don't, and Validator.nu is programmed accordingly. More to the  
point, the parser selection in Validator.nu follows browser reality.
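For illustration, a sketch of parser selection keyed on the HTTP media type, the way browsers choose between HTML and XML parsing. The function `select_parser` and its return labels are hypothetical; it shows only the dispatch idea, not Validator.nu's actual configuration.

```python
def select_parser(content_type: str) -> str:
    """Pick a parser from the Content-Type header, as browsers do.

    Returns "html5" for text/html and "xml" for XML media types; anything
    else is rejected rather than sniffed.
    """
    # Drop parameters such as "; charset=ISO-8859-1" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html5"
    if (mime in ("application/xhtml+xml", "application/xml", "text/xml")
            or mime.endswith("+xml")):
        return "xml"
    raise ValueError(f"no parser for content type {mime!r}")
```

Rejecting an unexpected type (for example a resource mislabeled as `chemical/x-pdb`) instead of sniffing treats Content-Type as authoritative, in line with the stance on mime-respect taken in this message.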

[...]
>> HEAD /sgml-lib/REC-xhtml1-20020801/xhtml-lat1.ent HTTP/1.1
> [...]
>> Host: validator.w3.org
> [...]
>> Content-Type: chemical/x-pdb
>
> Brilliant, three involved servers (claranet, Google, W3C), and
> none of them gets the relevant content types right.  It never
> occurred to me that validator.w3.org could be a problem, I only
> checked claranet and googlepages... :-(

I'm going to make the error message identify the URL of the HTTP  
resource.

> As noted above, just ignore what HTTP servers say, all you get
> are mad lies, resulting in hopelessly confusing error messages
> about issues not under the control of the tester.

That's like people on www-validator complaining that their invalid ad  
serving boilerplate is not under their control. Whether you make  
references to a misconfigured server is under your control. The role of  
Validator.nu is to point out stuff like this without regard to whose  
configuration mistake it is.

[...]
> Or I confused it with the xml:lang errors, all my pages have
> xml:lang alongside 'lang' everywhere, as required.  Reporting
> this as error is wrong for XHTML 1.

If you use the XHTML 1.0 schema with the XML parser, this isn't  
reported as an error. If you use the XHTML 1.0 schema with the HTML5  
parser, it is. If the latter combination is chosen automatically,  
Validator.nu even tells you about this situation.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 5 February 2008 22:58:17 UTC