
Re: Validators, Validation chart

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 28 Apr 2006 19:12:23 +0300
Message-Id: <2460262C-5CA9-4170-AAF3-316589B3ADC7@iki.fi>
To: www-validator@w3.org

On Apr 21, 2006, at 09:59, olivier Thereaux wrote:

> On Apr 20, 2006, at 8:10, Rick Stanley wrote:
>> in the chart on:
>> http://www.validome.org/lang/en/errors/ALL
>> there are many discrepancies between the W3C validator, Validome,
>> and the other validators, WDG and Site Valet.

If none of them were able to find errors that the W3C Validator cannot  
find, there'd be no point in making other validation services.

The WDG Validator used to differentiate on the understandability of  
error messages and now differentiates mainly on batch validation.

Page Valet differentiates by offering an XML parser (but see below)  
and by offering customization to the SGML declaration.

Validome differentiates by going beyond DTD validation and by  
providing error messages in languages other than English.

Schneegans (http://schneegans.de/sv/) differentiates by using XSD and  
by actually using a real XML parser.

Relaxed (http://badame.vse.cz/validator/) differentiates by using  
RELAX NG and Schematron and by actually using a real XML parser.

My validation service (http://hsivonen.iki.fi/validator/)  
differentiates by using RELAX NG and Schematron *and* by allowing  
user-supplied schemas, by actually using a real XML parser, by  
refusing to even try DTD validation and by providing an experimental  
parser for HTML5 which also has practical applicability to HTML 4.01.

It would have been interesting to see the Validome team test these  
latter three as well.

> The validators by WDG, W3C, Validome and Webthing (Valet) are all  
> fine tools, very reliable for all but some complex cases.

Umm. The Validome test cases are not particularly complex. I'd  
characterize them as simple.

> Those complex cases, often caused by unclear (or, sometimes,  
> erroneous) specifications, are the main cause for discrepancies  
> between the tools: when the developer is left without a clearly  
> documented choice, he or she has to make an "educated guess" as to  
> what is the best decision.

There's nothing unclear about e.g. the first set of tests concerning  
the XML declaration. What I find interesting is that even Page Valet  
fails, which suggests it does some broken trickery before letting  
Xerces see the document.
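The XML-declaration tests hinge on nothing more than well-formedness, which any real XML parser enforces as a fatal error. A minimal sketch using only Python's standard-library expat binding (the sample documents are invented for illustration):

```python
# Sketch: a conforming XML parser must reject well-formedness violations
# in and around the XML declaration outright; no validator-level
# trickery is needed. Sample inputs are invented for illustration.
import xml.parsers.expat

def is_well_formed(doc):
    """Return True if the byte string parses as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

# A correct XML declaration parses fine...
assert is_well_formed(b'<?xml version="1.0"?><root/>')
# ...but the XML declaration must be the very first thing in the
# document, so leading whitespace is a fatal error:
assert not is_well_formed(b' <?xml version="1.0"?><root/>')
```

A front end that "cleans up" the input before handing it to the parser would mask exactly these errors, which is the kind of trickery suggested above.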

> The document you mention was unknown to me. It looks like a great  
> collection of tests, indeed, the validome team did a great job  
> compiling it.

Indeed. I already fixed one bug in my fork of Ælfred2.

> The results table, however, is another thing. Generally speaking  
> one should be wary of test results used as promotional material.

Because it inconveniently reveals that the tested competitors exhibit  
bogosity even at the very basic XML well-formedness checking (and  
that W3C and WDG don't even use an XML parser)?

> I wish the Validome team used their table of tests as a development  
> helper, rather than an advertisement. Claiming to be 100% perfect  
> and bug-free is rather dubious, as is the systematic choice,  
> whenever the different tools have a different behavior, to take  
> validome as the obvious correct reference.

Actually, the tests are pretty reasonable. When Validome "succeeds"  
and the W3C Validator "fails", there tends to be a reasonable  
explanation based on higher-level spec language that DTD validation  
won't catch or on practical matters.

http://www.validome.org/out/ena3013 is an exception from the  
practical point of view.

> * System-ID missing (at PUBLIC) in HTML-Document
> http://www.validome.org/lang/en/errors/DOCTYPE/4011
> The system id is not mandatory, and while it is strongly  
> recommended for non-standard document types, there's no rule that I  
> know of that forces a parser (or a validator) to report it. The  
> fact that validome is throwing a warning here may or may not be a  
> good idea (some might say there should not be a warning if there is  
> no risk of problem), but that does not make the other tools wrong.

There's no SGML-level reason to report it. However, depending on how  
you read the HTML 4.01 spec, there may be. (The spec enumerates three  
particular doctypes.)
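Under that stricter reading, the check is simple: HTML 4.01 lists exactly three doctypes, each with both a public and a system identifier. A hedged sketch of such a check (the function name and warning strings are illustrative, not any validator's actual behavior):

```python
# Sketch of the stricter reading of HTML 4.01: the spec enumerates three
# doctypes, each with both a public and a system identifier, so a checker
# taking that reading could warn when the system id is missing or wrong.
HTML401_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN":
        "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN":
        "http://www.w3.org/TR/html4/loose.dtd",
    "-//W3C//DTD HTML 4.01 Frameset//EN":
        "http://www.w3.org/TR/html4/frameset.dtd",
}

def doctype_warning(public_id, system_id):
    """Return a warning message, or None if the doctype matches the spec."""
    expected = HTML401_DOCTYPES.get(public_id)
    if expected is None:
        return None  # not an HTML 4.01 public id; out of scope here
    if system_id is None:
        return "system id missing (HTML 4.01 gives it as part of the doctype)"
    if system_id != expected:
        return "system id does not match the one given in HTML 4.01"
    return None

# PUBLIC without a system id draws a warning under this reading:
assert doctype_warning("-//W3C//DTD HTML 4.01//EN", None) is not None
assert doctype_warning("-//W3C//DTD HTML 4.01//EN",
                       "http://www.w3.org/TR/html4/strict.dtd") is None
```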

> * Absolute no charset encoding statement
> http://www.validome.org/lang/en/errors/HTML-CHARSET/8
> Validome throws a fatal error when no charset declaration is found,  
> and declares the document invalid.
> First, the document *is* valid: there is no constraint on character  
> encoding for validation, it's just that if no charset can be found,  
> it's hard to read, and therefore parse, the document.

Forgetting SGML for a while, do you think it unreasonable to report a  
condition where the document cannot be unambiguously read?
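The condition itself is easy to detect. A sketch of what "no charset declaration at all" looks like to a checker (the function and regex are illustrative, not any validator's actual code):

```python
# Sketch: flagging a document whose encoding cannot be determined at all.
# Sources checked, in the order HTML 4.01 gives them precedence: the HTTP
# charset parameter, then an in-document declaration. Illustrative only.
import re

CHARSET_DECL = re.compile(
    rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def declared_encoding(http_charset, body):
    """Return the declared encoding, or None if none can be found."""
    if http_charset:
        return http_charset
    match = CHARSET_DECL.search(body)
    if match:
        return match.group(1).decode("ascii")
    return None

# No HTTP charset and no in-document declaration: the document cannot
# be unambiguously decoded, which is the condition worth reporting.
assert declared_encoding(None, b"<html><head><title>t</title></head></html>") is None
assert declared_encoding("utf-8", b"<html></html>") == "utf-8"
```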

> Also, the W3C markup validator used to throw a fatal error in such  
> cases, too, but our users told us this was awfully unhelpful, so  
> the validator is now trying tentative validation using a fallback  
> charset, as do valet and the WDG validator. I'm not sure I'd call  
> validome right and the other three wrong on this one ;).

So the users found it unhelpful that documents that were unparseable  
without guesswork weren't parsed based on guesswork.

> * XML- and Meta-charset encoding are different to HTTP-Header  
> charset encoding
> http://www.validome.org/lang/en/errors/XML-CHARSET/2010
> Plain wrong. HTTP headers have precedence over other charset  
> declaration methods.
> e.g http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2

HTTP takes precedence, but having a discrepancy is a sign of a  
problem somewhere and, therefore, quite useful to report.
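Both points can hold at once: HTTP wins for the purpose of decoding, and the mismatch still gets reported. A hedged sketch (names and warning text are illustrative):

```python
# Sketch: the HTTP charset parameter takes precedence (HTML 4.01,
# section 5.2.2), but a disagreement with the in-document declaration
# usually means something is misconfigured, so it is worth a warning
# even though the document may still be valid. Illustrative code only.
def effective_charset(http_charset, meta_charset):
    """Return (charset to use, list of warnings)."""
    warnings = []
    if http_charset and meta_charset and \
            http_charset.lower() != meta_charset.lower():
        warnings.append(
            "HTTP header charset %r overrides in-document charset %r; "
            "the discrepancy is a likely sign of a misconfiguration"
            % (http_charset, meta_charset))
    return (http_charset or meta_charset, warnings)

charset, warnings = effective_charset("ISO-8859-1", "utf-8")
assert charset == "ISO-8859-1"   # HTTP wins
assert len(warnings) == 1        # but the mismatch is still reported
```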

> * Errors in Attribute-values
> This is interesting, because these tests are NOT about validation,  
> but conformance. There's no doubt that checking for them is useful,  
> but when validome says http://www.validome.org/out/ena6001 is not  
> valid, it's an erroneous statement. The document is not conformant,  
> but it's valid.

Unless, of course, their non-DTD validation formalism sees inside  
attributes. (I don't know if they have a non-DTD validation formalism  
or whether it is just useful ad hoc code.)

> The ultimate reference, always, is the specification, and in the  
> case of markup, the many specifications, from URI and SGML, to  
> HTTP, to the various markup languages. The W3C validator is not a  
> perfect reference - it certainly has some bugs.

Not to mention that the W3C validator doesn't even attempt to check  
the "various markup languages" part beyond what can be expressed in a  
DTD.

> When you cannot be perfect, it's better to be honest and  
> accountable about it.

Oh, you mean like this:

The result page claims:
"The document located at
<http://hsivonen.iki.fi/test/no-space-between-attributes.xhtml> was  
checked and found to be valid XHTML 1.0 Strict. This means that the  
resource in question identified itself as  
"XHTML 1.0 Strict" and that we successfully performed a formal  
validation using an SGML or XML Parser (depending on the markup  
language used)."

There's no "or XML Parser (depending on the markup language used)",  
is there?

Henri Sivonen
Received on Friday, 28 April 2006 16:12:38 UTC
