Re: Bug in validator

* Luca Mascaro wrote:
>I find a problem in HTML validator.

>http://www.lucamascaro.info/test/helloworld.html

There are several problems here. The first problem is that you serve the
document as text/html; such documents should be treated as HTML documents
per <http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html>, and
the Validator is consequently supposed to flag the use of

  <meta http-equiv="Content-Type" content="text/html;
    charset=iso-8859-1" />

as an error: under HTML (SGML) parsing rules the "/" already closes the
start tag, so the following ">" is character data, which is not allowed in
<head>. This is obviously impractical, so the Validator ignores this
decision of the HTML Working Group. Of course, the HTML Working Group
refused to define rules for when text/html documents should be processed
as XHTML documents, and XHTML-compatible document types such as RDDL, as
used for http://www.w3.org/2001/XMLSchema, do not pass the Validator for
this reason. We get bug reports for that too, even though there is no bug
here; it just happens that people dislike the Validator's behavior.
Unfortunately, many people use www-validator to disagree with HTML
Working Group decisions.
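To make the parsing issue concrete, here is a toy model of the SGML rule
HTML 4.01 inherits: in a start tag, an unquoted "/" is the NET delimiter
and ends the tag, so whatever follows it (here the ">") is no longer part
of the tag. This is a simplified illustration under stated assumptions,
not how the Validator works; the helper name split_at_net is hypothetical,
and it ignores everything about SGML except quoting and the "/".

```python
def split_at_net(markup: str) -> tuple[str, str]:
    """Toy model (hypothetical helper): split a start tag at the first
    unquoted "/", which SGML treats as ending the start tag.

    Returns (start_tag, leftover); leftover is what an SGML parser would
    see as content following the tag.
    """
    quote = None
    for i, ch in enumerate(markup):
        if quote is not None:
            if ch == quote:  # end of a quoted attribute value
                quote = None
        elif ch in "\"'":  # start of a quoted attribute value
            quote = ch
        elif ch == "/":  # unquoted "/" ends the start tag here
            return markup[: i + 1], markup[i + 1 :]
    return markup, ""  # no "/" outside quotes: the tag ends normally


tag = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />'
start, leftover = split_at_net(tag)
print(repr(leftover))  # the ">" is left over as character data inside <head>
```

The "/" inside "text/html" is skipped because it sits in a quoted
attribute value; only the final "/" ends the tag.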

The second problem is that the Validator does not check for requirements
that are not expressed in the DTD, and your document is valid per the DTD.
While the Validator Team is interested in improving the Validator to detect
such problems, many conformance criteria in the various HTML and XHTML
specifications are unclear, and there is often a lack of consensus among
the Validator Team participants and/or other parts of the community as
to what should be checked for, even in cases where the requirements are
very clear but generally considered obsolete.

An example would be the lexical space of the profile attribute: does it
take a single URI or a list of URIs? The specification is very clear
that only a single URI is allowed, but it also contains wording that the
value should be considered a list by some user agents. If we insist on
a single URI, many users would complain that this is incorrect. If we
allow multiple URIs, other users would complain that this is incorrect.
And generally, the more we change the Validator, the more complaints we
get about it being unreliable.
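The dilemma can be stated as two competing checks. The sketch below is
hypothetical (neither function comes from the Validator): strict_profile_ok
accepts exactly one URI, while lenient_profile_ok accepts a
whitespace-separated list, the reading suggested by the user-agent wording.
Splitting on whitespace is a simplification; neither function checks URI
syntax.

```python
def profile_tokens(value: str) -> list[str]:
    # Split on any run of whitespace, dropping leading/trailing blanks.
    return value.split()


def strict_profile_ok(value: str) -> bool:
    # Reading 1: the attribute holds exactly one URI.
    return len(profile_tokens(value)) == 1


def lenient_profile_ok(value: str) -> bool:
    # Reading 2: the attribute holds one or more whitespace-separated URIs.
    return len(profile_tokens(value)) >= 1


value = "http://example.org/profile-a http://example.org/profile-b"
print(strict_profile_ok(value), lenient_profile_ok(value))  # False True
```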

Another example is the use of non-ASCII characters in URI attributes.
HTML 4.01 is very explicit about those being prohibited, yet if we make
the Validator complain about them, we are likely to receive negative
feedback, e.g. from W3C's own I18N Activity.
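For illustration, a minimal sketch of such a check using Python's standard
html.parser module. The set URI_ATTRIBUTES is a hand-picked, incomplete
assumption rather than a list derived from the DTD, and the parser expands
character references such as &#233; before the check, so those are flagged
too.

```python
from html.parser import HTMLParser

# Assumption: an incomplete, hand-picked list of HTML 4.01 attributes whose
# values are URIs. A real checker would derive this per element from the DTD.
URI_ATTRIBUTES = {"href", "src", "action", "cite", "longdesc", "profile",
                  "usemap", "codebase", "data", "classid"}


class NonAsciiUriFinder(HTMLParser):
    """Collect (tag, attribute, value) for URI attributes containing non-ASCII."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare
        # attributes. Self-closing tags like <img ... /> also arrive here.
        for name, value in attrs:
            if name in URI_ATTRIBUTES and value is not None and not value.isascii():
                self.findings.append((tag, name, value))


finder = NonAsciiUriFinder()
finder.feed('<p><a href="/caf\u00e9.html">menu</a> <img src="/logo.png" alt=""></p>')
finder.close()
print(finder.findings)  # [('a', 'href', '/café.html')]
```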

http://lists.w3.org/Archives/Public/www-html-editor/2004AprJun/0175 has
another example of such a requirement. This requirement in particular
would invalidate an enormous number of existing pages that currently
pass the Validator. Combined with many other small details, implementing
complete checks for all such requirements is likely to cause most web
pages that currently pass the Validator to become "invalid", in many cases
for no good reason.

("invalid" is in quotes because the HTML Working Group never defined what
it means for a document to be, e.g., "Valid XHTML 1.0 Transitional"; many
would argue that your document is in fact "Valid XHTML".)

The HTML Working Group is aware of all these problems; unfortunately,
they do not show much interest in improving their deliverables to make
a transition to conformance- rather than DTD-based validation easy. So
while we are working on this, I would not expect much in this regard
before the end of this year.

Without the HTML Working Group making long overdue improvements to HTML
4.01 and the specifications that depend on it, I am not sure whether we
will ever be in a position to include such features in the release
version, though. As a member of the HTML Working Group, you might be able
to help us improve the Validator this way; that would be greatly
appreciated.

Thanks for your report,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Tuesday, 12 April 2005 12:10:57 UTC