- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 12 Apr 2005 14:11:20 +0200
- To: "Luca Mascaro" <info@lucamascaro.info>
- Cc: <www-validator@w3.org>
* Luca Mascaro wrote:
>I find a problem in HTML validator.
>http://www.lucamascaro.info/test/helloworld.html

There are several problems here. The first problem is that you use the text/html media type. Such documents should be treated as HTML documents per <http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html>, and the Validator is consequently supposed to flag the use of

  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

as an error, since character data (the ">", due to HTML parsing rules) is not allowed in <head>. This is obviously impractical, so the Validator ignores this decision of the HTML Working Group. Of course, the HTML Working Group refused to define rules for when text/html documents should be processed as XHTML documents, and XHTML-compatible document types such as RDDL, as used for http://www.w3.org/2001/XMLSchema, do not pass the Validator for this reason. We get bug reports for that too, even though there is no bug here; people simply dislike the Validator's behavior. Many people use www-validator to disagree with HTML Working Group decisions, unfortunately.

The second problem is that the Validator does not check for rules not spelled out in the DTD, and the document is valid per the DTD. While the Validator Team is interested in improving the Validator to detect such problems, many conformance criteria in the various HTML and XHTML specifications are unclear, and there is often a lack of consensus among the Validator Team participants and/or other parts of the community as to what should be checked for, even in cases where the requirements are very clear but generally considered obsolete.

An example would be the lexical space of the profile attribute: does it take a single URI or a list of URIs? The specification is very clear that only a single URI is allowed, but it also contains wording that the value should be considered a list by some user agents. If we insist on a single URI, many users would complain that this is incorrect. If we allow multiple URIs, other users would complain that this is incorrect. And generally, the more we change the Validator, the more complaints we get about it being unreliable.

Another example is the use of non-ASCII characters in URI attributes. HTML 4.01 is very explicit about those being prohibited, yet if we make the Validator complain about them, we are likely to receive negative feedback, e.g. from W3C's own I18N Activity.

http://lists.w3.org/Archives/Public/www-html-editor/2004AprJun/0175 has another example of such a requirement. This requirement in particular would invalidate an incredible number of existing pages that currently pass the Validator. Combined with many other small details, implementing complete checks for all such requirements is likely to cause most web pages that currently pass the Validator to be "invalid", in many cases for no good reason. ("Invalid" is in quotes because the HTML Working Group never defined what it means for a document to be e.g. "Valid XHTML 1.0 Transitional"; many would argue that your document is in fact "Valid XHTML".)

The HTML Working Group is aware of all these problems. Unfortunately, they do not show much interest in improving their deliverables to make a transition to conformance- rather than DTD-based validation easy, so while we are working on this, I would not expect much in this regard before the end of this year. Without the HTML Working Group making long overdue improvements to HTML 4.01 and the specifications that depend on it, I am not sure whether we will ever be in a position to include such features in the release version, though.

As a member of the HTML Working Group you might be able to help us improve the Validator this way; that would be greatly appreciated.

Thanks for your report,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Tuesday, 12 April 2005 12:10:57 UTC