Validator reporting just "Not valid"

Rubert van Loen wondered, under the subject "No closing </head> tag and
still validated?", why the validator accepted a document with no </head>.
The answer to that is simple: the (customized) DTD in use has the same rule
for the HEAD element as HTML 4.01, where both the start tag and the end tag
are omissible, so </head> is optional. To make the end tag required, change

<!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->

by replacing the second "O" with "-" (hyphen-minus); to make the start
tag required too, change the first "O" to "-" as well.
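For illustration, the two variants of the edited declaration would read as
follows. This sketch assumes your copy of the declaration matches the HTML
4.01 one quoted above. Use one variant or the other, because declaring HEAD
twice in the same DTD is an error.

```dtd
<!-- Variant 1: end tag required, start tag still omissible -->
<!ELEMENT HEAD O - (%head.content;) +(%head.misc;) -- document head -->

<!-- Variant 2: both start tag and end tag required -->
<!ELEMENT HEAD - - (%head.content;) +(%head.misc;) -- document head -->
```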

But there's more. The validator prominently announces, as a second-level
heading, "This Page Is Valid HTML 4.01 Transitional!" (It puzzles me
why the first-level heading of a _report_ is "W3C MarkUp Validation
Service", especially considering the hints on heading usage that the
validator wants to give. But I digress.)

That isn't true, of course. The document is valid (against its customized
DTD), but it is not HTML 4.01.

In smaller print, there's an explanation: "This means that the resource in
question identified itself as 'HTML 4.01 Transitional' and that we
successfully performed a formal validation using an SGML or XML Parser
(depending on the markup language used)."

It seems that the validator takes the version name in its report from the
HTML.Version parameter entity. So if you create a customized DTD by editing
an HTML 4.01 DTD, or another DTD from the HTML specifications, remove the
declaration of HTML.Version (or edit its value).
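If you edit the value rather than remove it, the declaration might look like
the following. The public text shown is a hypothetical placeholder, not an
identifier defined anywhere, so substitute a name that describes your own DTD.
Because HTML.Version stays defined, declarations elsewhere in the DTD that
refer to it (such as the one for the VERSION attribute) keep working. It is an
assumption, based only on the observation above, that the validator would then
show this string in its report.

```dtd
<!-- Hypothetical version name for a customized DTD; replace with your own -->
<!ENTITY % HTML.Version
  "-//Example//DTD HTML 4.01 Transitional with local extensions//EN">
```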

But what really puzzles me now is why the validator reports only

  "This page is not Valid !
  Below are the results of attempting to parse this document with an SGML
  parser."

and then nothing, except the source listing, when I try

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.cs.tut.fi%2F%7Ejkorpela%2Fhtml%2Fnobr.html

I vaguely remember that such questions have been asked before, and maybe
answered, but the situation is very confusing.

Using http://www.htmlhelp.com/tools/validator/
instead, I get the message

http://www.cs.tut.fi/~jkorpela/html/loosewbr.dtd, line 1036, character 44:
parameter entity "HTML.Version" not defined

which explains it. It suddenly becomes clear:

In addition to removing the definition of HTML.Version, we also need to
remove the line

<!ENTITY % version "version CDATA #FIXED '%HTML.Version;'">

and the line

 %version;

in the attribute list declaration for HTML near the end of the DTD. This
naturally disallows the VERSION attribute in the <html> tag, but there's no
point in using that attribute anyway.
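This sketch assumes your copy of the DTD declares the attributes of HTML the
same way the HTML 4.01 Transitional DTD does. In that case, the attribute list
declaration would change as shown below, and the declaration of the version
entity is simply deleted. Only the second form belongs in the edited DTD,
because an SGML DTD may not contain two attribute list declarations for the
same element.

```dtd
<!-- Before (delete this): -->
<!ATTLIST HTML
  %i18n;                               -- lang, dir --
  %version;
  >

<!-- After (keep this): the VERSION attribute is gone -->
<!ATTLIST HTML
  %i18n;                               -- lang, dir --
  >
```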

***

It would naturally be better if the validator were fixed so that it
does not use phrases like "Valid HTML 4.01 Transitional" but simply says
"valid".

And when the validator encounters a DTD error that makes it impossible to
determine _whether_ the document is valid, it would be better if it
reported the error in the DTD instead of saying that the document is not
valid.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 22 April 2004 02:06:08 UTC