Re: bug with validator? from Jukka K. Korpela on 2004-09-06 (www-validator@w3.org from September 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 6 Sep 2004 22:07:07 +0300 (EEST)
To: "Suscheck, Chuck" <chuck.suscheck@colostate-pueblo.edu>
Cc: www-validator@w3.org
Message-ID: <Pine.GSO.4.58.0409062113250.22940@korppi.cs.tut.fi>

On Mon, 6 Sep 2004, Suscheck, Chuck wrote:

> The attached file validates strict.

Generally, you should post a URL, not a copy of code.

> I don't believe it should.

It would be better to tell the reasons why you believe so.

> <p />First homework assignment<p />

This is odd, probably the result of misunderstanding some misguided lessons
on XHTML. <p /> is per se valid by XHTML rules. It is equivalent to
<p></p>, which is something that the HTML 4.01 specification explicitly
frowns upon, saying that it (i.e., an empty paragraph) should not be used.
But this is not a syntactic limitation, and surely not something enforced
in a DTD. The construct <p /> also violates the recommendation that such
notations should only be used for elements with EMPTY declared content,
but this isn't a validity issue either.

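However, it does become a validity issue in XHTML 1.0 Strict when this
markup appears directly inside <body>: since <p /> is an empty paragraph,
the text "First homework assignment" is then character data directly inside
<body>, and by Strict rules, "loose" text outside block elements is not
allowed.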
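A rough Python sketch of that Strict rule, using only the standard library's xml.etree.ElementTree. The function name loose_text_in_body is made up for illustration. It checks only for character data directly in <body>, not the full Strict content model (it would not flag a <br> directly in <body>, for example), so it is no substitute for real DTD validation.

```python
import xml.etree.ElementTree as ET
from typing import List

def loose_text_in_body(xhtml: str) -> List[str]:
    """Return non-whitespace character data sitting directly inside <body>."""
    root = ET.fromstring(xhtml)
    # Find <body>, whether or not it carries the XHTML namespace ("{ns}body").
    body = next(el for el in root.iter() if el.tag.split("}")[-1] == "body")
    found = []
    # Text before the first child element of <body>.
    if body.text and body.text.strip():
        found.append(body.text.strip())
    for child in body:
        # Text after a child element (its "tail") is also a direct child of <body>.
        if child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found

doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Test</title></head>'
       '<body><p />First homework assignment<p /></body></html>')
print(loose_text_in_body(doc))  # ['First homework assignment']
```

Writing <p>First homework assignment</p> instead puts the text inside the paragraph, and the function then returns an empty list.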

And here comes the tricky part. The document has

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 strict //EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

which is incorrect - the quoted string has "strict" in lowercase and a
space after it, but the public identifier must be written exactly as
"-//W3C//DTD XHTML 1.0 Strict//EN". In fact, if you fix it, the validator
correctly reports the three syntax errors (character data and <br> element
not allowed directly as subelements of <body>). But when the string is
malformed, the validator goes really wild. It claims that the document
"Is Valid -//W3C//DTD XHTML 1.0 strict //EN!", which is just nonsense.
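To illustrate what "written exactly" means, here is a small Python sketch. The names KNOWN_FPIS and recognized_dtd are made up for this example, and the regular expression handles only the simple DOCTYPE form used here, not every legal one. Apart from collapsing whitespace, which XML 1.0 requires before a public identifier is matched, the comparison is character for character.

```python
import re
from typing import Optional

# Formal public identifiers of the three XHTML 1.0 DTDs.
KNOWN_FPIS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
}

# Captures the first quoted string after PUBLIC in the DOCTYPE declaration.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"')

def normalize_pubid(pubid: str) -> str:
    # Collapse runs of whitespace to a single space and trim both ends.
    return " ".join(pubid.split())

def recognized_dtd(doctype: str) -> Optional[str]:
    """Name of the XHTML 1.0 DTD whose public identifier matches, else None."""
    m = DOCTYPE_RE.search(doctype)
    if m is None:
        return None
    # Otherwise the match is exact: "strict" is not "Strict",
    # and "strict //EN" (with an inner space) is not "Strict//EN".
    return KNOWN_FPIS.get(normalize_pubid(m.group(1)))

bad = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 strict //EN"\n'
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
good = bad.replace("strict //EN", "Strict//EN")
print(recognized_dtd(bad))   # None
print(recognized_dtd(good))  # XHTML 1.0 Strict
```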

I wonder why the validator does not use the actual DTD specified by the
second string but instead performs a phoney validation. Reporting the
document as valid -//W3C//DTD XHTML 1.0 strict //EN only adds to the
confusion. I think this confirms the observation that the validator should
stop issuing "Valid XXX!" messages. Either a document is valid by SGML or
XML rules, or it is not. Being "Valid FooML!" means nothing, or worse.
It's a wrong way of being user-friendly.

It seems to me that when the first quoted string in the DOCTYPE
declaration is not one recognized by the validator, the validator
implicitly assumes XHTML 1.0 Transitional without saying so, while
actually saying something completely different. At least, when I modify
the document, I get error messages for constructs that don't match
XHTML 1.0 Transitional.
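The alternative wondered about above, using the DTD named by the second (system identifier) string and saying so, might look like this hypothetical Python sketch. CATALOG and choose_dtd are invented for illustration and do not describe how the W3C validator actually works. The point is only that a fallback gets announced rather than hidden behind a "Valid XXX!" label.

```python
from typing import Tuple

# Hypothetical catalog: public identifiers this checker recognizes, mapped to DTD URLs.
CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
}

def choose_dtd(public_id: str, system_id: str) -> Tuple[str, str]:
    """Pick the DTD to validate against and state plainly which one was picked."""
    if public_id in CATALOG:
        url = CATALOG[public_id]
        return url, f"Validating against {url} (recognized public identifier)."
    # Unrecognized public identifier: fall back to the system identifier and
    # report that, instead of silently substituting some other DTD.
    return system_id, (
        f"Public identifier {public_id!r} not recognized; "
        f"validating against the system identifier {system_id}."
    )

url, message = choose_dtd("-//W3C//DTD XHTML 1.0 strict //EN",
                          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd")
print(message)
```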

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 6 September 2004 19:07:40 UTC