Re: Bug in the HTML Validator

On Mon, 7 Aug 2006, Jon Ribbens wrote:

> An excerpt such as:
>
>  <input type="text" name="foo" size="12'">
>
> is "valid" according to the DTD (and hence the Validator), but it is
> not correct HTML.

That's correct, for all published HTML DTDs. For some odd reason, the 
size="..." attribute is declared as CDATA and not as NUMBER, although the 
prose of the HTML specification restricts its values to unsigned integers 
and the maxlength="..." attribute _is_ declared as NUMBER.
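
To make the distinction between "valid" and "correct" concrete, here is a
minimal Python sketch. It assumes a much simplified model in which CDATA
accepts any string and NUMBER accepts only a sequence of digits; the names
DECLARED_TYPE, is_valid and is_correct are hypothetical and have nothing to
do with the validator's actual code.

```python
import re

# Declared value types of two INPUT attributes, as described above.
DECLARED_TYPE = {"size": "CDATA", "maxlength": "NUMBER"}

DIGITS = re.compile(r"[0-9]+")

def is_valid(attr, value):
    """Validity in the DTD sense (simplified): only NUMBER constrains the value."""
    if DECLARED_TYPE[attr] == "NUMBER":
        return DIGITS.fullmatch(value) is not None
    return True  # CDATA: any character data passes

def is_correct(attr, value):
    """Correctness per the prose: the value must be an unsigned integer."""
    return DIGITS.fullmatch(value) is not None

for attr in ("size", "maxlength"):
    print(attr, is_valid(attr, "12'"), is_correct(attr, "12'"))
```

Under these assumptions, size="12'" comes out as valid but not correct,
whereas maxlength="12'" is neither.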

> HTML Tidy will correctly identify this as a problem.

I'm not sure about this. The version of Tidy that has been built into 
HTML-Kit does not seem to issue a warning. Using the online version of 
Tidy at
http://infohound.net/tidy/tidy.pl
I get the message
   Warning: <input> attribute "size" has invalid value "12'";
This might be useful, but it's still wrong. If the message were correct, 
it should be an error message, not a warning! The value "12'" is not 
invalid, though it is incorrect.

Apparently Tidy internally checks that the size="..." attribute value is a 
sequence of decimal digits. This could be useful, especially if it were
reported correctly (surely there are other English words that could be 
used instead of abuse of the term "valid") and more informatively (the 
message is now the same for size="12'" and size="foobar", with no hint to 
what the syntax of the value _should_ be).
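
As an illustration only, a check of this kind with a more informative
message might look like the following Python sketch. The function name
check_size and the wording of the message are assumptions made for the
example; this is not Tidy's code, and I have not looked at how Tidy
actually implements the check.

```python
import re

def check_size(value):
    """Return None if value is a digit sequence, else a descriptive message."""
    if re.fullmatch(r"[0-9]+", value):
        return None
    # Avoid the word "invalid": the value may well be valid per the DTD.
    return ('<input> attribute "size" has incorrect value "%s"; '
            'expected one or more decimal digits (0-9)' % value)

print(check_size("12'"))
print(check_size("foobar"))
```

The message still has the same form for "12'" and "foobar", but it now
says what the syntax of the value should be.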

Compare the message with the validator's message in a case where "12'" 
_is_ invalid, namely in the value of the maxlength attribute:

(start quote of an error message)
Error Line 12 column 20: character "'" is not allowed in the value of 
attribute "MAXLENGTH".
<input maxlength="12'">
It is possible that you violated the naming convention for this 
attribute. For example, id and name attributes must begin with a letter, not a digit.
(end quote)

That's fine. Not optimal, but fine and understandable. The obvious 
direction for improving it would require rewriting major parts of the code 
so that the validator would know what it has been doing, namely parsing a 
NUMBER value. Then there would be no need for the excessively generic 
remark, and the message could simply say "The value of this attribute must 
contain only digits 0 through 9."

The online version of Tidy also seems to "clean up" valid markup, making 
it invalid: if I use <input> as a direct subelement of <body>, Tidy 
constructs a <form> element containing it. No <form> markup is required by 
the DTD _or_ by the prose of the specification; of course there might be 
good reasons not to use <input> outside a <form>, but that's a different 
story. Anyway, the markup that Tidy adds has a <form> element without an 
action="..." attribute, making it invalid. (A Tidy check of the "Tidied" 
output causes a _warning_ about a missing action="..." attribute.)
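
A rough way to spot this particular problem in "Tidied" output is sketched
below in Python. It uses the standard html.parser module, which is a
lenient tag scanner and not an SGML parser, so this is in no way a
substitute for validation; the names FormActionChecker and
forms_without_action are made up for the example.

```python
from html.parser import HTMLParser

class FormActionChecker(HTMLParser):
    """Collect the positions of <form> start tags that lack an action attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "form" and "action" not in dict(attrs):
            self.missing.append(self.getpos())  # (line number, column offset)

def forms_without_action(markup):
    """Return the positions of all <form> tags without action="..."."""
    checker = FormActionChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing

print(forms_without_action('<body><form><input type="text"></form></body>'))
```

An empty list means that every <form> start tag found carried an
action="..." attribute; it says nothing about the rest of the markup.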

Regarding the W3C Validator, I wonder whether it _really_ has a bug in 
processing of attribute values. If I use
<input maxlength=" 12">
the validator reports no error. Yet, I cannot see anything in the SGML 
standard that would allow the leading space, when the attribute is 
declared as NUMBER.

> Both the W3C Validator and HTML Tidy are behaving correctly as
> designed, but you can see why people might get confused.

Whenever people refer to the W3C Validator as a guarantee of correctness 
or as making pages work across browsers, they contribute substantially to 
the confusion.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 7 August 2006 13:03:30 UTC