Re: Case sensitivity bug, CSS

On Fri, 11 Jul 2003, Mukul Sabharwal wrote:

> I just wanted to let you guys know, that the HTML validator, on the
> strict html dtd, does not point out that if you are using :
>
> <META HTTP-EQUIV="Content-Type" CONTENT="TEXT/HTML; charset:UTF-8">

The question was answered by Bjoern Hoehrmann, who briefly pointed at a
recorded bug in the "CSS Validator". But I think that a short additional
note is in order (on the www-validator list only), for clarification.

Media type names are case insensitive, so TEXT/HTML is just as good as
text/html. Media type name structure is defined in MIME specifications,
and the statement that they are case insensitive is buried deep in
RFC 2045, section 5.1:
http://www.mhonarc.org/~ehood/MIME/2045/rfc2045.html#5.1
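
As a minimal sketch of what this means for software that compares media
type names (Python here; the function name same_media_type is my own
invention, not taken from any specification or library):

```python
def same_media_type(a: str, b: str) -> bool:
    """Compare two type/subtype names, ignoring case (RFC 2045, section 5.1).

    Only the bare type/subtype name is handled here; parameters such as
    charset are assumed to have been split off already.
    """
    return a.strip().lower() == b.strip().lower()


print(same_media_type("TEXT/HTML", "text/html"))   # True
print(same_media_type("text/html", "text/plain"))  # False
```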

But this is _not_ checked in any way by a validator. The situation is
confused, first, by calling various checkers like "CSS Validator"
validators and, second, by the fact that some validators and other
checkers may _process_ element attributes. In the validation process,
a validator (in the SGML sense of the word) never "does" anything with
attributes; it only checks that their syntax is correct, as far as the
formalized description (in the DTD) is concerned. So if a validator
follows some links, for example, that's completely external to validation
itself.

And for the CONTENT attribute, as well as for the TYPE attribute, the
validation process is very simple. The attribute is declared as CDATA,
which means, roughly speaking, 'any character string'. There's nothing a
validator could do to check that such an attribute has a value that
complies with HTML specifications, since the rules for those values have
not been (and could not be) described in SGML.

Thus, CONTENT="almost anything you like to type here" is definitely
valid.

And, in particular,
CONTENT="TEXT/HTML; charset:UTF-8"
is definitely valid too, but definitely incorrect. The charset parameter,
if present, must contain an equals sign "=" and not a colon ":", i.e.
CONTENT="TEXT/HTML; charset=UTF-8".
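
A check of this kind has to be written separately from validation. The
following is only a rough sketch in Python, under the assumption that
splitting on ";" is good enough (it ignores quoted parameter values, so
it is not a full MIME parser); the function name check_content_type is
hypothetical:

```python
def check_content_type(value: str) -> list:
    """Return a list of problems found in a Content-Type-like string.

    Simplified: splits on ';' and does not handle quoted strings.
    """
    problems = []
    media_type, *params = [part.strip() for part in value.split(";")]
    if media_type.count("/") != 1:
        problems.append(
            f"media type {media_type!r} is not of the form type/subtype")
    for param in params:
        if not param:
            continue  # tolerate a trailing semicolon
        name, sep, val = param.partition("=")
        if not sep or not name.strip() or not val.strip():
            problems.append(
                f"parameter {param!r} is not of the form name=value")
    return problems


print(check_content_type("TEXT/HTML; charset:UTF-8"))  # one problem reported
print(check_content_type("TEXT/HTML; charset=UTF-8"))  # []
```

Note that the sketch complains about the colon but not about the
uppercase TEXT/HTML, in line with what was said above.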

The moral is that you should not expect a validator to check the
correctness of values of attributes declared as CDATA - and this covers
most attributes, as you can see from the index
http://www.w3.org/TR/html4/index/attributes.html
(where most of the notations like %Text actually stand for CDATA).
Some attributes, like align, are declared by specifying an enumerated list
of permissible values, and in _that_ case a validator does check the
correctness of the attribute value.
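
The difference can be illustrated with a toy model in Python. This is
not how any real validator is implemented; the DECLARATIONS table is a
hand-made stand-in for two attribute declarations (content on META as
CDATA, align on P as an enumerated list), and it assumes that enumerated
values are matched without regard to case:

```python
CDATA = None  # marker: any character string is accepted

# Hypothetical miniature of what a DTD declares for two attributes.
DECLARATIONS = {
    ("meta", "content"): CDATA,
    ("p", "align"): {"left", "center", "right", "justify"},
}


def attribute_value_is_valid(element: str, attribute: str, value: str) -> bool:
    """Mimic the only check a validator can make on an attribute value."""
    allowed = DECLARATIONS[(element.lower(), attribute.lower())]
    if allowed is CDATA:
        return True  # nothing to check: CDATA admits any string
    return value.lower() in allowed


print(attribute_value_is_valid("META", "CONTENT", "TEXT/HTML; charset:UTF-8"))  # True
print(attribute_value_is_valid("P", "ALIGN", "middle"))                         # False
```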

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Wednesday, 16 July 2003 01:31:45 UTC