Re: UKUUG Website query from Jukka K. Korpela on 2004-01-03 (www-validator@w3.org from January 2004)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 3 Jan 2004 14:37:11 +0200 (EET)
To: Ralph Corderoy <ralph@inputplus.co.uk>
Cc: Charles Curran <Charles.Curran@ukuug.org>, www-validator@w3.org, webmaster@ukuug.org
Message-ID: <Pine.GSO.4.58.0401031422210.2103@korppi.cs.tut.fi>

On Sat, 3 Jan 2004, Ralph Corderoy wrote:

> I agree, validator declares `This Page Is Valid HTML 4.01
> Transitional!' despite the title entity containing a comment which is
> not allowed according to
>
>     http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#edef-TITLE
>
>     "Titles may contain character entities (for accented characters,
>     special characters, etc.), but may not contain other markup
>     (including comments)."

That verbal statement has no impact on validation, which is a purely
formal operation that must not pay attention to any attempts to use prose
to impose restrictions not expressed in a DTD. In fact, it must not even
see such attempts - it must be blind to anything outside the formal
part of the DTD.

The title element is declared in HTML 4.01 DTDs as

<!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title -->

which is odd: the exclusion serves no purpose, since the basic content
model is (#PCDATA), which does not allow any elements anyway, and
pragmatically it is misleading. But in SGML terms the declaration implies
mixed content parsing, which means that character references and entity
references - lumped together under the misleading term "character
entities" in W3C prose - are recognized and comment declarations _are_
parsed. Even tags are parsed, but since (#PCDATA) is alone in the content
model, this cannot result in recognizing any valid element inside the
title element.
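
To make the point above concrete, here is a rough sketch in Python of how
markup is still recognized inside (#PCDATA) content. It is a toy tokenizer,
not an SGML parser: the comment pattern is a simplification of the real
comment declaration syntax, tags are not handled at all, and the function
name and the sample title are made up for illustration.

```python
import re

# Toy tokenizer, NOT a real SGML parser. It knows only three constructs:
# comment declarations (simplified), numeric character references, and
# entity references. Everything else is treated as character data.
TOKEN = re.compile(
    r"<!--(?P<comment>.*?)-->"                  # simplified comment declaration
    r"|&#(?P<charref>[0-9]+);?"                 # numeric character reference
    r"|&(?P<entref>[A-Za-z][A-Za-z0-9.-]*);?",  # entity reference
    re.DOTALL,
)


def tokenize_pcdata(text):
    """Split (#PCDATA) content into (kind, value) pairs."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            # Plain character data between two recognized constructs.
            tokens.append(("data", text[pos:match.start()]))
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


# Hypothetical title content, not taken from the page under discussion.
sample = "UKUUG <!-- hidden --> Home &amp; News"
for token in tokenize_pcdata(sample):
    print(token)
```

The comment comes out as a token of its own rather than as part of the
character data, which is the distinction that matters here.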

> Perhaps validator just thinks the comment is ordinary title text for
> display, as does the galeon browser, but in that case I'm surprised it
> doesn't warn that `<', etc., should be encoded using character entities.

A validator recognizes it as a comment declaration. What browsers do is a
different issue. As rule #1, Web browsers should be assumed to get SGML
parsing wrong in all nontrivial situations, and in some trivial situations
too. Exceptions include Lynx, which seems to process a title element
containing a comment declaration quite correctly. But it is of course best
to avoid using markup that is known to get mishandled by the most popular
browsers.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 3 January 2004 07:44:49 UTC