Characters "<" and "&" incorrectly reported as warnings only

The occurrence of the less-than sign "<" or the ampersand "&" as character 
data in an XHTML document causes the W3C Markup Validator to issue only a 
warning. The document is still reported as valid. For a trivial demo, see
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.cs.tut.fi%2F%7Ejkorpela%2Ftest%2Ftest.xhtml
This is incorrect, since such occurrences are errors by XML rules:

"The ampersand character (&) and the left angle bracket (<) MUST NOT 
appear in their literal form, except when used as markup delimiters, or 
within a comment, a processing instruction, or a CDATA section. If they 
are needed elsewhere, they MUST be escaped using either numeric character 
references or the strings "&amp;" and "&lt;" respectively."
    http://www.w3.org/TR/REC-xml/#syntax
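
To illustrate the escaping that the quoted rule requires, here is a minimal
sketch in Python. It is not anything the validator itself does, and the
sample string is made up. It assumes only the standard library function
xml.sax.saxutils.escape, which also turns ">" into "&gt;" (permitted, though
the rule quoted above does not demand it).

```python
from xml.sax.saxutils import escape

# Hypothetical character data containing both problem characters.
raw = "AT&T, 1 < 2"

# escape() replaces "&" first, then "<" and ">", so the ampersands it
# introduces are not escaped a second time.
escaped = escape(raw)
print(escaped)  # AT&amp;T, 1 &lt; 2
```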

Note that even the formal syntax in the XML specification excludes
"&" and "<" as data characters:
[14]    CharData    ::=    [^<&]* - ([^<&]* ']]>' [^<&]*)
This means that the data as a whole cannot match the production for 
"document", so it does not constitute an XML document at all, i.e. it is 
not even "well-formed" (ref.: 
http://www.w3.org/TR/REC-xml/#sec-well-formed ).
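
As a rough cross-check, any conforming XML parser should refuse such data
outright. The sketch below assumes Python's standard xml.etree.ElementTree
module (which is backed by the expat parser). The helper name is_well_formed
and the sample strings are mine, chosen for illustration only; they are not
taken from the test document above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as an XML document, False otherwise.

    Illustrative helper only: it checks well-formedness, not validity
    against any DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Literal "&" and "<" as character data: both are fatal errors.
bad_ampersand = "<p>AT&T</p>"
bad_less_than = "<p>1 < 2</p>"

# The same data with the characters escaped, and inside a CDATA section.
good_escaped = "<p>AT&amp;T, 1 &lt; 2</p>"
good_cdata = "<p><![CDATA[AT&T, 1 < 2]]></p>"

for sample in (bad_ampersand, bad_less_than, good_escaped, good_cdata):
    print(is_well_formed(sample), sample)
```

The first two samples are rejected and the last two are accepted, which is
the distinction between an error and a mere warning that this report is about.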

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 3 February 2007 11:22:21 UTC