
Characters "<" and "&" incorrectly reported as warnings only

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 3 Feb 2007 13:22:01 +0200 (EET)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.64.0702031312160.340@mustatilhi.cs.tut.fi>

The occurrence of the less-than sign "<" or the ampersand "&" as character 
data in an XHTML document causes the W3C Markup Validator to issue only a 
warning, and the document is then reported as valid. For a trivial demo, see
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.cs.tut.fi%2F%7Ejkorpela%2Ftest%2Ftest.xhtml
This is incorrect, since such occurrences are errors under XML rules:

"The ampersand character (&) and the left angle bracket (<) MUST NOT 
appear in their literal form, except when used as markup delimiters, or 
within a comment, a processing instruction, or a CDATA section. If they 
are needed elsewhere, they MUST be escaped using either numeric character 
references or the strings "&amp;" and "&lt;" respectively."
    http://www.w3.org/TR/REC-xml/#syntax
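To illustrate the escaping rule quoted above, here is a minimal sketch in Python (not part of the validator). It assumes the standard library's xml.sax.saxutils.escape, which replaces "&" with "&amp;" and "<" with "&lt;", and also ">" with "&gt;" (optional in XML, but harmless). The helper name escape_char_data is hypothetical.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def escape_char_data(text: str) -> str:
    """Hypothetical helper: make arbitrary text safe as XML character data.

    escape() replaces "&" first, then ">" and "<", so existing
    ampersands are not double-escaped by the later replacements.
    """
    return escape(text)

raw = "Fish & chips < 5 EUR"
escaped = escape_char_data(raw)
print(escaped)  # Fish &amp; chips &lt; 5 EUR

# After escaping, an XML parser accepts the element and decodes
# the entity references back to the original characters.
assert ET.fromstring("<p>" + escaped + "</p>").text == raw
```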

Note that even the formal syntax in the XML specification excludes
"&" and "<" as data characters:
[14]    CharData    ::=    [^<&]* - ([^<&]* ']]>' [^<&]*)
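Production [14] can be read as: any run of characters other than "<" and "&", excluding runs that contain the string "]]>". A rough sketch of that check in Python follows; the function name matches_char_data is hypothetical, and it deliberately ignores the separate Char production (which also rules out, e.g., most control characters).

```python
import re

# Zero or more characters, none of which is "<" or "&".
_NO_LT_OR_AMP = re.compile(r"[^<&]*")

def matches_char_data(s: str) -> bool:
    """Approximate XML production [14] CharData (hypothetical helper).

    True if s contains no "<" or "&" and does not contain "]]>".
    """
    return _NO_LT_OR_AMP.fullmatch(s) is not None and "]]>" not in s

print(matches_char_data("Fish and chips"))  # True
print(matches_char_data("Fish & chips"))    # False
```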
This means that the data as a whole cannot match the production for 
"document", so it does not constitute an XML document at all; that is, it 
is not even "well-formed" (ref.: 
http://www.w3.org/TR/REC-xml/#sec-well-formed ).
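As a sanity check outside the validator, a conforming XML parser should reject such data as not well-formed. A minimal sketch, assuming Python's standard xml.etree.ElementTree (backed by the Expat parser); is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Expat reports a fatal error for a literal "&" or "<" in content.
        return False
    return True

bad = "<p>Fish & chips</p>"
good = "<p>Fish &amp; chips</p>"
print(is_well_formed(bad), is_well_formed(good))  # False True
```

The parser treats this as a fatal error, not a warning, which is the behaviour the report argues the validator should match.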

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Saturday, 3 February 2007 11:22:21 GMT
