Unescaped XML Ampersands Incorrectly Validated

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sun, 29 Feb 2004 21:22:58 +0000
Message-ID: <40425832.3070704@mysterylights.com>
To: www-validator@w3.org

The following document is not well-formed XML, but the validator passes it:

http://infomesh.net/200X/valid-amp-bug.html

<http://validator.w3.org/check?uri=http%3A%2F%2Finfomesh.net
%2F200X%2Fvalid-amp-bug.html&charset=%28detect+automatically
%29&doctype=Inline&ss=1&verbose=1>

The issue is that the source contains an unescaped ampersand ("&"),
which the validator does not detect.

This issue was originally noted here:

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0007

Masayasu Ishikawa commented:

[[[
This is one of known limitations in SP-derived SGML/XML parsers.
"Real" XML processors can easily catch this kind of fatal error,
e.g. the CSS Validator does catch such error.
]]]
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003Jun/0009
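
To illustrate what a "real" XML processor does with such input, here is a
minimal sketch. It assumes Python's standard xml.etree.ElementTree module
(which is not what the validator itself uses), and the helper name
is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    Hypothetical helper for illustration; it only checks well-formedness,
    not validity against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A conforming XML parser must treat a bare "&" as a fatal error.
        return False, str(err)
    return True, None


# Sample inputs (assumptions, not the contents of valid-amp-bug.html).
BAD = "<p>Fish & chips</p>"
GOOD = "<p>Fish &amp; chips</p>"

if __name__ == "__main__":
    for sample in (BAD, GOOD):
        ok, message = is_well_formed(sample)
        print(sample, "->", "well-formed" if ok else "error: " + message)
```

The first sample is rejected at parse time, whereas an SP-derived parser
running in SGML mode may accept the same bare ampersand.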

Further IRC discussion:

<xover> Uhm. What's the problem?
<xover> The unescaped ampersand?
<xover> That's an artificial constraint imposed only by the prose of
   the XML REC and inexpressible in a DTD or SGML AFAICT.
<xover> And since OpenSP doesn't allow us to treat it as an error, we
   do the best we can by emitting a warning instead.
<sbp> nonetheless, it's a constraint
<deltab> um, what is?
<sbp> ampersands must be escaped as &amp; in XML PCDATA
<deltab> yes, as they must anywhere
<xover> "anywhere" (almost) in XML. Not in SGML.
<deltab> where not in SGML?
<xover> SGML allows the & to appear bare anywhere it is unambiguous.
- Swhack, 2004-02-29 21:00
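
The constraint discussed above can be approximated outside a parser. The
following is only a rough sketch under stated assumptions: the name
find_bare_ampersands and the regular expression are invented for this
example, the name pattern is a simplification of the XML Name production,
and the check ignores CDATA sections, comments and processing instructions,
where a bare "&" is permitted (the "almost" in xover's remark).

```python
import re

# An ampersand is acceptable here only when it starts one of:
#   &name;      entity reference (simplified name pattern, an assumption)
#   &#123;      decimal character reference
#   &#x1F;      hexadecimal character reference
BARE_AMP = re.compile(
    r"&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)"
)


def find_bare_ampersands(text):
    """Return the string offsets of ampersands that do not begin a reference.

    Hypothetical helper: it scans raw text and does not understand CDATA
    sections, comments or processing instructions.
    """
    return [match.start() for match in BARE_AMP.finditer(text)]


if __name__ == "__main__":
    sample = "a & b &amp; c &#38; d &x"
    print(find_bare_ampersands(sample))
```

A check of this kind flags candidates only; whether each one is really an
error still depends on where it occurs in the document.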

Please let me know whether this is an appropriate bug to enter into
the database at <http://www.w3.org/Bugs/Public/>. (It would also be
appreciated if the validator.w3.org feedback page were more
bug-report oriented!)

Thanks,

-- 
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/
Received on Sunday, 29 February 2004 16:23:01 GMT
