- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 3 Sep 2004 23:58:56 +0300 (EEST)
- To: Charl van Niekerk <charlvn@gmail.com>
- Cc: www-validator@w3.org
On Thu, 2 Sep 2004, Charl van Niekerk wrote:

> I thought having unencoded ampersands is illegal in XML.

As far as I can see, they are indeed disallowed, even by the well-formedness rules, since the definition of CharData (which effectively tells what is allowed outside tags and character and entity references, so to speak) at http://www.w3.org/TR/REC-xml/#NT-CharData says:

    CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

thereby excluding "<" and "&", in accordance with the prose descriptions in the XML specification.

> However, the
> validator only lists them under "warnings". How so? Shouldn't any
> conforming XML parser crash on those?

Well, not crash. :-) But even non-validating processors are required to check well-formedness. My guess is that the markup validator has been built upon a generic SGML validator, just with some tuning, which doesn't cover this issue. The recognition of a naked "&" was apparently added ad hoc, and it was probably easier to make it issue a warning than an error.

> Also, I thought unencoded ampersands is illegal in HTML too.

No, they aren't, since SGML rules apply. An ampersand need not be escaped (though it has always been good practice in HTML to escape it), except when it could otherwise start an entity reference or a character reference. Thus, "R&D" is incorrect (&D must be parsed as an entity reference, and the entity is undefined), whereas "R & D" is formally OK.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
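
P.S. The well-formedness point for XML can be checked with any conforming XML parser. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which sits on top of the expat parser); it is not how the W3C markup validator works, and the helper name is_well_formed is made up for illustration. It covers XML only: the SGML-based HTML rules, under which "R & D" is acceptable, are not modelled here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document.

    Hypothetical helper: it only reports whether the parser accepted
    the input, not why it was rejected.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser reports well-formedness violations as errors,
        # not as warnings.
        return False
    return True


samples = [
    "<p>R & D</p>",      # naked ampersand in character data
    "<p>R&D</p>",        # looks like a reference to an undeclared entity
    "<p>R &amp; D</p>",  # ampersand escaped properly
]

for sample in samples:
    verdict = "well-formed" if is_well_formed(sample) else "NOT well-formed"
    print(sample, "->", verdict)
```

Only the third sample should be accepted. Both unescaped forms are rejected as XML, even though the spaced form would be formally OK in HTML 4.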
Received on Friday, 3 September 2004 20:59:29 UTC