Ambiguous ampersands (detailed review of Writing HTML documents)

From: Simon Pieters <simonp@opera.com>
Date: Mon, 23 Jul 2007 21:33:52 +0200
To: public-html <public-html@w3.org>
Message-ID: <op.tvxnaqs3idj3kv@hp-a0a83fcd39d2>

(This is part of my detailed review of the Writing HTML documents section.)

At the tokenization level, a stray ampersand is allowed if the character  
following it is one of U+0009, U+000A, U+000B, U+000C, U+0020, U+003C,  
U+0026, or EOF.
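
As a rough illustration of that next-character rule, here is a minimal Python sketch. The function name and the constant are hypothetical, chosen for this example only, and the sketch looks solely at the single character after the ampersand; it does not try to recognise entity references or any other tokenizer state.

```python
# Characters after which a stray "&" is allowed, taken from the list above:
# U+0009, U+000A, U+000B, U+000C, U+0020, U+003C (<), U+0026 (&).
ALLOWED_AFTER_AMPERSAND = frozenset("\u0009\u000A\u000B\u000C\u0020\u003C\u0026")


def stray_ampersand_allowed(text: str, index: int) -> bool:
    """Return True if the "&" at text[index] is followed by an allowed
    character or by EOF (hypothetical helper, not part of any spec)."""
    if text[index] != "&":
        raise ValueError("text[index] is not an ampersand")
    if index + 1 == len(text):  # the ampersand is the last character: EOF
        return True
    return text[index + 1] in ALLOWED_AFTER_AMPERSAND
```

For example, stray_ampersand_allowed("a & b", 2) and stray_ampersand_allowed("a&<b>", 1) are both true under this sketch, while stray_ampersand_allowed("a&b", 1) is false.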


The syntax section says:

    An ambiguous ampersand is a U+0026 AMPERSAND (&) character that is not
    the last character in the file, that is not followed by a space
    character, that is not followed by a start tag that has not been
    omitted, and that is not followed by another U+0026 AMPERSAND (&)
    character.


This doesn't catch all cases. A "<" character after the ampersand can also
be the start of an end tag, a comment, an escaping text span start (in the
RCDATA case), or the actual character (in the RCDATA or attribute value
cases). An "&" character after the ampersand can also be the start of a
character entity reference.
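
To make the "<" mismatch concrete, here is a small Python sketch that sets the tokenizer-level rule next to a literal reading of the quoted definition. All names are hypothetical. Two things are assumptions of mine: that "space character" means the same five whitespace code points as in the tokenizer list, and that a "start tag" can be approximated as "<" followed by an ASCII letter. Entity references are ignored entirely.

```python
# Assumed set of "space characters" (same code points as the tokenizer list).
SPACE_CHARACTERS = frozenset("\u0009\u000A\u000B\u000C\u0020")


def tokenizer_allows(text: str, index: int) -> bool:
    """Tokenizer-level rule: "&" followed by a space character, "<", "&", or EOF."""
    nxt = text[index + 1 : index + 2]  # empty string at EOF
    return nxt == "" or nxt in SPACE_CHARACTERS or nxt in ("<", "&")


def looks_like_start_tag(text: str, index: int) -> bool:
    """Crude approximation (an assumption): "<" followed by an ASCII letter."""
    letter = text[index + 1 : index + 2]
    return text[index : index + 1] == "<" and letter.isascii() and letter.isalpha()


def ambiguous_per_quoted_definition(text: str, index: int) -> bool:
    """Literal reading of the quoted definition for the "&" at text[index]."""
    nxt = text[index + 1 : index + 2]
    if nxt == "":                 # last character in the file
        return False
    if nxt in SPACE_CHARACTERS:   # followed by a space character
        return False
    if nxt == "&":                # followed by another ampersand
        return False
    if looks_like_start_tag(text, index + 1):  # followed by a start tag
        return False
    return True


# Inputs where "<" starts something other than a start tag.
for sample in ("&</p>", "&<!-- c -->", "&<p>"):
    print(repr(sample), tokenizer_allows(sample, 0),
          ambiguous_per_quoted_definition(sample, 0))
```

Under these assumptions, "&</p>" and "&<!-- c -->" are accepted by the tokenizer-level rule but still count as ambiguous under the quoted wording, whereas "&<p>" is fine under both.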

Simon Pieters
Opera Software
Received on Monday, 23 July 2007 19:33:59 UTC
