[whatwg] Ambiguous ampersand

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 15 Sep 2009 01:52:58 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0909150137580.14605@hixie.dreamhostps.com>
On Tue, 8 Sep 2009, Øistein E. Andersen wrote:
>
> According to § 9.1.4 Character references, "An ambiguous ampersand is a 
> U+0026 AMPERSAND (&) character that is followed by some text other than 
> a space character, a U+003C LESS-THAN SIGN character ('<'), or another 
> U+0026 AMPERSAND (&) character", text being "allowed inside elements, 
> attributes, and comments" (§ 9.1.3 Text). (Should that be "attribute 
> values"? Either is probably acceptable.)
> 
> This text does not seem to define the ampersand in <element attr=&> as 
> ambiguous, but it still causes a parse error. <element attr=& attr2>, 
> <element attr="&"> and <element attr='&'> are all conforming, so the 
> most consistent solution would probably be to remove the parse error by 
> setting the "additional allowed character" to '>' when encountering an 
> ampersand in the "Attribute value (unquoted)" state.

Fixed, thanks.


> Also, making the sequence "&<" conforming in (quoted) attribute values, 
> where the '<' occurs as text, seems inconsistent.

If we made &< non-conforming everywhere, then to detect this case would be 
ridiculously complicated in <title> elements:

   <title> test &< test &<!-- test &< &</title> --> &</foo> &</title>

Which are compliant and which are not?

Making &< conforming everywhere that the < is conforming text is more 
consistent than making &< only conforming in RCDATA text.
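
To make the rule concrete, here is a rough sketch of a checker that applies 
the definition quoted above to a run of text. It is illustrative only, not 
the spec's algorithm: the function name, the set of space characters, and 
the regular expression used to skip things shaped like character references 
are all assumptions of this sketch.

```python
import re

# Assumption: the "space characters" are tab, LF, FF, CR and space.
SPACE_CHARACTERS = "\t\n\f\r "

# Assumption: anything shaped like &name; or &#123; or &#x1F; is treated as
# a character reference; the name is not checked against the entity table.
CHAR_REF = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def ambiguous_ampersands(text):
    """Return offsets of ampersands that are ambiguous per the quoted text."""
    offsets = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            i += 1
            continue
        ref = CHAR_REF.match(text, i)
        if ref:
            # Looks like a character reference: skip it as a whole.
            i = ref.end()
            continue
        following = text[i + 1 : i + 2]  # empty string at end of text
        # Ambiguous only if followed by some text that is not a space
        # character, "<", or another "&".
        if following and following not in SPACE_CHARACTERS + "<&":
            offsets.append(i)
        i += 1
    return offsets
```

With this sketch, the text content of the <title> example above yields no 
ambiguous ampersands, since every "&" there is followed by "<" or by 
nothing, whereas "AT&T" yields one at offset 2.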

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 14 September 2009 18:52:58 UTC