- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Tue, 10 Dec 2013 12:45:16 -0500
- To: whatwg@lists.whatwg.org
On 12/10/13 11:11 AM, Peter Cashin wrote:
> The HTML5 spec says that an ambiguous ampersand (e.g. &something; undefined) is not allowed in element content

Right, that's an authoring requirement.

> and in section on HTML parsing, that this should throw a parse error.

There is no throwing of parse errors in the HTML spec. I assume you're looking at the "anything else" case of http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#consume-a-character-reference ? This says, for the case you're looking at:

  If no match can be made, then no characters are consumed, and nothing is returned. In this case, if the characters after the U+0026 AMPERSAND character (&) consist of a sequence of one or more alphanumeric ASCII characters followed by a U+003B SEMICOLON character (;), then this is a parse error.

And if you follow the link to "parse error" it's http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parse-error and basically has to do with validators needing to report them and UAs being allowed (but not required) to stop parsing here if they really want.

If they do NOT want to abort on the error (which is the common case, btw), the spec defines how they press on. And the way they press on is by returning nothing from the "consume a character reference" algorithm. What that does depends on the caller, but in the case you're talking about that's presumably http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-data-state and what it will do if nothing is returned is emit the '&' and move on to the next character.

So it basically treats the '&' as not special in any way in this case, leading to the behavior you observe in browsers.

> Is the specification intended to have compliant HTML agents stop parsing ambiguous ampersands?

Compliant HTML agents are allowed to do so, I guess, per the technical rules about parse errors, just like for any other parse error. But I expect that this is at least partly for conformance classes other than "browsers"; all browsers press on through parse errors in HTML. Maybe the allowed behavior for parse errors should be made conditional on conformance class...

-Boris
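To make the "press on" behavior described above concrete, here is a rough Python sketch. It is not the spec's algorithm: the three-entry NAMED_REFERENCES table and the function names are made up for illustration, and numeric references and the legacy names that match without a trailing semicolon are left out entirely. It only shows the two steps discussed: on no match, nothing is consumed and nothing is returned (with a parse error recorded for the "&name;" shape), and the data-state caller then emits the '&' as an ordinary character.

```python
import re

# Hypothetical three-entry subset of the named character reference table.
NAMED_REFERENCES = {"amp;": "&", "lt;": "<", "gt;": ">"}

# "One or more alphanumeric ASCII characters followed by a semicolon".
AMBIGUOUS = re.compile(r"[A-Za-z0-9]+;")


def consume_character_reference(text, pos, errors):
    """Try to match a named reference at pos (just past the '&').

    Returns (replacement, new_pos) on a match, or None when nothing matches.
    """
    for name, char in NAMED_REFERENCES.items():
        if text.startswith(name, pos):
            return char, pos + len(name)
    # No match: nothing is consumed and nothing is returned. The parse
    # error is only recorded here; parsing is not aborted.
    if AMBIGUOUS.match(text, pos):
        errors.append(pos - 1)  # offset of the offending '&'
    return None


def tokenize_data(text):
    """Return (output_text, parse_error_offsets) for data-state text."""
    out, errors, pos = [], [], 0
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch != "&":
            out.append(ch)
            continue
        result = consume_character_reference(text, pos, errors)
        if result is None:
            out.append("&")  # emit the '&' and move on to the next character
        else:
            char, pos = result
            out.append(char)
    return "".join(out), errors
```

With this sketch, tokenize_data("a &something; b &amp; c") gives the text "a &something; b & c" plus one recorded parse error at offset 2, while "AT&T" passes through unchanged with no error, since "T" is not followed by a semicolon. A validator would report the recorded offsets; a browser-like consumer would simply ignore them.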
Received on Tuesday, 10 December 2013 17:45:44 UTC