- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Wed, 10 Feb 2010 00:26:01 -0500
On 2/9/10 11:56 PM, Tab Atkins Jr. wrote:
> On Tue, Feb 9, 2010 at 9:05 PM, Biju<bijumaillist at gmail.com> wrote:
>> What should a user agent display when html content is...
>>
>> <html><body>
>> <%@ page language="java" %>
>> </body></html>
>>
>> At present IE and Safari display blank
>>
>> Firefox display<%@ page language="java" %>

As does Opera, and Firefox with the HTML5 parser enabled.

>> But for
>> <html><body>
>> abc<? echo ">" ?> xyz
>> </body></html>
>>
>> Firefox display...
>> abc " ?> xyz

As does Opera, and Firefox with the HTML5 parser enabled.

> Can someone else with more familiarity with the parser algorithm help
> out here?

For the "<%@" case, it looks like the state machine will go through the following states: Data state -> Tag open state [1]. When encountering a '%' in the "Tag open" state, the specification says:

  Parse error. Emit a U+003C LESS-THAN SIGN character token and reconsume the current input character in the data state.[2]

So the state will then remain "Data state" until the next '&' or '<' or EOF is seen, so the entire string up to the </body> will be treated as literal text.

For the "<?" case, the state transitions will be: Data state -> Tag open state -> Bogus comment state [1],[2]. Then the specification says to:

  Consume every character up to and including the first U+003E GREATER-THAN SIGN character (>) or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character). (If the comment was started by the end of the file (EOF), the token is empty.) Switch to the data state. [3]

Or in other words, stop the bogus comment at the first '>' you see and then start parsing normally again. In this case, that means treating everything up to the next '<' or '&' or EOF as literal text.

So the currently-specified behavior in fact matches the observed Firefox behavior (with either parser) on these simple testcases.

-Boris

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#bogus-comment-state
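The two state walks described in the message can be sketched in a few lines of Python. This is only an illustration of the "Tag open" and "Bogus comment" transitions discussed in the message, not the full tokenizer from the specification: the function name `tokenize` and the token tuples are made up for this example, character references ('&') are not handled, and anything starting with '<' plus an ASCII letter, '/' or '!' is simply swallowed up to the next '>' as an opaque "tag" token, which is a simplification.

```python
def tokenize(html):
    """Return a list of (kind, data) tokens: 'text', 'comment' or 'tag'.

    Hypothetical helper for illustration; it models only the transitions
    discussed in this message, not the whole HTML5 tokenizer.
    """
    tokens = []
    text = []  # pending character tokens, merged into one 'text' token

    def flush_text():
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    i, n = 0, len(html)
    while i < n:
        ch = html[i]
        if ch != "<":
            # Data state: ordinary characters are literal text.
            text.append(ch)
            i += 1
            continue

        # Tag open state: decide based on the character after '<'.
        nxt = html[i + 1] if i + 1 < n else ""
        if nxt == "?":
            # Bogus comment state: the comment data starts at the '?'
            # and runs up to (not including) the first '>' or EOF.
            flush_text()
            end = html.find(">", i + 1)
            if end == -1:
                end = n
            tokens.append(("comment", html[i + 1:end]))
            i = end + 1
        elif (nxt.isascii() and nxt.isalpha()) or nxt in ("/", "!"):
            # Simplification (assumption): treat real tags and markup
            # declarations as opaque tokens ending at the next '>'.
            flush_text()
            end = html.find(">", i + 1)
            if end == -1:
                end = n
            tokens.append(("tag", html[i:end + 1]))
            i = end + 1
        else:
            # Anything else (e.g. '%'): parse error, emit '<' as a
            # character and reconsume the next character in data state.
            text.append("<")
            i += 1

    flush_text()
    return tokens


if __name__ == "__main__":
    print(tokenize('<body>\n<%@ page language="java" %>\n</body>'))
    print(tokenize('abc<? echo ">" ?> xyz'))
```

Under these assumptions, the "<%@" input comes out as a single run of literal text between the body tags, and the "<?" input comes out as the text "abc", a comment whose data ends just before the first '>', and then the literal text '" ?> xyz'.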
Received on Tuesday, 9 February 2010 21:26:01 UTC