How to parse </foo </bar>

WebKit received a bug report [1] about a layout problem on
http://www.macruby.org/ due to the HTML5 parsing algorithm.  (You can
visit the site in a Firefox or WebKit nightly build to see the issue.)
The trouble boils down to this reduction:

Should say PASS:
<div>
  <div style="visibility:hidden">
    <p></p
  </div>
  PASS
</div>

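Essentially, the missing ">" on the close tag of the <p> element
causes the tokenizer to consume the characters of the following </div>
as part of the same end tag token.  The inner <div> is therefore closed
one tag late, PASS ends up inside the hidden <div>, and we get the
wrong DOM.  According to my tests, both the legacy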
WebKit parser and the legacy Firefox parser terminate a tag token upon
encountering a "<" character.  The HTML5 spec recognizes that case as
a parse error, but has different error recovery.  (This issue is on
our "top five" list of behavioral differences likely to cause
compatibility problems.)
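
To make the difference concrete, here is a rough Python sketch of the two
recovery strategies.  It is a toy splitter, not the HTML5 tokenizer state
machine and not either legacy parser; the names tokenize and stop_at_lt
are made up for illustration.  With stop_at_lt=True a "<" inside a tag
ends the tag token; with stop_at_lt=False only ">" does.

```python
def tokenize(html, stop_at_lt):
    """Split html into ("tag", s) and ("text", s) tokens.

    stop_at_lt=True  : a "<" seen inside a tag ends that tag token
                       (roughly the legacy behavior described above).
    stop_at_lt=False : only ">" ends a tag token
                       (a simplification of the HTML5 behavior).
    """
    tokens = []
    i, n = 0, len(html)
    while i < n:
        if html[i] == "<":
            j = i + 1
            # Scan to the end of the tag under the chosen policy.
            while j < n and html[j] != ">" and not (stop_at_lt and html[j] == "<"):
                j += 1
            if j < n and html[j] == ">":
                j += 1  # include the closing ">" in the token
            tokens.append(("tag", html[i:j]))
            i = j
        else:
            j = html.find("<", i)
            if j == -1:
                j = n
            tokens.append(("text", html[i:j]))
            i = j
    return tokens


# The interesting part of the reduction: an unterminated </p before </div>.
sample = "<p></p\n  </div>\n  PASS"

legacy_tags = [s for kind, s in tokenize(sample, stop_at_lt=True) if kind == "tag"]
html5_tags = [s for kind, s in tokenize(sample, stop_at_lt=False) if kind == "tag"]
```

In the first case the </div> survives as its own token; in the second it
is swallowed into the </p token, which is the behavior the reduction
trips over.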

Is there a particular reason why we don't terminate start and end tag
tokens upon encountering a "<" character?

Adam

[1] https://bugs.webkit.org/show_bug.cgi?id=40961

Received on Tuesday, 22 June 2010 20:43:07 UTC