How to parse </foo </bar> from Adam Barth on 2010-06-22 (public-html@w3.org from June 2010)

From: Adam Barth <w3c@adambarth.com>
Date: Tue, 22 Jun 2010 13:42:15 -0700
To: HTML WG <public-html@w3.org>
Cc: Henri Sivonen <hsivonen@iki.fi>, Eric Seidel <eric@webkit.org>, Alexey Proskuryakov <ap@webkit.org>
Message-ID: <AANLkTikpgpKApuuwkbcgwjtenFACUhMPyI8Y1u5oKJI_@mail.gmail.com>

WebKit received a bug report [1] about a layout problem on
http://www.macruby.org/ due to the HTML5 parsing algorithm.  (You can
visit the site in a Firefox or WebKit nightly build to see the issue.)
The trouble boils down to this reduction:

Should say PASS:
<div>
  <div style="visibility:hidden">
    <p></p
  </div>
  PASS
</div>

Essentially, the missing ">" on the close tag of the <p> element
causes the tokenizer to consume the following </div> characters as
part of the same end tag token, so the hidden <div> is still open when
PASS is parsed and we end up with the wrong DOM.  According to my
tests, both the legacy WebKit parser and the legacy Firefox parser
terminate a tag token upon encountering a "<" character.  The HTML5
spec recognizes that case as a parse error, but has different error
recovery.  (This issue is on our "top five" list of behavioral
differences likely to cause compatibility problems.)
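
To make the difference concrete, here is a rough Python sketch of the
two recovery strategies applied to the reduction above.  It is not the
HTML5 tokenizer or either legacy parser: tokenize(), tag_names(), and
the stop_at_lt flag are hypothetical names made up for illustration,
and the scanner ignores quoted attribute values, comments, and
everything else a real tokenizer has to handle.

```python
# The reduction from the bug report, including the unterminated "</p".
SAMPLE = (
    '<div>\n'
    '  <div style="visibility:hidden">\n'
    '    <p></p\n'
    '  </div>\n'
    '  PASS\n'
    '</div>\n'
)


def tokenize(html, stop_at_lt):
    """Split html into ("tag", raw) and ("text", raw) tokens.

    stop_at_lt=False: a tag runs until the next ">" (a simplified
    stand-in for the HTML5 recovery described above).
    stop_at_lt=True: a tag also ends just before the next "<" (a
    simplified stand-in for the legacy behavior).
    """
    tokens = []
    i, n = 0, len(html)
    while i < n:
        if html[i] == "<":
            j = i + 1
            while j < n and html[j] != ">" and not (stop_at_lt and html[j] == "<"):
                j += 1
            if j < n and html[j] == ">":
                j += 1  # consume the closing ">"
            tokens.append(("tag", html[i:j]))
        else:
            j = html.find("<", i)
            if j == -1:
                j = n
            tokens.append(("text", html[i:j]))
        i = j
    return tokens


def tag_names(tokens):
    """Return the tag name of each tag token, e.g. "div" or "/div"."""
    names = []
    for kind, raw in tokens:
        if kind == "tag":
            parts = raw[1:].rstrip(">").split()
            names.append(parts[0] if parts else "")
    return names


print(tag_names(tokenize(SAMPLE, stop_at_lt=False)))
print(tag_names(tokenize(SAMPLE, stop_at_lt=True)))
```

With stop_at_lt=False the sketch sees only one "/div" tag, because the
first </div> is swallowed by the unterminated </p token.  With
stop_at_lt=True it sees both, which is the tag sequence the page
author presumably intended.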

Is there a particular reason why we don't terminate start and end tag
tokens upon encountering a "<" character?

Adam

[1] https://bugs.webkit.org/show_bug.cgi?id=40961

Received on Tuesday, 22 June 2010 20:43:07 UTC