- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 2 Dec 2008 03:06:15 +0000 (UTC)
On Thu, 6 Nov 2008, Tommy Thorsen wrote:
>
> Before I get to the real issue, I think I should give you a little bit
> of background. I'm working for a company which makes a web browser.
> We've been having some problems with our algorithm for parsing illegal
> html, so we decided to scrap the whole module and implement the
> algorithm exactly as outlined in the html5 spec. So far this has been a
> great success. We're already way better than we used to be, but there
> are some situations where the html5 parsing algorithm does not quite
> give us the result we expected.

This is great feedback!

> Yesterday I noticed that we were not displaying the site
> http://bankrate.com correctly. The problem we had on that page boils
> down to the following markup:
>
> <div id="firstdiv">
>   A
>   <div id="seconddiv">
>     <form id="firstform">
>       <div id="thirddiv">
>         <form id="secondform"></form>
>       </div>
>     </form>
>   </div>
>   B
> </div>
>
> I'll walk you through it; Everything is normal until we reach the start
> tag for the "secondform". It is ignored, since we're already in a form
> (the form element pointer points to "firstform".) Then we see the end
> tag which was meant for "secondform". We pop elements from the stack of
> open elements until we find a form element (which is "firstform")
> popping off "thirddiv" in the process. The next token we get is the end
> div tag which was meant for "thirddiv". Since "thirddiv" is already
> gone, we pop "seconddiv" instead, and now we're sort of off-balance. The
> result is that A and B does not end up as children of the same div.
>
> I've applied a fix to our code which makes us handle this particular
> case better. I haven't tested it very thoroughly, but the change is to
> implement the 'An end tag whose tag name is "form"' section in "in body"
> as if it said:
>
> ------
> An end tag whose tag name is "form"
>
> Let /node/ be the form element pointer
> Set the form element pointer to null.
>
> If the stack of open elements does not have an element in scope with the
> same tag name as that of the token, then this is a parse error; ignore the
> token.
>
> Otherwise, run these steps:
>
> 1. Generate implied end tags.
> 2. If the current node is not an element with the same tag name as that
>    of the token, then this is a parse error.
> 3. Remove /node/ from the stack of open elements
> ------
>
> This seems to give us pretty much the same behaviour as Opera for the
> simple example above. Can any of you see any potential problems with
> this approach? In any case, I do believe that the specification needs to
> be changed one way or another, so that it handles this case better.

I concur that this is closer to what we need. I have updated the spec
accordingly.

> I think I have a couple of other instances where we've had to deviate
> from the specification in order to tackle problems discovered by our
> testers, and if any of you are interested in this kind of feedback, I'll
> dig them out and post them on this list.

Yes, please do!

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
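
For readers following the walkthrough above, here is a minimal Python sketch of the two behaviours for the "form" end tag, run against the quoted markup. It is an illustration under stated assumptions, not the spec's algorithm: the names `Element`, `TOKENS`, `parse`, `pop_until` and `parent_of` are hypothetical, the "in scope" check is reduced to "anywhere on the stack", implied end tags and whitespace text are left out, and a `body` element stands in for the rest of the document.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so stack.remove() targets one exact node
class Element:
    tag: str
    id: str = ""
    children: list = field(default_factory=list)  # Element objects or text strings


# Hand-written token stream for the markup quoted in the message.
TOKENS = [
    ("start", "div", "firstdiv"), ("text", "A"),
    ("start", "div", "seconddiv"),
    ("start", "form", "firstform"),
    ("start", "div", "thirddiv"),
    ("start", "form", "secondform"), ("end", "form"),
    ("end", "div"), ("end", "form"), ("end", "div"),
    ("text", "B"), ("end", "div"),
]


def pop_until(stack, tag):
    """Pop elements until one with the given tag name has been popped."""
    while stack.pop().tag != tag:
        pass


def parse(tokens, proposed):
    """Build a tree; proposed=True uses the 'remove node' rule for </form>."""
    root = Element("body", "body")
    stack = [root]          # simplified stack of open elements
    form_pointer = None     # simplified form element pointer
    for token in tokens:
        kind = token[0]
        if kind == "text":
            stack[-1].children.append(token[1])
        elif kind == "start":
            _, tag, ident = token
            if tag == "form" and form_pointer is not None:
                continue    # already inside a form: ignore the start tag
            element = Element(tag, ident)
            stack[-1].children.append(element)
            stack.append(element)
            if tag == "form":
                form_pointer = element
        else:
            tag = token[1]
            # Assumption: "in scope" simplified to "present on the stack".
            in_scope = any(e.tag == tag for e in stack)
            if tag == "form":
                node, form_pointer = form_pointer, None
                if not in_scope:
                    continue            # parse error; ignore the token
                if proposed:
                    if node is not None:
                        stack.remove(node)   # step 3 of the proposed text
                else:
                    pop_until(stack, tag)    # behaviour described as problematic
            elif in_scope:
                pop_until(stack, tag)
    return root


def parent_of(node, text):
    """Return the id of the element that directly contains the given text."""
    for child in node.children:
        if child == text:
            return node.id
        if isinstance(child, Element):
            found = parent_of(child, text)
            if found:
                return found
    return None


for proposed in (False, True):
    tree = parse(TOKENS, proposed)
    print(proposed, parent_of(tree, "A"), parent_of(tree, "B"))
```

With `proposed=False` the sketch places "B" outside "firstdiv", which is the off-balance result described in the message; with `proposed=True` both "A" and "B" end up as children of "firstdiv".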
Received on Monday, 1 December 2008 19:06:15 UTC