[whatwg] parsing nested forms

On Thu, 6 Nov 2008, Tommy Thorsen wrote:
> 
> Before I get to the real issue, I think I should give you a little bit 
> of background. I'm working for a company which makes a web browser. 
> We've been having some problems with our algorithm for parsing illegal 
> html, so we decided to scrap the whole module and implement the 
> algorithm exactly as outlined in the html5 spec. So far this has been a 
> great success. We're already way better than we used to be, but there 
> are some situations where the html5 parsing algorithm does not quite 
> give us the result we expected.

This is great feedback!


> Yesterday I noticed that we were not displaying the site 
> http://bankrate.com correctly. The problem we had on that page boils 
> down to the following markup:
> 
> <div id="firstdiv">
>    A
>    <div id="seconddiv">
>        <form id="firstform">
>            <div id="thirddiv">
>                <form id="secondform"></form>
>            </div>
>        </form>
>    </div>
>    B
> </div>
> 
> I'll walk you through it: everything is normal until we reach the start 
> tag for the "secondform". It is ignored, since we're already in a form 
> (the form element pointer points to "firstform"). Then we see the end 
> tag which was meant for "secondform". We pop elements from the stack of 
> open elements until we find a form element (which is "firstform") 
> popping off "thirddiv" in the process. The next token we get is the end 
> div tag which was meant for "thirddiv". Since "thirddiv" is already 
> gone, we pop "seconddiv" instead, and now we're sort of off-balance. The 
> result is that A and B do not end up as children of the same div.
> 
> I've applied a fix to our code which makes us handle this particular 
> case better. I haven't tested it very thoroughly, but the change is to 
> implement the 'An end tag whose tag name is "form"' section in "in body" 
> as if it said:
> 
> ------
> An end tag whose tag name is "form"
> 
>    Let /node/ be the form element pointer.
>    Set the form element pointer to null.
> 
>    If the stack of open elements does not have an element in scope with
>    the same tag name as that of the token, then this is a parse error;
>    ignore the token.
> 
>    Otherwise, run these steps:
> 
>       1. Generate implied end tags.
>       2. If the current node is not an element with the same tag name as
>          that of the token, then this is a parse error.
>       3. Remove /node/ from the stack of open elements.
> ------
> 
> This seems to give us pretty much the same behaviour as Opera for the 
> simple example above. Can any of you see any potential problems with 
> this approach? In any case, I do believe that the specification needs to 
> be changed one way or another, so that it handles this case better.

I concur that this is closer to what we need. I have updated the spec 
accordingly.
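For anyone who wants to experiment, here is a minimal Python sketch that replays the markup above through both form end-tag handlers. It is a toy, not the spec's tree builder: the token tuples, the Element/State classes and all function names are hypothetical; only div and form tags are modelled; whitespace-only text is dropped; and "in scope" ignores scoping elements such as table. end_form_old pops until a form has been popped, as in the walkthrough; end_form_new follows the revised steps, plus an added guard for a null form element pointer.

```python
from dataclasses import dataclass, field
from typing import Optional

# Tags that "generate implied end tags" may pop (none occur in this example).
IMPLIED_END_TAGS = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}

@dataclass(eq=False)  # eq=False: elements compare by identity, like DOM nodes
class Element:
    tag: str
    id: str = ""
    children: list = field(default_factory=list)  # Elements or text strings

@dataclass
class State:
    stack: list  # the stack of open elements
    form_pointer: Optional[Element] = None  # the form element pointer

def in_scope(stack, tag):
    # Simplified: no scoping elements (table, button, ...) are modelled.
    return any(el.tag == tag for el in stack)

def generate_implied_end_tags(stack):
    while stack[-1].tag in IMPLIED_END_TAGS:
        stack.pop()

def end_div(state):
    if not in_scope(state.stack, "div"):
        return  # parse error; ignore the token
    generate_implied_end_tags(state.stack)
    while state.stack.pop().tag != "div":  # pop until a div has been popped
        pass

def end_form_old(state):
    # Old behaviour: pop elements until a form element has been popped.
    state.form_pointer = None
    if not in_scope(state.stack, "form"):
        return  # parse error; ignore the token
    generate_implied_end_tags(state.stack)
    while state.stack.pop().tag != "form":
        pass

def end_form_new(state):
    # Proposed behaviour: remove only the pointed-to form from the stack.
    node = state.form_pointer
    state.form_pointer = None
    if node is None or not in_scope(state.stack, "form"):
        return  # parse error; ignore (the None check is an added guard)
    generate_implied_end_tags(state.stack)
    state.stack.remove(node)  # elements opened inside the form stay open

def parse(tokens, end_form):
    body = Element("body")
    state = State(stack=[body])
    for kind, data, attr_id in tokens:
        current = state.stack[-1]
        if kind == "text":
            current.children.append(data)
        elif kind == "start" and data == "form" and state.form_pointer is not None:
            continue  # nested form start tag: parse error, ignored
        elif kind == "start":
            el = Element(data, attr_id)
            current.children.append(el)
            state.stack.append(el)
            if data == "form":
                state.form_pointer = el
        elif data == "form":
            end_form(state)
        else:  # only div end tags remain in this toy model
            end_div(state)
    return body

def parent_id_of(node, text):
    # Return the id (or tag) of the element whose children include `text`.
    for child in node.children:
        if isinstance(child, str):
            if child == text:
                return node.id or node.tag
        else:
            found = parent_id_of(child, text)
            if found is not None:
                return found
    return None

TOKENS = [
    ("start", "div", "firstdiv"), ("text", "A", ""),
    ("start", "div", "seconddiv"), ("start", "form", "firstform"),
    ("start", "div", "thirddiv"), ("start", "form", "secondform"),
    ("end", "form", ""), ("end", "div", ""),  # meant for secondform, thirddiv
    ("end", "form", ""), ("end", "div", ""),  # meant for firstform, seconddiv
    ("text", "B", ""), ("end", "div", ""),    # B, then firstdiv's end tag
]

for handler in (end_form_old, end_form_new):
    tree = parse(TOKENS, handler)
    print(handler.__name__, parent_id_of(tree, "A"), parent_id_of(tree, "B"))
```

With the old handler the stray end tag pops "thirddiv" along with the form, so the later end tags close "seconddiv" and "firstdiv" early and B lands in body. With the revised handler only "firstform" leaves the stack, the remaining end tags line up with their elements, and A and B both end up in "firstdiv".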


> I think I have a couple of other instances where we've had to deviate 
> from the specification in order to tackle problems discovered by our 
> testers, and if any of you are interested in this kind of feedback, I'll 
> dig them out and post them on this list.

Yes, please do!

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 1 December 2008 19:06:15 UTC