[whatwg] Reconstructing formatting elements (8.2.5)

On Fri, 27 Feb 2009, Kartikaya Gupta wrote:
>
> I have a question about how formatting elements are reconstructed when 
> dealing with tainted tables. Specifically, the fine folks running 
> westjet.com stuck some malformed HTML on their site that I've boiled 
> down to the following snippet:
> 
> <table>
>  <tr>
>   <a href="foo"><td></a></td>
>   <td> </td>
>  </tr>
> </table>
> 
> When I parse this using the validator.nu HTML5 parser implementation, 
> the <a> tag gets put into the list of formatting elements. All the bits 
> of whitespace that come later trigger a reconstruction of the active 
> formatting elements, so the <a> gets cloned a bunch of times. The 
> resulting DOM ends up like so:
> 
> <HTML><HEAD></HEAD><BODY><A href="foo"></A><A href="foo">
>   </A><A href="foo">
>  </A><A href="foo">
> </A><TABLE>
>  <TBODY><TR>
>   <TD></TD><TD> </TD></TR></TBODY></TABLE><A href="foo">
> </A></BODY></HTML>
> 
> This seems to be correct behavior according to what is specced in HTML5. 
> However, none of the major browsers clone the <a> tag at all. [1]

In my testing, IE did in fact put text into that link. It doesn't clone 
the tree, but it does its equivalent, which is to say it generates a 
non-tree DOM that is equivalent to the cloned nodes above.

The behavior HTML5 requires is thus intentional for compat with IE.
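
For anyone following along, here is a rough Python sketch of the 
reconstruction step described in the quoted message. It is a simplified 
model, not the spec text or the validator.nu code: the Element class, the 
MARKER object, and the insert callback are names invented for the 
illustration, and foster parenting is reduced to "append to <body>".

```python
from dataclasses import dataclass, field

MARKER = object()  # stand-in for the spec's "marker" entries (assumption)


@dataclass(eq=False)  # eq=False: elements compare by identity, like DOM nodes
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def reconstruct(formatting, open_elements, insert):
    """Re-open formatting entries that are no longer on the stack.

    Returns the number of clones created.
    """
    def settled(entry):
        return entry is MARKER or entry in open_elements

    if not formatting or settled(formatting[-1]):
        return 0

    # Rewind to the earliest entry after the last marker or still-open element.
    i = len(formatting) - 1
    while i > 0 and not settled(formatting[i - 1]):
        i -= 1

    # Advance: clone each such entry, insert it, and replace the list entry.
    for j in range(i, len(formatting)):
        clone = Element(formatting[j].name, dict(formatting[j].attrs))
        insert(clone)
        open_elements.append(clone)
        formatting[j] = clone
    return len(formatting) - i


# Hypothetical replay: the <a> is in the list but already off the stack.
body = Element("body")
formatting = [Element("a", {"href": "foo"})]
open_elements = [body]
for _ in range(3):  # three whitespace runs, each followed by a tag that pops the clone
    reconstruct(formatting, open_elements, body.children.append)
    open_elements.pop()
print(len(body.children))  # 3
```

Each whitespace run finds the <a> entry missing from the stack and clones 
it; the next table-related tag pops the clone again, so the cycle repeats. 
That is where the run of <A href="foo"> elements in the DOM comes from.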

We could avoid cloning quite as many by eating whitespace after a 
table-related tag (<tr>, <td>, etc.) by resetting the table taint flag at 
those points... would that be desirable?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 27 April 2009 18:15:30 UTC