[whatwg] [WA1] Formatting elements

On Mon, 17 Jul 2006, Stewart Brodie wrote:
> 
> I tried dry-running the algorithm for handling mis-nested formatting 
> elements, but I ended up with a tree that looked very odd.  I can't 
> believe that the output I ended up with is what the desired result of 
> the algorithm is, so there is a mistake somewhere: either in my 
> execution of the algorithm or in the algorithm itself.  I took the 
> following fragment of HTML:
> 
> <DIV> abc <B> def <I> ghi <P> jkl </B> mno </I> pqr </P> stu

With that as input, my implementation outputs:

   5: Parse error: missing document type declaration.
   38: Parse error: mismatched b element end tag (misnested tags).
   47: Parse error: mismatched i element end tag (misnested tags).
   57: Parse error: mismatched body element end tag (premature end of file?).
   <html><head></head><body><div> abc <b> def <i> ghi 
   </i></b><i></i><p><i><b> jkl </b> mno </i> pqr </p> 
   stu</div></body></html>


> One filled whiteboard later [...]

Yeah, I have a bunch of whiteboards full of this stuff too. ;-)


> the result I ended up with was equivalent to:
> 
> <DIV> abc <B> def <I> ghi </I> </B> <I> </I> <P> <I> <B> jkl </B> mno </I>
> pqr </P> stu </DIV>

Looks right.


> I know it's hard to see when written out textually, but note that for 
> the text node 'jkl', the I and B elements are the wrong way around!

Wrong way with respect to what? They're the "right way" if you look at the 
end tags: </b> closes first, so it must be innermost! ;-)

The point is that this is error-correction logic; there is no "right way" 
(well, until the spec is a standard, I guess).
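
To make the nesting concrete, here is a minimal Python sketch that parses the 
serialized output shown earlier in this message and lists the ancestors of the 
"jkl" text node, innermost first. It relies on that output happening to be 
well-formed XML, so the standard-library XML parser can read it; the helper 
name ancestors_of_text is made up for illustration, and this is in no way an 
HTML parser.

```python
import xml.etree.ElementTree as ET

# The implementation's output from earlier in this message, with the
# e-mail line wrapping removed.
SERIALIZED = (
    "<html><head></head><body><div> abc <b> def <i> ghi </i></b><i></i>"
    "<p><i><b> jkl </b> mno </i> pqr </p> stu</div></body></html>"
)

def ancestors_of_text(root, needle):
    """Return the tag names enclosing `needle`, innermost first.

    Hypothetical helper: it only looks at text that directly follows a
    start tag (Element.text), which is enough for "jkl" here.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    for node in root.iter():
        if node.text and needle in node.text:
            chain = [node.tag]
            while node in parent:
                node = parent[node]
                chain.append(node.tag)
            return chain
    return None

root = ET.fromstring(SERIALIZED)
print(ancestors_of_text(root, "jkl"))
# prints ['b', 'i', 'p', 'div', 'body', 'html']
```

The b element is the innermost ancestor of "jkl", matching the order in which 
the end tags appear in the input.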


> It all seems to start going wrong for me in step 7 of the algorithm.  
> During the handling of the </B> tag, the clone of I gets created and 
> that's the node that ends up being the childless I node that has the DIV 
> as its parent (during step 5 of handling the </I> tag when the I is 
> cloned for a second time to be the child of the P and adopt the original 
> children of the P).  Firefox generates what I think I would expect and 
> prefer:
> 
> <DIV> abc <B> def <I> ghi </I> </B> <P> <B> <I> jkl </I> </B> <I> mno </I>
> pqr </P> stu </DIV>

It's the same number of tags, in this case.
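
As a quick check of that count, the following Python sketch tallies the 
elements in the two results quoted above. It assumes both quoted strings can 
be read as well-formed XML (they can, as written); the names SPEC_RESULT, 
FIREFOX_RESULT and tag_counts are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The two results as quoted above, with the e-mail line wrapping removed.
SPEC_RESULT = (
    "<DIV> abc <B> def <I> ghi </I> </B> <I> </I> <P> <I> <B> jkl </B> "
    "mno </I> pqr </P> stu </DIV>"
)
FIREFOX_RESULT = (
    "<DIV> abc <B> def <I> ghi </I> </B> <P> <B> <I> jkl </I> </B> "
    "<I> mno </I> pqr </P> stu </DIV>"
)

def tag_counts(markup):
    """Count elements per tag name, including the root element."""
    return Counter(node.tag for node in ET.fromstring(markup).iter())

spec = tag_counts(SPEC_RESULT)
firefox = tag_counts(FIREFOX_RESULT)
print(sum(spec.values()), sum(firefox.values()))  # prints 7 7
print(spec == firefox)                            # prints True
```

Both trees contain one DIV, one P, two B and three I elements; only the 
arrangement differs.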

What Mozilla does becomes more obviously bad when you consider a case 
like:

   <b><p>...<p>...<p>...<p>...<p>...<p>...

...which is very common. With that exact markup, Safari, IE7, and the spec 
all end up with the exact same DOM tree (from the <body> down, at least), 
and with the same number of element nodes (from <body> down, 8).

Mozilla ends up with 13 nodes (from the body down). That doesn't scale -- 
there are pages with hundreds of nodes like this.
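
The arithmetic behind those two figures can be sketched in Python. The 
breakdown is an assumption that merely fits the numbers above: 8 is read as 
body + one b + six p, and 13 as body + six p + six b clones. Extending either 
pattern to other paragraph counts is likewise an extrapolation, not something 
measured in any browser. The function names are made up.

```python
def shared_b_count(paragraphs: int) -> int:
    """Nodes from <body> down if a single b element is kept:
    body + one b + one p per <p> start tag (assumed breakdown)."""
    return 1 + 1 + paragraphs

def cloned_b_count(paragraphs: int) -> int:
    """Nodes from <body> down if every paragraph gets its own b:
    body + (one p + one b) per <p> start tag (assumed breakdown)."""
    return 1 + 2 * paragraphs

for n in (6, 100, 500):
    print(n, shared_b_count(n), cloned_b_count(n))
# prints:
# 6 8 13
# 100 102 201
# 500 502 1001
```

Under these assumptions the per-paragraph cloning approach creates roughly 
twice as many nodes once the page has more than a handful of paragraphs.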


> For comparison, Internet Explorer 6 on the other hand treats the P no
> differently to the B or I and ends up with:
> 
> <DIV> abc <B> def <I> ghi <P> jkl </P> </I> </B> <I> <P> mno </P> </I>
> <P> pqr </P> stu </DIV>

Actually, IE has only one P element (and only one B and only one I). Look 
closer and you'll find that the P element isn't closed -- it's just that 
the "mno" and "pqr" text nodes' parentNodes point to the P, while the DIV 
element's childNodes array actually also mentions those text nodes. Yes, 
IE generates DOM trees that aren't trees. See also:

   http://ln.hixie.ch/?start=1037910467&count=1
   http://ln.hixie.ch/?start=1138169545&count=1
   http://ln.hixie.ch/?start=1137740632&count=1
   http://ln.hixie.ch/?start=1026485588&count=1
   http://ln.hixie.ch/?start=1137799947&count=1
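
To illustrate what "DOM trees that aren't trees" means, here is a small Python 
sketch of a deliberately simplified, hypothetical model: a Node class with a 
parent pointer and a children list, plus a check that reports every node 
listed as a child of something other than its parent. The structure built 
below only imitates the situation described above (the B and I elements are 
left out, and the assumption that the P also lists the text nodes is mine); 
it is not a dump of what IE actually produces.

```python
class Node:
    """Toy DOM node: `parent` and `children` are maintained independently,
    so they can disagree (which a real tree would never allow)."""
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

def inconsistencies(root):
    """Return (container, child) name pairs where `child` appears in
    container.children but child.parent is some other node."""
    bad = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # a node reachable twice is visited only once
        seen.add(id(node))
        for child in node.children:
            if child.parent is not node:
                bad.append((node.name, child.name))
            stack.append(child)
    return bad

# Simplified model: the text nodes claim the P as their parent,
# yet the DIV's child list mentions them as well.
div, p = Node("div"), Node("p")
mno, pqr = Node("#text mno"), Node("#text pqr")
p.parent = div
div.children.append(p)
for text in (mno, pqr):
    text.parent = p
    p.children.append(text)    # assumed: the P lists them too
    div.children.append(text)  # the DIV also mentions them

print(inconsistencies(div))
# prints [('div', '#text mno'), ('div', '#text pqr')]
```

In a genuine tree this check returns an empty list, because each node appears 
in exactly one child list, namely that of its parent.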


> The problem here may simply be that appending any node due to opening 
> any non-formatting/non-phrasing open tag when in "in body" should cause 
> any formatting/phrasing elements to be popped off the stack of open 
> elements, and then NOT execute "reconstruct the active formatting 
> elements" (because it'll be executed automatically when opening the next 
> formatting/phrasing element or text node anyway)

Isn't that already the case? You only reconstruct for inline elements and 
text nodes, as far as I can tell.


BTW, while looking at this stuff, this page may be of use:

   http://software.hixie.ch/utilities/js/live-dom-viewer/

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 18 July 2006 16:15:32 UTC