RE: Working on IE's Operation Aborted problems...

Thanks a bunch. That's much more concise than the 20-or-so pages of rules. Would make a lovely overview :)

-----Original Message-----
From: Ian Hickson [mailto:ian@hixie.ch]
Sent: Friday, May 23, 2008 2:04 PM
To: Travis Leithead
Cc: public-html@w3.org; Harley Rosnow; Chris Wilson
Subject: Re: Working on IE's Operation Aborted problems...

On Fri, 23 May 2008, Travis Leithead wrote:
>
> You may recall I posted about Operation Aborted last month [1], and I'm
> now looking at providing what we're calling the "full fix" for the
> problem. The trouble is, given our current architecture, we're just a
> little puzzled how to correctly implement the change. It seems quite
> straightforward at first thought, but if you consider the "delete" cases
> (removeChild on an ancestor), and the "replace" cases (outerHTML on an
> ancestor), we end up with a bit of a tricky situation.
>
> In our quest to find the "right" way to handle this error condition, I
> did a relatively quick deep-dive into section 8.2 of HTML5 to try to
> find if these cases are spec'd somewhere therein (I believe they should
> be), but came up empty-handed. Perhaps I was looking at an old copy of
> the spec (had it printed at the cost of a small forest--probably a bad
> idea), or I simply missed something obvious.

The way the spec stands, the parser model basically never looks at the DOM
when parsing. Instead, it keeps a separate stack of open elements. Thus,
for example, if the parser sees this:

  <html><body><div><p><span>

...then at that point in the parsing the DOM will look like a tree as you
would expect, but in addition, the parser has a stack which looks like:

   html, body, div, p, span

...where each entry points at the elements that were created for each tag.
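
As a rough illustration of that bookkeeping, here is a minimal Python sketch. The Node class, start_tag() and open_elements are hypothetical names invented for this example; they are not taken from the spec or from any browser, and real tree construction has many more rules.

```python
class Node:
    """Toy stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child):
        # Moving a node detaches it from its previous parent first.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


document = Node("#document")
open_elements = []  # the parser's own stack, kept separately from the tree


def start_tag(name):
    """Create an element, append it to the current element, and push it."""
    node = Node(name)
    parent = open_elements[-1] if open_elements else document
    parent.append_child(node)
    open_elements.append(node)
    return node


for tag in ["html", "body", "div", "p", "span"]:
    start_tag(tag)

print([n.name for n in open_elements])
# ['html', 'body', 'div', 'p', 'span']
```

Note that the parser only ever consults open_elements to decide where the next node goes; it never walks the tree.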

Now if at this point the parser parses a <script> that futzes with the
DOM (I have omitted the script for brevity):

  <html><body><div><p><span><script>...</script>

The DOM might turn into something like:

   #document
     |
     +- span                            p (orphan)
         |
         +- html
         |   |
         |   +- blink
         |
         +- body

...with the <p> (and <script>) taken out altogether. However, the stack
still looks like:

   html, body, div, p, span

...and so when the parser continues and finds an <em> element:

  <html><body><div><p><span><script>...</script><em>

...it just appends it to the element that's the current element on the
stack, in this case the "span":

   html, body, div, p, span
                        ^ current element

...and thus the DOM would change into:

   #document
     |
     +- span                            p (orphan)
         |
         +- html
         |   |
         |   +- blink
         |
         +- body
         |
         +- em

Now the stack looks like:

   html, body, div, p, span, em
                             ^ current element
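
The step above can be sketched in Python as follows. This is only a toy model under stated assumptions: Node, start_tag() and stack are made-up names, the "script" is simulated by plain tree moves that produce the rearranged tree shown earlier, and the toy tree does not enforce any of the real DOM's constraints.

```python
class Node:
    """Toy stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def remove(self):
        # Detach this node from its parent, if it has one.
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def append_child(self, child):
        child.remove()
        child.parent = self
        self.children.append(child)


document = Node("#document")
stack = []  # stack of open elements, independent of the tree


def start_tag(name):
    """Append a new element to the current element and push it."""
    node = Node(name)
    (stack[-1] if stack else document).append_child(node)
    stack.append(node)
    return node


html, body, div, p, span = (
    start_tag(t) for t in ["html", "body", "div", "p", "span"]
)

# Simulated script: rearrange the tree behind the parser's back.
p.remove()                      # <p> becomes an orphan
div.remove()
document.append_child(span)     # <span> now hangs off the document
span.append_child(html)
span.append_child(body)
html.append_child(Node("blink"))

# The parser carries on using its stack, unaware of the changes.
em = start_tag("em")

print([c.name for c in span.children])  # ['html', 'body', 'em']
print([n.name for n in stack])
# ['html', 'body', 'div', 'p', 'span', 'em']
```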

Now, if an </em> tag is seen, then the <em> element is popped from the
stack:

  <html><body><div><p><span><script>...</script><em></em>

   html, body, div, p, span
                        ^ current element

And if a </span> tag is seen, the <span> is popped off:


  <html><body><div><p><span><script>...</script><em></em></span>

   html, body, div, p
                    ^ current element

Now the <p> element is the current element (bottom-most on the stack), but
as it isn't in the document, any elements appended to it won't be visible.
To show you what I mean, let's add some more tags:

  <html><body><div><p><span><script>...</script><em></em></span><a><b>

After parsing these the stack will have the <a> and <b> elements:

   html, body, div, p, a, b
                          ^ current element

...but those elements won't be in the document, they'll be added to the
orphaned <p> element:

   #document
     |
     +- span                            p (orphan)
         |                              |
         +- html                        +- a
         |   |                             |
         |   +- blink                      +- b
         |
         +- body
         |
         +- em
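
To make the orphan case concrete, here is a simplified Python variation. It assumes the script does nothing more than detach the <p> (the "delete" case), so the resulting tree differs from the diagram above, but the effect on later tags is the same. Node, start_tag(), end_tag(), in_document() and stack are hypothetical names for this sketch, and end_tag() ignores most of the real end-tag rules.

```python
class Node:
    """Toy stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def remove(self):
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def append_child(self, child):
        child.remove()
        child.parent = self
        self.children.append(child)


document = Node("#document")
stack = []  # stack of open elements


def start_tag(name):
    """Append a new element to the current element and push it."""
    node = Node(name)
    (stack[-1] if stack else document).append_child(node)
    stack.append(node)
    return node


def end_tag(name):
    """Pop up to and including the nearest open element called name."""
    if any(n.name == name for n in stack):
        while stack.pop().name != name:
            pass


def in_document(node):
    """Walk up the parent chain to see whether node reaches the document."""
    while node is not None:
        if node is document:
            return True
        node = node.parent
    return False


html, body, div, p, span = (
    start_tag(t) for t in ["html", "body", "div", "p", "span"]
)
p.remove()          # simulated script: detach <p> from the tree

start_tag("em")
end_tag("em")
end_tag("span")     # <p> is the current element again
a = start_tag("a")
b = start_tag("b")

print([n.name for n in stack])  # ['html', 'body', 'div', 'p', 'a', 'b']
print(in_document(b))           # False
```

The new elements are created and pushed as usual; they are simply attached to a subtree that is no longer reachable from the document.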

Does this make sense? Please let me know if there's anything about this
which is confusing or if you'd like a more detailed walkthrough of the
algorithm for some particular input.


(The above description is a little simplified -- to handle formatting
elements that are closed in the wrong order, the DOM is manipulated, and
there is a separate list of formatting elements to do some of the
bookkeeping. However, that doesn't really affect the issue here.)

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 23 May 2008 22:03:22 UTC