- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 23 May 2008 21:04:28 +0000 (UTC)
- To: Travis Leithead <travil@windows.microsoft.com>
- Cc: "public-html@w3.org" <public-html@w3.org>, Harley Rosnow <Harley.Rosnow@microsoft.com>, Chris Wilson <Chris.Wilson@microsoft.com>
On Fri, 23 May 2008, Travis Leithead wrote:
>
> You may recall I posted about Operation Aborted last month [1], and I'm
> now looking at providing what we're calling the "full fix" for the
> problem. The trouble is, given our current architecture, we're just a
> little puzzled how to correctly implement the change. It seems quite
> straightforward at first thought, but if you consider the "delete" cases
> (removeChild on an ancestor), and the "replace" cases (outerHTML on an
> ancestor), we end up with a bit of a tricky situation.
>
> In our quest to find the "right" way to handle this error condition, I
> did a relatively quick deep-dive into section 8.2 of HTML5 to try to
> find if these cases are spec'd somewhere therein (I believe they should
> be), but came up empty-handed. Perhaps I was looking at an old copy of
> the spec (had it printed at the cost of a small forest--probably a bad
> idea), or I simply missed something obvious.
The way the spec stands, the parser model basically never looks at the DOM
when parsing. Instead, it keeps a separate stack of open elements. Thus,
for example, if the parser sees this:
<html><body><div><p><span>
...then at that point in the parsing the DOM will look like a tree as you
would expect, but in addition, the parser has a stack which looks like:
html, body, div, p, span
...where each entry points at the elements that were created for each tag.
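As a rough illustration of that bookkeeping, here is a minimal Python sketch. The Node class and the start_tag function are names made up for this example, not anything taken from the spec, and seeding the stack with the document node is a simplification. It only aims to show that each start tag appends to the current element and pushes onto a stack that is kept separately from the tree.

```python
class Node:
    """A bare-bones stand-in for a DOM node (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child):
        child.parent = self
        self.children.append(child)


def start_tag(stack, name):
    """Create an element, append it to the current element, then push it."""
    element = Node(name)
    stack[-1].append_child(element)  # current element is the last stack entry
    stack.append(element)
    return element


document = Node("#document")
stack = [document]  # assumption: the document node seeds the stack
for tag in ["html", "body", "div", "p", "span"]:
    start_tag(stack, tag)

# Skip the document node when listing the open elements.
open_elements = [node.name for node in stack[1:]]
print(", ".join(open_elements))  # html, body, div, p, span
```

Note that the stack holds references to the same node objects that sit in the tree, which is what makes the following scenario possible.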
Now if at this point the parser parses a <script> that futzes with the
DOM (I have omitted the script for brevity):
<html><body><div><p><span><script>...</script>
The DOM might turn into something like:
#document
 |
 +- span            p (orphan)
     |
     +- html
     |   |
     |   +- blink
     |
     +- body
...with the <p> (and <script>) taken out altogether. However, the stack
still looks like:
html, body, div, p, span
...and so when the parser continues and finds an <em> element:
<html><body><div><p><span><script>...</script><em>
...it just appends it to the element that's the current element on the
stack, in this case the "span":
html, body, div, p, span
                    ^ current element
...and thus the DOM would change into:
#document
 |
 +- span            p (orphan)
     |
     +- html
     |   |
     |   +- blink
     |
     +- body
     |
     +- em
Now the stack looks like:
html, body, div, p, span, em
                          ^ current element
Now, if an </em> tag is seen, then the <em> element is popped from the
stack:
<html><body><div><p><span><script>...</script><em></em>
html, body, div, p, span
                    ^ current element
And if a </span> tag is seen, the <span> is popped off:
<html><body><div><p><span><script>...</script><em></em></span>
html, body, div, p
                 ^ current element
Now the <p> element is the current element (bottom-most on the stack), but
as it is no longer in the document, any elements appended to it won't be
visible. To show you what I mean, let's add some more tags:
<html><body><div><p><span><script>...</script><em></em></span><a><b>
After parsing these, the stack will have the <a> and <b> elements:
html, body, div, p, a, b
                       ^ current element
...but those elements won't be in the document, they'll be added to the
orphaned <p> element:
#document
 |
 +- span            p (orphan)
     |               |
     +- html         +- a
     |   |               |
     |   +- blink        +- b
     |
     +- body
     |
     +- em
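To make the walkthrough concrete, here is a small Python simulation of it. This is only a sketch under assumptions: Node, Parser, and futz are hypothetical names, futz stands in for the omitted script (the <script> element itself is left out), and end_tag assumes the tag always matches the current element, which the real algorithm does not.

```python
class Node:
    """Minimal stand-in for a DOM node (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def detach(self):
        """Remove this node from its parent, if it has one."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def append_child(self, child):
        child.detach()  # a node can only live in one place at a time
        child.parent = self
        self.children.append(child)

    def root(self):
        """Walk up to the topmost ancestor."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


class Parser:
    """Tracks open elements on its own stack and never consults the tree."""

    def __init__(self):
        self.document = Node("#document")
        self.stack = []

    def start_tag(self, name):
        element = Node(name)
        parent = self.stack[-1] if self.stack else self.document
        parent.append_child(element)
        self.stack.append(element)
        return element

    def end_tag(self, name):
        # Simplification: assume the end tag matches the current element.
        assert self.stack[-1].name == name
        self.stack.pop()


def futz(document, html, body, div, p, span):
    """Stand-in for the omitted script: rearranges the tree arbitrarily."""
    p.detach()                   # <p> is now an orphan
    div.detach()
    document.append_child(span)  # <span> moves directly under the document
    span.append_child(html)
    span.append_child(body)
    html.append_child(Node("blink"))


parser = Parser()
html, body, div, p, span = (
    parser.start_tag(t) for t in ["html", "body", "div", "p", "span"]
)
futz(parser.document, html, body, div, p, span)

em = parser.start_tag("em")  # appended to <span>, the current element
parser.end_tag("em")
parser.end_tag("span")
a = parser.start_tag("a")    # appended to the orphaned <p>
b = parser.start_tag("b")

print(em.root().name)  # #document
print(b.root().name)   # p
```

The <em> ends up in the document because the <span> it was appended to happens to still be there, while the <a> and <b> hang off the orphaned <p> and are not reachable from the document at all.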
Does this make sense? Please let me know if there's anything about this
which is confusing or if you'd like a more detailed walkthrough of the
algorithm for some particular input.
(The above description is a little simplified -- to handle formatting
elements that are closed in the wrong order, the DOM is manipulated, and
there is a separate list of formatting elements to do some of the book-
keeping. However that doesn't really affect the issue here.)
--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 23 May 2008 21:05:13 UTC