- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 23 May 2008 21:04:28 +0000 (UTC)
- To: Travis Leithead <travil@windows.microsoft.com>
- Cc: "public-html@w3.org" <public-html@w3.org>, Harley Rosnow <Harley.Rosnow@microsoft.com>, Chris Wilson <Chris.Wilson@microsoft.com>
On Fri, 23 May 2008, Travis Leithead wrote:
>
> You may recall I posted about Operation Aborted last month [1], and I'm
> now looking at providing what we're calling the "full fix" for the
> problem. The trouble is, given our current architecture, we're just a
> little puzzled how to correctly implement the change. It seems quite
> straightforward at first thought, but if you consider the "delete" cases
> (removeChild on an ancestor), and the "replace" cases (outerHTML on an
> ancestor), we end up with a bit of a tricky situation.
>
> In our quest to find the "right" way to handle this error condition, I
> did a relatively quick deep-dive into section 8.2 of HTML5 to try to
> find if these cases are spec'd somewhere therein (I believe they should
> be), but came up empty-handed. Perhaps I was looking at an old copy of
> the spec (had it printed at the cost of a small forest--probably a bad
> idea), or I simply missed something obvious.

The way the spec stands, the parser model basically never looks at the DOM
when parsing. Instead, it keeps a separate stack of open elements. Thus,
for example, if the parser sees this:

   <html><body><div><p><span>

...then at that point in the parsing the DOM will look like a tree as you
would expect, but in addition, the parser has a stack which looks like:

   html, body, div, p, span

...where each entry points at the element that was created for the
corresponding tag.

Now if at this point the parser parses a <script> that futzes with the DOM
(I have omitted the script for brevity):

   <html><body><div><p><span><script>...</script>

The DOM might turn into something like:

   #document
    |
    +- span                 p (orphan)
    |
    +- html
    |   |
    |   +- blink
    |
    +- body

...with the <p> (and <script>) taken out altogether.
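
As a rough illustration of the model described so far, here is a minimal
Python sketch of a parser that keeps its own stack of open elements and
never consults the tree. It is not the spec's algorithm: the Node and
ToyParser names are hypothetical, and the two DOM calls standing in for the
omitted script are an assumption made only to orphan the <p> and move the
<span>.

```python
from typing import List, Optional


class Node:
    """A bare-bones DOM node: a name, a parent pointer, and children."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def append_child(self, child: "Node") -> None:
        # Like DOM appendChild: detach from any previous parent first.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def remove_child(self, child: "Node") -> None:
        self.children.remove(child)
        child.parent = None

    def in_document(self) -> bool:
        # Walk up to the root and check whether it is the document node.
        node = self
        while node.parent is not None:
            node = node.parent
        return node.name == "#document"


class ToyParser:
    """Tracks open elements on its own stack; never looks at the DOM."""

    def __init__(self) -> None:
        self.document = Node("#document")
        self.stack: List[Node] = []

    def start_tag(self, name: str) -> Node:
        element = Node(name)
        # Append to the current element (last stack entry), wherever it lives.
        target = self.stack[-1] if self.stack else self.document
        target.append_child(element)
        self.stack.append(element)
        return element

    def end_tag(self, name: str) -> None:
        # Simplified: assumes the end tag matches the current element.
        assert self.stack[-1].name == name
        self.stack.pop()


parser = ToyParser()
for tag in ["html", "body", "div", "p", "span"]:
    parser.start_tag(tag)

html, body, div, p, span = parser.stack

# Hypothetical stand-in for the omitted script: orphan <p>, then move
# <span> so that it becomes a child of the document.
div.remove_child(p)
parser.document.append_child(span)

# The tree changed, but the parser's stack did not.
print([node.name for node in parser.stack])
# ['html', 'body', 'div', 'p', 'span']
print(p.in_document(), span.in_document())  # False True
```

Since start_tag in this sketch only ever looks at the last stack entry, any
later element is appended to whichever stacked element is current,
regardless of where the script left that element in the tree.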
However, the stack still looks like:

   html, body, div, p, span

...and so when the parser continues and finds an <em> element:

   <html><body><div><p><span><script>...</script><em>

...it just appends it to the element that's the current element on the
stack, in this case the "span":

   html, body, div, p, span
                       ^ current element

...and thus the DOM would change into:

   #document
    |
    +- span                 p (orphan)
    |   |
    |   +- em
    |
    +- html
    |   |
    |   +- blink
    |
    +- body

Now the stack looks like:

   html, body, div, p, span, em
                             ^ current element

Now, if an </em> tag is seen, then the <em> element is popped from the
stack:

   <html><body><div><p><span><script>...</script><em></em>

   html, body, div, p, span
                       ^ current element

And if a </span> tag is seen, the <span> is popped off:

   <html><body><div><p><span><script>...</script><em></em></span>

   html, body, div, p
                    ^ current element

Now the <p> element is the current element (bottom-most on the stack), but
as it isn't in the document, elements appended to it won't be visible. To
show you what I mean let's add some more tags:

   <html><body><div><p><span><script>...</script><em></em></span><a><b>

After parsing these the stack will have the <a> and <b> elements:

   html, body, div, p, a, b
                          ^ current element

...but those elements won't be in the document, they'll be added to the
orphaned <p> element:

   #document
    |
    +- span                 p (orphan)
    |   |                   |
    |   +- em               +- a
    |                           |
    +- html                     +- b
    |   |
    |   +- blink
    |
    +- body

Does this make sense? Please let me know if there's anything about this
which is confusing or if you'd like a more detailed walkthrough of the
algorithm for some particular input.

(The above description is a little simplified -- to handle formatting
elements that are closed in the wrong order, the DOM is manipulated, and
there is a separate list of formatting elements to do some of the
bookkeeping. However that doesn't really affect the issue here.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 23 May 2008 21:05:13 UTC