- From: Adam Klein <adamk@chromium.org>
- Date: Tue, 2 Apr 2013 12:19:08 -0700
- To: Rafael Weinstein <rafaelw@google.com>
- Cc: William Chen <wchen@mozilla.com>, WHATWG List <whatwg@whatwg.org>, Ian Hickson <ian@hixie.ch>, me@gsnedders.com, Henri Sivonen <hsivonen@gmail.com>
Since I haven't heard any talk on this thread (or on the w3.org bug),
I've landed a patch in WebKit to treat tokens being processed in HTML as
if they had an HTML namespace (http://trac.webkit.org/r147441). My reason
for landing was that we've already seen two crash bugs due to the WebKit
parser getting into a bad state WRT the stack of open elements, and I'd
rather not leave us open to more of the same.

Note that this change passes all existing html5lib tests. I added one
test case, which came (slightly modified) from Rafael's bug:

<body><table><tr><td><svg><td><foreignObject><span></td>Foo

which is now parsed as:

| <html>
| <head>
| <body>
| "Foo"
| <table>
| <tbody>
| <tr>
| <td>
| <svg svg>
| <svg td>
| <svg foreignObject>
| <span>

where previously (and in current Firefox) it's parsed as:

| <html>
| <head>
| <body>
| <table>
| <tbody>
| <tr>
| <td>
| <svg svg>
| <svg td>
| <svg foreignObject>
| <span>
| "Foo"

That is, the </td> is being parsed as HTML (thanks to
<foreignObject><span>), so it searches on the stack for an HTML td to
close. There are probably a whole set of similar test cases, but they can
be tricky to construct thanks, in part, to the various "escape hatches"
from an HTML integration point (including <p>, <table>, and many more).

I think the equivalent spec change would be to spell out in detail what
it means for a token or element to match something on the stack of open
elements.

The new WebKit behavior seems more proper to me (and seemed reasonable to
those I could raise on #whatwg a few days ago); I also think it's
unlikely to affect much real content, so changing it to make the parser's
internal state more sane is worthwhile.

- Adam

On Fri, Mar 15, 2013 at 10:31 AM, Rafael Weinstein <rafaelw@google.com> wrote:
> I just opened another similar bug:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=21292 which has a
> similar root cause.
>
> I agree with Adam that it seems wrong that the stack of open elements
> can contain elements in disparate namespaces, but its operation (at
> times) only examines the local name (e.g. checking if an element is in
> a specific scope, popping elements from the stack of open elements
> until an element with the same tag name...)
>
> On Wed, Feb 27, 2013 at 12:39 PM, Adam Klein <adamk@chromium.org> wrote:
>> Consider the following script:
>>
>> tr = document.createElement('tr')
>> tr.innerHTML = '<math><tr><mo><td>';
>>
>> That is, the fragment is parsed with tr as the context element. What
>> should the generated DOM be? Note that <mo> is a "MathML text
>> integration point", which causes the <td> to be processed not as
>> foreign content but as a normal HTML token. This leads to the
>> following DOM in WebKit:
>>
>> <tr>
>> <math math>
>> <math tr>
>> <math mo>
>> <td>
>>
>> (the "math" prefixes denote that these are elements with the MathML
>> namespace.) In Gecko, I instead get:
>>
>> <tr>
>> <math math>
>> <math tr>
>> <math mo>
>> <td>
>>
>> Note that the <td> in both cases is an HTML element, even though in
>> Gecko it's in a MathML tree.
>>
>> The spec for what should happen to that <td> is the first step of
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-intr
>>
>> This case clearly seems like a bug in Gecko: it's treating the <math
>> tr> as if it's an HTML <tr>. That is, it's comparing only the local
>> name (or "tag name" as the spec usually refers to it).
>>
>> But this same ambiguity exists elsewhere in the spec. For example, the
>> very next item under "in row" says "If the stack of open elements does
>> not have an element in table scope with the same tag name as the
>> token" (in this case, it's looking for a <tr>).
>>
>> I think the HTML parser ought to specify more precisely how to deal
>> with namespaces in the stack of open elements, given that that stack
>> can contain elements of varying namespaces.
>>
>> - Adam
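
For illustration only, here is a minimal sketch of the matching question
raised in this thread. It is not WebKit's or Gecko's code. The plain-object
stack, the `el`, `popUntil`, and `describe` helpers, and the `htmlOnly` flag
are all hypothetical names made up for the example. It assumes the stack of
open elements at the point where </td> is seen in the test case
<body><table><tr><td><svg><td><foreignObject><span></td>Foo, and compares
popping by local name alone with popping only on an HTML-namespace match.

```javascript
// Namespace URIs for HTML and SVG elements.
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical stand-in for an element on the stack of open elements.
const el = (localName, namespace = HTML_NS) => ({ localName, namespace });

// Assumed stack contents when the </td> end tag is reached (bottom first).
function openElementsAtEndTd() {
  return [
    el("html"), el("body"), el("table"), el("tbody"), el("tr"), el("td"),
    el("svg", SVG_NS), el("td", SVG_NS), el("foreignObject", SVG_NS),
    el("span"),
  ];
}

// Pop elements until one matching tagName has been popped.
// With htmlOnly set, an element only matches if it is in the HTML namespace.
function popUntil(stack, tagName, htmlOnly) {
  while (stack.length > 0) {
    const node = stack.pop();
    const nameMatches = node.localName === tagName;
    const nsMatches = !htmlOnly || node.namespace === HTML_NS;
    if (nameMatches && nsMatches) break;
  }
  return stack;
}

// Render the stack using the "svg td" notation from the tree dumps.
const describe = (stack) =>
  stack
    .map((n) => (n.namespace === SVG_NS ? `svg ${n.localName}` : n.localName))
    .join(" > ");

// Local-name-only matching stops at the SVG <td>.
const byLocalName = describe(popUntil(openElementsAtEndTd(), "td", false));
// Namespace-aware matching keeps popping until the HTML <td> is closed.
const byNamespace = describe(popUntil(openElementsAtEndTd(), "td", true));

console.log(byLocalName); // html > body > table > tbody > tr > td > svg svg
console.log(byNamespace); // html > body > table > tbody > tr
```

In the namespace-aware case the current node ends up being the HTML <tr>,
which fits with "Foo" landing before the <table> in the new parse shown
above. The sketch leaves out insertion modes, scope checks, and foster
parenting entirely.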
Received on Tuesday, 2 April 2013 19:19:36 UTC