
[whatwg] HTML parsing, the stack of open elements, and foreign content

From: Adam Klein <adamk@chromium.org>
Date: Wed, 27 Feb 2013 12:39:04 -0800
Message-ID: <CAEvLGcLVj+KfQ04kOCysdMBYL5fdMh2xJ7-qcLFdrsbsshnkfQ@mail.gmail.com>
To: whatwg@whatwg.org
Cc: me@gsnedders.com, Ian Hickson <ian@hixie.ch>

Consider the following script:

tr = document.createElement('tr');
tr.innerHTML = '<math><tr><mo><td>';

That is, the fragment is parsed with tr as the context element. What
should the generated DOM be? Note that <mo> is a "MathML text
integration point", which causes the <td> to be processed not as
foreign content but as a normal HTML token. This leads to the
following DOM in WebKit:

<tr>
    <math math>
        <math tr>
            <math mo>
    <td>

(the "math" prefixes denote that these are elements with the MathML
namespace.) In Gecko, I instead get:

<tr>
    <math math>
        <math tr>
            <math mo>
            <td>

Note that the <td> in both cases is an HTML element, even though in
Gecko it's in a MathML tree.

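The spec for what should happen to that <td> is the first step of
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-intr
(the "in row" insertion mode): clear the stack back to a table row
context, that is, pop elements until the current node is a tr or html
element, and then insert the <td>.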

This case clearly seems like a bug in Gecko: it's treating the <math
tr> as if it's an HTML <tr>. That is, it's comparing only the local
name (or "tag name" as the spec usually refers to it).
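
To make the difference concrete, here is a minimal sketch of the
"clear the stack back to a table row context" step under the two
comparison strategies. It is not taken from either engine: the
{ ns, localName } entry shape, the function name, and the two
predicates are assumptions made for illustration only.

```javascript
// Hypothetical model: each entry on the stack of open elements is
// { ns, localName }. Nothing here is real WebKit or Gecko code.
const HTML_NS = 'http://www.w3.org/1999/xhtml';
const MATHML_NS = 'http://www.w3.org/1998/Math/MathML';

// Pop elements until the current node satisfies `isRowContext`.
// The bottom entry (the root html element) is never popped.
function clearToTableRowContext(stack, isRowContext) {
  const result = stack.slice();
  while (result.length > 1 && !isRowContext(result[result.length - 1])) {
    result.pop();
  }
  return result;
}

// Strategy A: compare the local name only.
const byLocalNameOnly = (el) =>
  el.localName === 'tr' || el.localName === 'html';

// Strategy B: require the HTML namespace as well.
const byNamespaceAndName = (el) =>
  el.ns === HTML_NS && (el.localName === 'tr' || el.localName === 'html');

// The stack when the <td> token arrives (fragment case, so the root
// html element sits at the bottom).
const stackAtTd = [
  { ns: HTML_NS, localName: 'html' },
  { ns: MATHML_NS, localName: 'math' },
  { ns: MATHML_NS, localName: 'tr' },
  { ns: MATHML_NS, localName: 'mo' },
];

// The element that becomes the parent of the new <td> in each case.
const looseParent = clearToTableRowContext(stackAtTd, byLocalNameOnly).pop();
const strictParent = clearToTableRowContext(stackAtTd, byNamespaceAndName).pop();

console.log(looseParent.localName);  // "tr" (the MathML one)
console.log(strictParent.localName); // "html" (the fragment root)
```

Strategy A stops at the <math tr> and so nests the <td> inside the
MathML tree, as in the Gecko output; strategy B pops back to the
fragment root, as in the WebKit output.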

But this same ambiguity exists elsewhere in the spec. For example, the
very next item under "in row" says "If the stack of open elements does
not have an element in table scope with the same tag name as the
token" (in this case, it's looking for a <tr>).
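
As an illustration of what a namespace-aware version of that check
might look like, here is a rough sketch. The { ns, localName } entry
shape and the function name are hypothetical, and treating html and
table as the scope boundaries reflects my reading of the spec rather
than any engine's code.

```javascript
// Hypothetical model of "has an element in table scope" that only
// accepts elements in the HTML namespace.
const HTML_NAMESPACE = 'http://www.w3.org/1999/xhtml';
const MATHML_NAMESPACE = 'http://www.w3.org/1998/Math/MathML';

// HTML elements that end a table-scope search (assumed list).
const TABLE_SCOPE_BOUNDARIES = ['html', 'table'];

function hasHtmlElementInTableScope(openElements, tagName) {
  // Walk from the current node (top of the stack) downwards.
  for (let i = openElements.length - 1; i >= 0; i--) {
    const node = openElements[i];
    const isHtml = node.ns === HTML_NAMESPACE;
    if (isHtml && node.localName === tagName) return true;
    if (isHtml && TABLE_SCOPE_BOUNDARIES.includes(node.localName)) return false;
  }
  return false;
}

// Only a MathML tr is open: a </tr> end tag finds nothing to close.
const onlyMathTr = [
  { ns: HTML_NAMESPACE, localName: 'html' },
  { ns: MATHML_NAMESPACE, localName: 'math' },
  { ns: MATHML_NAMESPACE, localName: 'tr' },
];

// An HTML tr is open beneath some MathML content: it is found.
const htmlTrOpen = [
  { ns: HTML_NAMESPACE, localName: 'html' },
  { ns: HTML_NAMESPACE, localName: 'tr' },
  { ns: MATHML_NAMESPACE, localName: 'math' },
];

console.log(hasHtmlElementInTableScope(onlyMathTr, 'tr')); // false
console.log(hasHtmlElementInTableScope(htmlTrOpen, 'tr')); // true
```

A check that compared only the tag name would answer true for the
first stack as well, which is exactly the ambiguity in question.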

I think the HTML parser ought to specify more precisely how to deal
with namespaces in the stack of open elements, given that that stack
can contain elements of varying namespaces.

- Adam
Received on Wednesday, 27 February 2013 20:39:30 UTC
