- From: Rafael Weinstein <rafaelw@google.com>
- Date: Thu, 30 May 2013 07:34:08 -0700
- To: Ian Hickson <ian@hixie.ch>
- Cc: WHATWG List <whatwg@whatwg.org>, William Chen <wchen@mozilla.com>, me@gsnedders.com, Adam Klein <adamk@chromium.org>, Henri Sivonen <hsivonen@gmail.com>
On Wed, May 29, 2013 at 3:19 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 27 Feb 2013, Adam Klein wrote:
>>
>> Consider the following script:
>>
>> tr = document.createElement('tr')
>> tr.innerHTML = '<math><tr><mo><td>';
>>
>> That is, the fragment is parsed with tr as the context element. What
>> should the generated DOM be?
>
> Up to the <td> it's unambiguous and uncontroversial, I hope; and should
> be:
>
>    <html:tr>
>      <math:math>
>        <math:tr>
>          <math:mo>
>
> At the "<td>", you clear the stack back to a table row context, which pops
> all the nodes from the stack except the root one (the <html> one,
> representing the original <tr> element on which innerHTML was invoked).
>
> It thus results in:
>
>    <html:tr>
>      <math:math>
>        <math:tr>
>          <math:mo>
>      <html:td>
>
>
>> Note that <mo> is a "MathML text integration point", which causes the
>> <td> to be processed not as foreign content but as a normal HTML token.
>> This leads to the following DOM in WebKit:
>>
>> <tr>
>>   <math math>
>>     <math tr>
>>       <math mo>
>>   <td>
>>
>> (the "math" prefixes denote that these are elements with the MathML
>> namespace.)
>
> That is correct.
>
>
>> In Gecko, I instead get:
>>
>> <tr>
>>   <math math>
>>     <math tr>
>>       <math mo>
>>       <td>
>
> That is not.
>
>
>> The spec for what should happen to that <td> is the first step of
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-intr
>>
>> This case clearly seems like a bug in Gecko: it's treating the <math tr>
>> as if it's an HTML <tr>. That is, it's comparing only the local name (or
>> "tag name" as the spec usually refers to it).
>
> Right, that's wrong. The spec isn't ambiguous here, it explicitly says
> that the current node must be a <tr> or <html> element, not an element
> with a "tr" or "html" tag name, and <tr> and <html> elements are in the
> HTML namespace (they're even hyperlinked to their definitions).
>
>
>> But this same ambiguity exists elsewhere in the spec.
>> For example, the very next item under "in row" says "If the stack of
>> open elements does not have an element in table scope with the same tag
>> name as the token" (in this case, it's looking for a <tr>).
>
> Yeah, that text is wrong, because part of the rules look for <*:tr>, and
> part assume that only <html:tr> was matched. In fact, it means that
> tr.innerHTML = '<math><tr><mo></tr>' has no parse error and pops the root
> <html> off the tree! That's clearly bogus.
>
>
>> I think the HTML parser ought to specify more precisely how to deal with
>> namespaces in the stack of open elements, given that that stack can
>> contain elements of varying namespaces.
>
> It's not so much that it has to do it precisely (it does), it's that it
> has to do it accurately...
>
> There's a huge number of places in the spec that do tag name comparisons
> rather than element identity (tag+namespace) comparisons, and it's not at
> all clear to me that they should all change. Consider:
>
> On Fri, 15 Mar 2013, Rafael Weinstein wrote:
>>
>> I just opened another similar bug:
>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=21292 which has a similar
>> root cause.
>>
>> I agree with Adam that it seems wrong that the stack of open elements
>> can contain elements in disparate namespaces, but its operation (at
>> times) only examines the local name (e.g. checking if an element is in a
>> specific scope, popping elements from the stack of open elements until
>> an element with the same tag name...)
>
> Well, as noted in the bug, I don't think we should check the namespace in
> _every_ case. The case in the bug is this:
>
>    <body><table><tr><td><svg><td><foreignObject></td>Foo<foo>
>
> This is clearly invalid; the question is, what <td> did the author mean to
> match, if any? It makes sense to me to match the most recently one.

Not that I care very much to attempt to support DWIM in this way, because
I think allowing parser implementations to maintain a sane invariant here
is more important, but... I think it's more likely the author was being
lazy about closing all the svg tags and simply wanted a quick way to say
"I'm done with my table cell"

> In particular, consider these variations:
>
>    <body><table><tr><td><svg><zz><foreignObject></td>Foo<foo>
>    <body><table><tr><td><svg><zz><foreignObject></zz>Foo<foo>
>    <body><table><tr><zz><svg><zz><foreignObject></zz>Foo<foo>
>
>
>
> The cases in the spec now that are bogus are the cases where I mix one and
> the other. That actually means the opposite kind of change as is being
> proposed above: for example, it would mean changing the "table" end tag
> steps from what they say now (popping an HTML <table> element), to popping
> any "table" element regardless of namespace. This would make the algorithm
> more consistent, and remove the bugs mentioned above.
>
> Is this what people want to do? It's not what you (Adam) implemented, as I
> understand it.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
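
For anyone following the thread, the difference between the two comparisons discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from WebKit, Gecko or the spec: the stack is modelled as a plain array, and the names stackBeforeTd, byElementIdentity, byTagNameOnly and currentNodeAfterClear are all made up for this example.

```javascript
// Hypothetical model, not taken from any browser: each entry on the stack of
// open elements records a namespace and a local ("tag") name.
const HTML = 'html';
const MATHML = 'math';

// Stack after parsing '<math><tr><mo>' in a fragment whose context element is
// an HTML <tr>. The root <html> element is at the bottom of the stack.
function stackBeforeTd() {
  return [
    { ns: HTML, name: 'html' },
    { ns: MATHML, name: 'math' },
    { ns: MATHML, name: 'tr' },
    { ns: MATHML, name: 'mo' },
  ];
}

// Element identity: the current node must be an HTML <tr> or <html> element.
const byElementIdentity = (node) =>
  node.ns === HTML && (node.name === 'tr' || node.name === 'html');

// Tag name only: any element named "tr" or "html", whatever its namespace.
const byTagNameOnly = (node) => node.name === 'tr' || node.name === 'html';

// "Clear the stack back to a table row context": pop until the predicate
// accepts the current node, then report that node as "namespace:name".
function currentNodeAfterClear(isRowContext) {
  const stack = stackBeforeTd();
  while (stack.length > 1 && !isRowContext(stack[stack.length - 1])) {
    stack.pop();
  }
  const current = stack[stack.length - 1];
  return `${current.ns}:${current.name}`;
}

console.log(currentNodeAfterClear(byElementIdentity)); // html:html
console.log(currentNodeAfterClear(byTagNameOnly));     // math:tr
```

With the identity comparison only the root is left on the stack, so the <td> would be inserted under the context <tr>, matching the WebKit result quoted above. With the tag-name comparison the clearing stops at the MathML <tr>, which is where the Gecko result puts the <td>.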
Received on Thursday, 30 May 2013 14:34:43 UTC