- From: Adam Barth <w3c@adambarth.com>
- Date: Wed, 14 Dec 2011 10:58:31 -0800
On Tue, Dec 13, 2011 at 2:32 PM, Ian Hickson <ian at hixie.ch> wrote:
> On Mon, 12 Dec 2011, Adam Barth wrote:
>> I'm trying to understand how the HTML parsing spec handles the following case:
>>
>> <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
>>
>> According to the html5lib test data, we should parse that as follows:
>>
>> | <!DOCTYPE html>
>> | <html>
>> |   <head>
>> |   <body>
>> |     <math math>
>> |       <math mi>
>> |         "foo"
>> |     <table>
>>
>> However, I'm not sure whether that's what the spec actually does.
>>
>> Consider point at which we parse the "f" character token (from "foo").
>> The insertion mode will be "in table". The spec will execute as
>> follows:
>>
>> -> If the current node is a MathML text integration point and the
>> token is a character token
>>   * Process the token according to the rules given in the section
>> corresponding to the current insertion mode in HTML content.
>>
>> -> A character token
>>   * Let the pending table character tokens be an empty list of tokens.
>>   * Let the original insertion mode be the current insertion mode.
>>   * Switch the insertion mode to "in table text" and reprocess the token.
>>
>> -> Any other character token
>>   * Append the character token to the pending table character tokens list.
>>
>> ... the "o" and "o" will be processed similarly and end up in the
>> pending table character tokens list.
>>
>> Now, consider the </mi> token. We're still at a MathML text
>> integration point, but the current token is neither a start token
>> (with certain names) nor a character token, so we process the token
>> according to the rules given in the section for parsing tokens in
>> foreign content.
>>
>> -> Any other end tag
>>   * Run these steps:
>>     ...
>>
>> The net result of which is popping the stack of open elements, but not
>> flushing out the pending table character tokens list. The list will
>> eventually be flushed when we process the </table> token, resulting
>> these character tokens getting foster parented:
>>
>> | <!DOCTYPE html>
>> | <html>
>> |   <head>
>> |   <body>
>> |     <math math>
>> |       <math mi>
>> |     "foo"
>> |     <table>
>
> On Tue, 18 Oct 2011, David Flanagan wrote:
>>
>> Here's my current workaround:
>>
>> In 13.2.5, in the rules for whether to use the current insertion mode or
>> to insert the token as foreign content, if the token is being inserted
>> because the current node is a math (or HTML, but I'm not sure about
>> that) integration point, then first set a text_integration_mode flag,
>> then invoke the current insertion mode, then clear the flag.
>>
>> And in the in table insertion mode, when a character token is inserted,
>> and the text_integration_mode flag is set, then just process the token
>> using in body mode, and otherwise follow the directions that are there
>> now.
>>
>> I'm not sure that is the best way to fix the spec, but it works for me,
>> in the sense that my parser now passes the tests.
>
> I think the real problem is that there's no need to go into the "table
> text" mode if the current node is not a table model element. So I've
> changed the spec at that point.
>
> Please let me know if that doesn't fix the test case or causes any other
> regressions.

That fix seems to work great.

Thanks!
Adam
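
For readers following along, here is a rough Python sketch of the check Ian describes, as I understand it. It is only an illustration: the function name, the return strings, and the reading of "table model element" as table, tbody, tfoot, thead, or tr are my assumptions, not text from the spec or from html5lib.

```python
# Hypothetical sketch of the "in table" character-token rule after the fix.
# Assumption: "table model element" means one of these element names.
TABLE_MODEL_ELEMENTS = {"table", "tbody", "tfoot", "thead", "tr"}


def handle_character_in_table(current_node_name, char, pending_table_chars):
    """Decide how a character token is handled in the "in table" mode.

    Returns a label naming the rules that would process the token.
    Only buffers the character when the current node is a table model
    element; otherwise the token falls through to the "anything else"
    branch, which processes it with the "in body" rules.
    """
    if current_node_name in TABLE_MODEL_ELEMENTS:
        # Buffer the character; it is flushed later by "in table text".
        pending_table_chars.append(char)
        return "in table text"
    # Current node is e.g. <mi>: nothing is buffered, so nothing can be
    # left behind to be foster parented when </table> arrives.
    return "in body"


pending = []
print(handle_character_in_table("mi", "f", pending), pending)
print(handle_character_in_table("table", "x", pending), pending)
```

With the current node being the mi element from the test case, nothing ends up in the pending list, which would explain why "foo" stays inside the mi element rather than being foster parented before the table.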
Received on Wednesday, 14 December 2011 10:58:31 UTC