[whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table> from Adam Barth on 2011-12-13 (public-whatwg-archive@w3.org from December 2011)

From: Adam Barth <w3c@adambarth.com>
Date: Mon, 12 Dec 2011 18:23:23 -0800
Message-ID: <CAJE5ia_k281dOfxEoHDDv2FnV=RtPwNXzcWcv6v-duYAFi+Beg@mail.gmail.com>

I'm trying to understand how the HTML parsing spec handles the following case:

<!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

According to the html5lib test data, we should parse that as follows:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|         "foo"
|     <table>

However, I'm not sure whether that's what the spec actually does.

Consider the point at which we parse the "f" character token (from
"foo").  The insertion mode will be "in table".  The spec will execute
as follows:

-> If the current node is a MathML text integration point and the
token is a character token
  * Process the token according to the rules given in the section
corresponding to the current insertion mode in HTML content.

-> A character token
  * Let the pending table character tokens be an empty list of tokens.
  * Let the original insertion mode be the current insertion mode.
  * Switch the insertion mode to "in table text" and reprocess the token.

-> Any other character token
  * Append the character token to the pending table character tokens list.

... the "o" and "o" will be processed similarly and end up in the
pending table character tokens list.
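
To make the steps above concrete, here is a minimal Python sketch of
just the two character-token rules quoted above.  ToyParser and its
attribute names are hypothetical, invented for illustration; this is
not html5lib code, and it models nothing beyond those two rules.

```python
class ToyParser:
    """Toy model of the "in table" / "in table text" character rules only."""

    def __init__(self):
        self.insertion_mode = "in table"
        self.original_insertion_mode = None
        self.pending_table_character_tokens = []

    def process_character(self, ch):
        if self.insertion_mode == "in table":
            # "A character token" in the "in table" insertion mode.
            self.pending_table_character_tokens = []
            self.original_insertion_mode = self.insertion_mode
            self.insertion_mode = "in table text"
            self.process_character(ch)  # reprocess the same token
        elif self.insertion_mode == "in table text":
            # "Any other character token" in "in table text".
            self.pending_table_character_tokens.append(ch)


parser = ToyParser()
for ch in "foo":
    parser.process_character(ch)

print(parser.insertion_mode)                  # in table text
print(parser.pending_table_character_tokens)  # ['f', 'o', 'o']
```

Nothing in these two rules inserts the characters into the tree; they
only accumulate in the list.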

Now, consider the </mi> token.  We're still at a MathML text
integration point, but the current token is neither a start tag token
(with certain names) nor a character token, so we process the token
according to the rules given in the section for parsing tokens in
foreign content.
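
The dispatch decision described here can be sketched as a small Python
function.  This is a simplification under stated assumptions: the
function name, the string token types, and the return values are all
hypothetical, and the other dispatcher conditions in the spec (HTML
elements, annotation-xml, HTML integration points, end-of-file) are
left out.  The "certain names" are modelled as any start tag other
than mglyph or malignmark.

```python
# Elements that are MathML text integration points.
MATHML_TEXT_INTEGRATION_POINTS = {"mi", "mo", "mn", "ms", "mtext"}


def dispatch(current_node, token_type, token_name=None):
    """Return which rule set handles the token (toy model only)."""
    if current_node in MATHML_TEXT_INTEGRATION_POINTS:
        if token_type == "character":
            return "insertion mode"
        if (token_type == "start tag"
                and token_name not in {"mglyph", "malignmark"}):
            return "insertion mode"
    # Everything else, including end tags such as </mi>.
    return "foreign content"


print(dispatch("mi", "character"))      # insertion mode
print(dispatch("mi", "end tag", "mi"))  # foreign content
```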

-> Any other end tag
  * Run these steps:
    ...

The net result of which is popping the stack of open elements, but not
flushing out the pending table character tokens list.  The list will
eventually be flushed when we process the </table> token, resulting in
these character tokens getting foster parented:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|     "foo"
|     <table>
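
To show the difference between the two trees side by side, here is a
toy Python model.  Node, dump, and build are hypothetical helpers made
up for this illustration; nothing here parses HTML.  It only places the
"foo" text node either inside <mi> (the html5lib expectation) or
immediately before <table> in its parent (the foster-parented result of
a late flush, per the reading above).

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []


def dump(node, depth=0):
    """Render a subtree in the html5lib test-data style."""
    lines = ["| " + "  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


def build(flush_at_table_end):
    mi = Node("<math mi>")
    math = Node("<math math>", [mi])
    table = Node("<table>")
    # <math> has already been foster parented in front of <table>.
    body = Node("<body>", [math, table])
    text = Node('"' + "".join(["f", "o", "o"]) + '"')
    if flush_at_table_end:
        # Foster parenting: insert into the table's parent, before the table.
        body.children.insert(body.children.index(table), text)
    else:
        # Characters inserted while <mi> is still the current node.
        mi.children.append(text)
    return body


print("\n".join(dump(build(flush_at_table_end=False))))
print()
print("\n".join(dump(build(flush_at_table_end=True))))
```

The first dump has "foo" as a child of <math mi>; the second has it as
a sibling of <math math> and <table> under <body>.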

Thanks,
Adam

Received on Monday, 12 December 2011 18:23:23 UTC