Parsing: CDATA section processing

The current specification of how the tokeniser decides whether to process
CDATA sections or treat them as bogus comments relies upon the tokeniser
having access to the treebuilder's insertion mode and the current node in
the stack of open elements. I can't see any need for this that couldn't be
served equally well by a flag controlled by the treebuilder (much like the
existing content model flag). Therefore, I propose the following:

Introduce a new tokeniser flag "processCDATASection", which is initially
clear.

Change the wording in the tokeniser's markup declaration open state from:

  Otherwise, if the insertion mode is "in foreign content" and the 
  current node is not one of the following:

    * An mi element in the MathML namespace.
    * An mo element in the MathML namespace.
    * An mn element in the MathML namespace.
    * An ms element in the MathML namespace.
    * An mtext element in the MathML namespace.
    * A foreignObject element in the SVG namespace.
    * A desc element in the SVG namespace.
    * A title element in the SVG namespace. 

to:

  Otherwise, if the processCDATASection flag is set
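
To illustrate, here is a minimal sketch of how a tokeniser might consult the
flag in the markup declaration open state. The class name, method name,
"lookahead" parameter and returned state names are all hypothetical and not
taken from any implementation; only the flag and its initially clear value
come from the proposal above.

```python
class Tokeniser:
    """Hypothetical tokeniser fragment; only the flag logic is shown."""

    def __init__(self):
        # The proposed processCDATASection flag is initially clear.
        self.process_cdata_section = False

    def markup_declaration_open(self, lookahead):
        """Return the name of the state to enter for the input after '<!'.

        `lookahead` is an assumed parameter holding the unconsumed input.
        """
        if lookahead.startswith("--"):
            return "comment start"
        if lookahead[:7].upper() == "DOCTYPE":
            return "DOCTYPE"
        # The tokeniser tests its own flag here instead of inspecting the
        # treebuilder's insertion mode and current node.
        if self.process_cdata_section and lookahead.startswith("[CDATA["):
            return "CDATA section"
        return "bogus comment"
```

With the flag clear, "<![CDATA[" falls through to the bogus comment case; the
treebuilder only has to set or clear one boolean to change that.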

Change the last sentence of the 'A start tag whose name is "math"' and 'A
start tag whose name is "svg"' cases in the treebuilder's "in body"
insertion mode to read:

  Otherwise, let the secondary insertion mode be the current insertion 
  mode, then switch the insertion mode to "in foreign content" and set 
  the tokeniser's processCDATASection flag.

Change the wording in the "in foreign content" insertion mode to:

  + A start tag, if the current node is an mi element in the MathML
    namespace.
  + A start tag, if the current node is an mo element in the MathML
    namespace.
  + A start tag, if the current node is an mn element in the MathML
    namespace.
  + A start tag, if the current node is an ms element in the MathML
    namespace.
  + A start tag, if the current node is an mtext element in the MathML
    namespace.
  + A start tag, if the current node is a foreignObject element in the
    SVG namespace.
  + A start tag, if the current node is a desc element in the SVG
    namespace.
  + A start tag, if the current node is a title element in the SVG
    namespace.
  + A start tag, if the current node is an element in the HTML namespace.
  + A start tag whose tag name is "svg", if the current node is an
    annotation-xml element in the MathML namespace.
  + An end tag

    Process the token using the rules for the secondary insertion mode.

    If, after doing so, the insertion mode is still "in foreign content",
    but there is no element in scope that has a namespace other than the
    HTML namespace, switch the insertion mode to the secondary insertion
    mode.

    Then, if the insertion mode is no longer "in foreign content", clear
    the tokeniser's processCDATASection flag.

    Otherwise, if the current node is an mi element in the MathML
    namespace or an mo element in the MathML namespace or an mn element
    in the MathML namespace or an ms element in the MathML namespace or
    an mtext element in the MathML namespace or a foreignObject element
    in the SVG namespace or a desc element in the SVG namespace or a
    title element in the SVG namespace, clear the tokeniser's
    processCDATASection flag.

    Otherwise, set the tokeniser's processCDATASection flag.

  + A start tag whose tag name is one of: the HTML element tag names

    Parse error.

    Pop elements from the stack of open elements until the current node
    is in the HTML namespace.

    Switch the insertion mode to the secondary insertion mode, clear the
    tokeniser's processCDATASection flag, and reprocess the token.

  + Any other start tag

    Apply case fixups and attribute namespace fixups.

    Insert a foreign element for the token, in the same namespace as the
    current node.

    If the token has its self-closing flag set, pop the current node off
    the stack of open elements and acknowledge the token's self-closing
    flag.

    If the current node is an mi element in the MathML namespace or an mo
    element in the MathML namespace or an mn element in the MathML
    namespace or an ms element in the MathML namespace or an mtext element
    in the MathML namespace or a foreignObject element in the SVG namespace
    or a desc element in the SVG namespace or a title element in the SVG
    namespace, clear the tokeniser's processCDATASection flag.

    Otherwise, set the tokeniser's processCDATASection flag.
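
The set/clear rule that recurs in the cases above could be factored into one
helper on the treebuilder side. The following is a rough sketch under stated
assumptions: the function name, the constant names and the representation of
the current node as a (namespace, local name) pair are all made up for
illustration.

```python
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# Current nodes for which the flag is cleared, as listed in the proposal.
NO_CDATA_NODES = frozenset({
    (MATHML, "mi"), (MATHML, "mo"), (MATHML, "mn"),
    (MATHML, "ms"), (MATHML, "mtext"),
    (SVG, "foreignObject"), (SVG, "desc"), (SVG, "title"),
})


def cdata_flag_for(insertion_mode, current_node):
    """Return the value the processCDATASection flag should take.

    `current_node` is assumed to be a (namespace, local name) pair.
    """
    # Outside "in foreign content" the flag is always clear.
    if insertion_mode != "in foreign content":
        return False
    # Inside foreign content, clear it only for the listed elements.
    return current_node not in NO_CDATA_NODES
```

The treebuilder would call this after each token handled in "in foreign
content" and copy the result onto the tokeniser's flag.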


John.

Received on Sunday, 6 April 2008 18:02:04 UTC