
[Bug 10117] New: Tag name state algorithm has mis-ordered step

From: <bugzilla@jessica.w3.org>
Date: Fri, 09 Jul 2010 00:15:07 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-10117-2486@http.www.w3.org/Bugs/Public/>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10117

           Summary: Tag name state algorithm has mis-ordered step
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://dev.w3.org/html5/spec/Overview.html#tag-name-state
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: adrianba@microsoft.com
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html@w3.org


Change

  U+003E GREATER-THAN SIGN (>)
  Emit the current tag token. Switch to the data state.

to

  U+003E GREATER-THAN SIGN (>)
  Switch to the data state. Emit the current tag token.

----------------
Details of issue:

Section 8.2.4.10 (Tag name state) says

  U+003E GREATER-THAN SIGN (>)
  Emit the current tag token. Switch to the data state.

The "Emit the current tag token" step is defined in section 8.2.4 as:

  When a token is emitted, it must immediately be handled by the
  tree construction stage. The tree construction stage can affect
  the state of the tokenization stage, and can insert additional
  characters into the stream.

So let us consider the following HTML:

  <html>
  <head>
  <script><!-- window.alert(); --></script>
  </head>
  <body></body>
  </html>

At the closing '>' of '<script>', the tokenizer is in the tag name state.  It
emits the current tag token, which is a 'script' start tag.

The tree construction stage, in section 8.2.5.7 ("in head" insertion mode),
specifies:

  A start tag whose tag name is "script"
  Run these steps:
  ...
  5. Switch the tokenizer to the script data state.

The tree construction stage therefore resets the tokenizer state immediately.

When the tree construction stage finishes, control returns to the tokenizer,
which still has the second half of its step to run.  *At that point, the
tokenizer is specified to switch to the data state!*  This overwrites the
script data state just set by the tree construction stage, so the contents of
the script element are tokenized as ordinary markup rather than as script data.

The identical bug exists in all the other states that can emit start tags which
can contain content (8.2.4.34 through 8.2.4.37, and 8.2.4.42).

The fix is to reverse the order of the state update and the token emission:

  U+003E GREATER-THAN SIGN (>)
  Switch to the data state. Emit the current tag token.
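
The effect of the two orderings can be illustrated with a small toy model.
This is only a sketch, not an implementation of the spec: the names State,
Tokenizer, emit_first, emit and on_greater_than are hypothetical, and the
tree construction stage is reduced to the single "script" rule quoted above.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_NAME = auto()
    SCRIPT_DATA = auto()


class Tokenizer:
    """Toy model of '>' handling in the tag name state (hypothetical names)."""

    def __init__(self, emit_first):
        self.state = State.TAG_NAME
        # True models the current spec text, False models the proposed text.
        self.emit_first = emit_first

    def emit(self, tag_name):
        # Stand-in for tree construction: the token is handled immediately,
        # and a "script" start tag switches the tokenizer state.
        if tag_name == "script":
            self.state = State.SCRIPT_DATA

    def on_greater_than(self, tag_name):
        if self.emit_first:
            self.emit(tag_name)
            self.state = State.DATA  # overwrites SCRIPT_DATA
        else:
            self.state = State.DATA
            self.emit(tag_name)  # tree construction gets the last word
        return self.state


current = Tokenizer(emit_first=True).on_greater_than("script")
proposed = Tokenizer(emit_first=False).on_greater_than("script")
print(current.name, proposed.name)  # prints "DATA SCRIPT_DATA"
```

With the current ordering the model ends up in the data state after
'<script>'; with the proposed ordering it stays in the script data state.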

Received on Friday, 9 July 2010 00:15:09 UTC
