- From: Adam Barth <w3c@adambarth.com>
- Date: Sun, 8 Jun 2014 22:11:29 -0700
- To: Geoffrey Sneddon <foolistbar@googlemail.com>
- Cc: WHATWG <whatwg@whatwg.org>
In Blink's implementation, we actually use two additional tokenizer
states for CDATA:

CDATASectionRightSquareBracketState,
CDATASectionDoubleRightSquareBracketState,

Adam


On Sun, Jun 8, 2014 at 6:24 PM, Geoffrey Sneddon <foolistbar@googlemail.com> wrote:
> It would aid programmatic conversion of the spec, and confuse me when
> reading the spec less thereby avoiding bugs like 25871, if these states
> matched the model of the rest of the tokenizer.
>
> Thus I propose the bogus comment state becomes:
>
>> Consume the next input character:
>>
>> U+003E GREATER-THAN SIGN (>):
>>
>> Switch to the data state. Emit the comment token.
>>
>> U+0000 NULL:
>>
>> Append a U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
>>
>> EOF:
>>
>> Switch to the data state. Emit the comment token. Reconsume the EOF character.
>>
>> Anything else:
>>
>> Append the current input character to the comment token's data.
>
> This also necessitates creating a new comment token prior to entering
> the bogus comment state.
>
> The CDATA section state should become:
>
>> Consume the next input character:
>>
>> U+005D RIGHT SQUARE BRACKET (]):
>>
>> If the three characters starting from the current input character are U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE BRACKET U+003E GREATER-THAN SIGN (]]>), then consume those characters and switch to the data state. Otherwise, emit the current input character as a character token.
>>
>> EOF:
>>
>> Switch to the data state. Reconsume the EOF character.
>>
>> Anything else:
>>
>> Append the current input character to the comment token's data.
>
> No changes are needed elsewhere for this. (There is no consistent style
> for lookahead — and most cases are ASCII case-insensitive words — so I
> went with what seems sane here!)
>
> /Geoffrey
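
For readers following along, here is a rough sketch of how two extra states can replace the three-character lookahead for "]]>" in the CDATA section state. It is not Blink's code: only the two state names come from the message above. The function name consume_cdata_section, the (text, position) return value, and the choice to flush pending brackets as character data at EOF are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    CDATA_SECTION = auto()
    # Names below mirror the two Blink states mentioned in the message.
    CDATA_SECTION_RIGHT_SQUARE_BRACKET = auto()
    CDATA_SECTION_DOUBLE_RIGHT_SQUARE_BRACKET = auto()


def consume_cdata_section(text, pos=0):
    """Hypothetical helper: return (emitted characters, position after the section)."""
    state = State.CDATA_SECTION
    out = []
    while pos < len(text):
        ch = text[pos]
        if state is State.CDATA_SECTION:
            if ch == "]":
                state = State.CDATA_SECTION_RIGHT_SQUARE_BRACKET
            else:
                out.append(ch)
            pos += 1
        elif state is State.CDATA_SECTION_RIGHT_SQUARE_BRACKET:
            if ch == "]":
                state = State.CDATA_SECTION_DOUBLE_RIGHT_SQUARE_BRACKET
                pos += 1
            else:
                # One pending "]" was ordinary data; reconsume ch.
                out.append("]")
                state = State.CDATA_SECTION
        else:  # CDATA_SECTION_DOUBLE_RIGHT_SQUARE_BRACKET
            if ch == ">":
                # Saw "]]>": the section ends, caller returns to the data state.
                return "".join(out), pos + 1
            if ch == "]":
                # "]]]": the oldest bracket is data, two remain pending.
                out.append("]")
                pos += 1
            else:
                # Both pending brackets were ordinary data; reconsume ch.
                out.append("]]")
                state = State.CDATA_SECTION
    # EOF (assumption): flush any pending brackets as character data.
    if state is State.CDATA_SECTION_RIGHT_SQUARE_BRACKET:
        out.append("]")
    elif state is State.CDATA_SECTION_DOUBLE_RIGHT_SQUARE_BRACKET:
        out.append("]]")
    return "".join(out), pos
```

With this shape every state consumes exactly one character at a time, which matches the model of the rest of the tokenizer; the cost is two more states to specify.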
Received on Monday, 9 June 2014 05:12:27 UTC