- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Sun, 20 Jan 2013 11:30:15 -0800
- To: Simon Sapin <simon.sapin@kozea.fr>
- Cc: "www-style@w3.org" <www-style@w3.org>
On Sun, Jan 20, 2013 at 12:22 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> Hi,
>
> The /transform function whitespace/ flag changes the tokenizer so that
> `name (` is a single FUNCTION token instead of IDENT WS (.
>
> With `foo bar` however, the current ED’s state machine gives IDENT IDENT
> while it should give IDENT WS IDENT.
>
>
>> 3.3.14. Transform-function-whitespace state
>>
>> Consume the next input character.
>>
>> whitespace
>>     Remain in this state.
>> U+0028 LEFT PARENTHESIS (()
>>     Emit a function token with its value set to the identifier token's
>>     value. Switch to the data state.
>> anything else
>>     Emit the ident token. Switch to the data state. Reconsume the current
>>     input character.
>
>
> In the "anything else" case, the current input character (`b` in the `foo
> bar` example) is correctly reconsumed. But at this point all the whitespace
> is already consumed, so a WS token will be missing.
>
> Possible fixes:
>
> * Go back/reconsume one more character (which will be a whitespace
>   character)
> * Emit a WS token after the ident.

Fixed it in a slightly different way, since neither of those options is
allowed under the self-imposed rules I've set on the tokenizer (you can
only reconsume the current character, and you can only emit one token
before returning to the data state).

Instead, I run the entire state in look-ahead mode, so I can see when
I'm about to hit something that's not a parenthesis, emit the pending
ident token, and still have the whitespace in the current input
character so it can be reconsumed by the data state.

~TJ
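
For illustration, the look-ahead approach described above can be
sketched roughly as follows. This is a toy under stated assumptions,
not the spec's state machine: the function name `tokenize`, the
`transform_function_whitespace` parameter, and the token tuples are
all hypothetical, and only identifiers, whitespace and
single-character delimiters are handled. The point it shows is that
the whitespace after an identifier is only peeked at, so when no
parenthesis follows, the whitespace is still unconsumed and becomes
its own WS token.

```python
def tokenize(text, transform_function_whitespace=True):
    """Toy tokenizer returning a list of (type, value) tuples."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Collapse a run of whitespace into a single WS token.
            while i < n and text[i].isspace():
                i += 1
            tokens.append(("WS", " "))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (text[i].isalnum() or text[i] in "_-"):
                i += 1
            name = text[start:i]
            # Look ahead past whitespace WITHOUT consuming it:
            # j is a peek cursor, i stays just after the identifier.
            j = i
            if transform_function_whitespace:
                while j < n and text[j].isspace():
                    j += 1
            if j < n and text[j] == "(":
                # Commit the look-ahead: swallow whitespace and "(".
                tokens.append(("FUNCTION", name))
                i = j + 1
            else:
                # Not a parenthesis: emit the pending ident and leave
                # the whitespace in place for the next loop iteration.
                tokens.append(("IDENT", name))
        else:
            tokens.append(("DELIM", ch))
            i += 1
    return tokens


print(tokenize("foo bar"))
print(tokenize("name ("))
```

With the flag on, `foo bar` should come out as IDENT WS IDENT and
`name (` as a single FUNCTION token; with the flag off, `name (`
stays IDENT WS followed by the parenthesis.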
Received on Sunday, 20 January 2013 19:31:02 UTC