Re: [css3-syntax] The "transform function whitespace" flag eats too much whitespace from Tab Atkins Jr. on 2013-01-20 (www-style@w3.org from January 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sun, 20 Jan 2013 11:30:15 -0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: "www-style@w3.org" <www-style@w3.org>
Message-ID: <CAAWBYDAaMNmyjWcA1z7F3EgRh+nK0eWWJWY1trWs5qfVEiqXag@mail.gmail.com>

On Sun, Jan 20, 2013 at 12:22 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> Hi,
>
> The /transform function whitespace/ flag changes the tokenizer so that
> `name (` is a single FUNCTION token instead of IDENT WS (.
>
> With `foo bar` however, the current ED’s state machine gives IDENT IDENT
> while it should give IDENT WS IDENT.
>
>
>> 3.3.14. Transform-function-whitespace state
>>
>> Consume the next input character.
>>
>> whitespace
>>     Remain in this state.
>> U+0028 LEFT PARENTHESIS (()
>>     Emit a function token with its value set to the identifier token's
>> value. Switch to the data state.
>> anything else
>>     Emit the ident token. Switch to the data state. Reconsume the current
>> input character.
>
>
> In the "anything else" case, the current input character (`b` in the `foo
> bar` example) is correctly reconsumed. But at this point all the whitespace
> is already consumed, so a WS token will be missing.
>
> Possible fixes:
>
> * Go back/reconsume one more character (which will be a whitespace
> character)
> * Emit a WS token after the ident.

Fixed it in a slightly different way, since neither of those options
is allowed under the self-imposed rules I've set for the tokenizer (you
can only reconsume the current character, and you can only emit one
token before returning to the data state).

Instead, I run the entire state in look-ahead mode, so I can see when
I'm about to hit something that's not a parenthesis, emit the pending
ident token, and still have the whitespace in the current input
character so it can be reconsumed by the data state.
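
To make the difference concrete, here is a rough Python sketch of the
idea. It is not the spec's state machine: the function name `tokenize`,
the `lookahead` switch, and the token tuples are all hypothetical, and
it only handles alphabetic idents, whitespace, and single-character
delimiters. It assumes the transform function whitespace flag is on.

```python
def tokenize(text, lookahead=True):
    """Toy tokenizer with the transform-function-whitespace flag enabled.

    lookahead=True  peeks past the whitespace without consuming it.
    lookahead=False consumes the whitespace first (the reported bug).
    All names here are illustrative, not taken from the spec.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Data state: collapse a run of whitespace into one WS token.
            while i < n and text[i].isspace():
                i += 1
            tokens.append(("WS", None))
        elif ch.isalpha():
            # Read the identifier name.
            j = i
            while j < n and text[j].isalpha():
                j += 1
            name = text[i:j]
            # Transform-function-whitespace state: scan over whitespace
            # to find out whether a "(" follows.
            k = j
            while k < n and text[k].isspace():
                k += 1
            if k < n and text[k] == "(":
                # `name (` becomes a single FUNCTION token.
                tokens.append(("FUNCTION", name))
                i = k + 1
            else:
                tokens.append(("IDENT", name))
                # With look-ahead, resume right after the name so the
                # data state still sees the whitespace. Without it, the
                # whitespace is already gone and no WS token is emitted.
                i = j if lookahead else k
        else:
            tokens.append(("DELIM", ch))
            i += 1
    return tokens


print(tokenize("foo bar", lookahead=False))
print(tokenize("foo bar"))
print(tokenize("name ("))
```

With `lookahead=False` the `foo bar` input loses its WS token, which is
the behaviour Simon reported; with look-ahead the whitespace is left
for the data state to turn into a WS token, and `name (` still comes
out as one FUNCTION token.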

~TJ

Received on Sunday, 20 January 2013 19:31:02 UTC