[css3-syntax] Hyphen+escape should be a valid ident token

Hi,

In the current ED, if a token starts with U+002D HYPHEN-MINUS (-), it can 
only be an ident token if the next character is a name-start character. 
Note that unlike CSS 2.1’s nmstart tokenizer macro, css3-syntax’s 
definition of a name-start character does not include escapes.

For example, `-\x` should be a single IDENT, but the current state 
machine tokenizes it as a DELIM followed by an IDENT.
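A minimal sketch of the difference, in Python. The regex is a transcription of CSS 2.1's tokenizer macros (`ident`, `nmstart`, `nmchar`, `escape`, `unicode`, `nonascii`), matched case-insensitively; `hyphen_starts_ident_current` is a hypothetical helper modelling only the current ED rule for a leading `-`. Treating "non-ASCII" as code point U+0080 or above is an assumption made for illustration.

```python
import re

# CSS 2.1 tokenizer macros, transcribed as Python regexes (case-insensitive).
H = r"[0-9a-f]"
NONASCII = r"[^\x00-\x9f]"  # {nonascii} is [^\0-\237]; octal 237 is 0x9F
UNICODE = rf"\\{H}{{1,6}}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = rf"(?:{UNICODE}|\\[^\r\n\f0-9a-f])"
NMSTART = rf"(?:[_a-z]|{NONASCII}|{ESCAPE})"  # includes escapes
NMCHAR = rf"(?:[_a-z0-9-]|{NONASCII}|{ESCAPE})"
IDENT_21 = re.compile(rf"-?{NMSTART}{NMCHAR}*", re.IGNORECASE)


def is_name_start(c):
    """css3-syntax name-start character: letter, '_', or non-ASCII.
    ASSUMPTION: non-ASCII is taken here as code point >= U+0080.
    None stands for EOF. Escapes are NOT included."""
    if c is None:
        return False
    return "a" <= c <= "z" or "A" <= c <= "Z" or c == "_" or ord(c) >= 0x80


def hyphen_starts_ident_current(s, i):
    """HYPOTHETICAL helper for the current ED rule: '-' at s[i] starts an
    ident only if s[i + 1] is a name-start character."""
    nxt = s[i + 1] if i + 1 < len(s) else None
    return is_name_start(nxt)


src = "-\\x"  # the three characters  -  \  x
print(bool(IDENT_21.fullmatch(src)))       # True: one ident in CSS 2.1
print(hyphen_starts_ident_current(src, 0))  # False: '-' becomes a DELIM
```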


Proposed changes:

In 3.3.4. Data state, in the U+002D HYPHEN-MINUS (-) case, replace

> Otherwise, if the next input character is a name-start character,
> switch to the ident state. Reconsume the current input character.
>
> Otherwise, emit a delim token with its value set to U+002D
> HYPHEN-MINUS (-). Remain in this state.

with

> Otherwise, if the next 2 input characters are U+005C REVERSE SOLIDUS
> (\) followed by a newline, or U+005C REVERSE SOLIDUS (\) followed by
> EOF, emit a delim token with its value set to U+002D HYPHEN-MINUS
> (-). Remain in this state.
>
> Otherwise, if the next input character is U+005C REVERSE SOLIDUS (\)
> or a name-start character, switch to the ident state. Reconsume the
> current input character.
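The proposed decision for the U+002D case can be sketched as a small Python function. It models only the two clauses being replaced (any earlier clauses of that case are assumed not to have matched), and it assumes preprocessed input, so a newline is always U+000A. The helper names and the U+0080 threshold for "non-ASCII" are assumptions for illustration.

```python
def is_name_start(c):
    """Name-start character: ASCII letter, '_', or non-ASCII.
    ASSUMPTION: non-ASCII taken as code point >= U+0080; None means EOF."""
    if c is None:
        return False
    return "a" <= c <= "z" or "A" <= c <= "Z" or c == "_" or ord(c) >= 0x80


def peek(s, i):
    """Character at s[i], or None past the end of input (EOF)."""
    return s[i] if i < len(s) else None


def hyphen_action_proposed(s, i):
    """HYPOTHETICAL: what the data state does with the '-' at s[i] under the
    proposed text. Returns "IDENT" (switch to the ident state and reconsume)
    or "DELIM" (emit a delim token for '-')."""
    nxt, after = peek(s, i + 1), peek(s, i + 2)
    if nxt == "\\" and (after == "\n" or after is None):
        return "DELIM"  # '\' followed by newline or EOF is not an escape
    if nxt == "\\" or is_name_start(nxt):
        return "IDENT"
    return "DELIM"


for src in ["-\\x", "-a", "-\\\n", "-\\", "- "]:
    print(repr(src), hyphen_action_proposed(src, 0))
```

Ordering matters here: the newline/EOF check runs first, so the later clause can accept any remaining `\` without re-checking what follows it.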


In 3.3.12. Ident state, either:

* Make the same change (handle \ after -)
* Remove the checks for a name-start character after - and for newline/EOF 
after \, since these checks are already made in the data state. The - and 
name-start cases can then be merged:

> 3.3.12. Ident state
>
> Consume the next input character.
>
> U+002D HYPHEN-MINUS (-)
> name-start character
>   Create an ident token with its value set to the current input
>   character. Switch to the ident-rest state.
>
> U+005C REVERSE SOLIDUS (\)
>   Consume an escaped character. Create an ident token with its value
>   set to the returned character. Switch to the ident-rest state.
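The merged ident state can be sketched in Python as below, under stated assumptions: the data state has already validated the start, "consume an escaped character" is read as "up to 6 hex digits plus one optional whitespace character, with 0, surrogates and values above U+10FFFF mapped to U+FFFD; otherwise the character itself", and the ident-rest state is simplified to "append name characters and valid escapes". All function names are hypothetical.

```python
HEX = "0123456789abcdefABCDEF"
WHITESPACE = "\n\t "  # assumes preprocessed input (newline is U+000A)


def is_name_start(c):
    # ASSUMPTION: non-ASCII taken as code point >= U+0080
    return "a" <= c <= "z" or "A" <= c <= "Z" or c == "_" or ord(c) >= 0x80


def is_name_char(c):
    return is_name_start(c) or "0" <= c <= "9" or c == "-"


def consume_escape(s, i):
    """s[i] is the character after the backslash. Returns (char, new_index)."""
    if i >= len(s):
        return "\ufffd", i  # defensive: the data-state check rules this out
    if s[i] in HEX:
        j = i
        while j < len(s) and j - i < 6 and s[j] in HEX:
            j += 1
        value = int(s[i:j], 16)
        if j < len(s) and s[j] in WHITESPACE:
            j += 1  # one whitespace character terminates the hex escape
        if value == 0 or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
            return "\ufffd", j
        return chr(value), j
    return s[i], i + 1


def consume_ident(s, i):
    """HYPOTHETICAL merged ident state followed by a simplified ident-rest
    state. Returns (ident value, index after the ident)."""
    c = s[i]
    i += 1
    if c == "-" or is_name_start(c):
        value = c  # '-' and name-start share one case
    elif c == "\\":
        value, i = consume_escape(s, i)
    else:
        raise ValueError("the data state should not switch here")
    while i < len(s):  # simplified ident-rest state
        c = s[i]
        if is_name_char(c):
            value += c
            i += 1
        elif c == "\\" and i + 1 < len(s) and s[i + 1] != "\n":
            ch, i = consume_escape(s, i + 1)
            value += ch
        else:
            break
    return value, i


print(consume_ident("-\\x", 0))
print(consume_ident("\\41 b", 0))
```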


The same issue exists with at-keyword tokens, and similar changes should 
work.
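One hypothetical reading of "similar changes" for at-keywords, sketched in Python: treat `@` as starting an at-keyword exactly when the characters after it would start an ident under the proposed hyphen rule. This is an illustration of the idea, not the spec's wording; every function name here is invented, and "non-ASCII" is again assumed to mean U+0080 or above.

```python
def is_name_start(c):
    # ASSUMPTION: non-ASCII taken as code point >= U+0080; None means EOF
    if c is None:
        return False
    return "a" <= c <= "z" or "A" <= c <= "Z" or c == "_" or ord(c) >= 0x80


def peek(s, i):
    return s[i] if i < len(s) else None


def valid_escape_at(s, j):
    """A backslash at s[j] that is not followed by a newline or EOF."""
    return peek(s, j) == "\\" and peek(s, j + 1) not in ("\n", None)


def would_start_ident(s, j):
    """True if s[j:] begins an ident under the proposed rule."""
    c = peek(s, j)
    if c == "-":
        return is_name_start(peek(s, j + 1)) or valid_escape_at(s, j + 1)
    return is_name_start(c) or valid_escape_at(s, j)


def at_sign_starts_at_keyword(s, i):
    """HYPOTHETICAL: '@' at s[i] starts an at-keyword if what follows it
    would start an ident; otherwise '@' would be a delim."""
    return would_start_ident(s, i + 1)


for src in ["@media", "@\\x", "@-\\x", "@-", "@\\\n"]:
    print(repr(src), at_sign_starts_at_keyword(src, 0))
```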

-- 
Simon Sapin

Received on Sunday, 20 January 2013 10:14:00 UTC