Re: [css-syntax] Ready for wide review, FPWD request coming soon

Le 31/05/2013 08:14, Tab Atkins Jr. a écrit :
> Okay, I've just pushed a large commit that rewrites the tokenizer to
> be recursive-descent.

Here are a few comments:

§4.3.1. Consume a token

     U+0023 NUMBER SIGN (#)
     […] If the first three characters of the 〈hash〉’s value
     would start an identifier, set the 〈hash〉’s type flag to "id".

This sets the type flag based on the unescaped value of the token. It 
should be based on input characters instead.


     U+0055 LATIN CAPITAL LETTER U (U)
     U+0075 LATIN SMALL LETTER U (u)
     […] Otherwise, if the next 3 input characters are an ASCII
     case-insensitive match for "url(", consume them,
     consume a url token, and return it.

Opposite problem here. U, R or L can be escaped, so you need to check 
the result of "Consume a name" rather than input characters. (We 
resolved on that a few months ago.)

Also, this tokenizer does not generate 〈function〉s at all. I think it 
should have a "Consume an identifier-like token" algorithm (with a 
better name?) that returns one of 〈ident〉, 〈funtion〉 or 〈url〉; and use 
that whenever the current "Consume a token" returns an 〈ident〉.


§4.3.3. Consume a string token

     This algorithm must be called with an ending character,
     which denotes the character that ends the string.

This is not necessary. Every time "Consume a string token" is called, 
the ending character is also the current input character and thus does 
not need to be passed explicitly. By the way, this character needs to be 
consumed at the beginning of "Consume a string token", which otherwise 
terminates immediately.

Alternatively, consume the opening quote on ever call site. (I prefer 
the other solution.)


§4.3.13. Consume the remnants of a bad url

     the input stream starts with a valid escape
     Consume an escaped character.

Perhaps add a note that this is useful for the \) sequence. (This is the 
only sequence that makes this different from "Anything else: Do nothing.")


Editorial: I’d move related sections together. For example "Consume a 
number" next to "Consume a numeric token", "Consume the remnants of a 
bad url" next to "Consume a url token", etc.

Editorial, very minor: You use sometimes "would start {a number,an 
identifier}" and sometimes "starts with {a number,an identifier}" for 
the same thing. This confused me for a bit when using CTRL+F. Maybe pick 
one and use it everywhere? Consistency is good.


Cheers,
-- 
Simon Sapin

Received on Friday, 31 May 2013 09:59:09 UTC