Re: [css-syntax] Ready for wide review, FPWD request coming soon from Tab Atkins Jr. on 2013-05-31 (www-style@w3.org from May 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Fri, 31 May 2013 10:47:49 -0700
To: Simon Sapin <simon.sapin@exyr.org>
Cc: Zack Weinberg <zackw@panix.com>, www-style list <www-style@w3.org>
Message-ID: <CAAWBYDDM1RU4+X_qLNiD=Q0qROGCPR99V3MZwEwkqD2Vbt1dQQ@mail.gmail.com>

On Fri, May 31, 2013 at 2:58 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> Le 31/05/2013 08:14, Tab Atkins Jr. a écrit :
>> Okay, I've just pushed a large commit that rewrites the tokenizer to
>> be recursive-descent.
>
>
> Here are a few comments:
>
> §4.3.1. Consume a token
>
>     U+0023 NUMBER SIGN (#)
>     […] If the first three characters of the 〈hash〉’s value
>     would start an identifier, set the 〈hash〉’s type flag to "id".
>
> This sets the type flag based on the unescaped value of the token. It should
> be based on input characters instead.

Ah, sure.  That was slightly more awkward to spec, so I did it the
current way, but you're right.  Fixed.

>     U+0055 LATIN CAPITAL LETTER U (U)
>     U+0075 LATIN SMALL LETTER U (u)
>     […] Otherwise, if the next 3 input characters are an ASCII
>     case-insensitive match for "url(", consume them,
>     consume a url token, and return it.
>
> Opposite problem here. U, R or L can be escaped, so you need to check the
> result of "Consume a name" rather than input characters. (We resolved on
> that a few months ago.)

Ah, thanks.  Fixed.

> Also, this tokenizer does not generate 〈function〉s at all. I think it should
> have a "Consume an identifier-like token" algorithm (with a better name?)
> that returns one of 〈ident〉, 〈funtion〉 or 〈url〉; and use that whenever the
> current "Consume a token" returns an 〈ident〉.

Urf, I knew there had to be something wrong with my ident handling.
Yeah, fixed.

> §4.3.3. Consume a string token
>
>     This algorithm must be called with an ending character,
>     which denotes the character that ends the string.
>
> This is not necessary. Every time "Consume a string token" is called, the
> ending character is also the current input character and thus does not need
> to be passed explicitly. By the way, this character needs to be consumed at
> the beginning of "Consume a string token", which otherwise terminates
> immediately.
>
> Alternatively, consume the opening quote on ever call site. (I prefer the
> other solution.)

Every call site already consumes the starting character, so this is
fine as written.

> §4.3.13. Consume the remnants of a bad url
>
>     the input stream starts with a valid escape
>     Consume an escaped character.
>
> Perhaps add a note that this is useful for the \) sequence. (This is the
> only sequence that makes this different from "Anything else: Do nothing.")

Sure, done.  (Good catch, by the way - when I was first rewriting this
section, I considered dropping it until I realized it was necessary
for that exact case.)

> Editorial: I’d move related sections together. For example "Consume a
> number" next to "Consume a numeric token", "Consume the remnants of a bad
> url" next to "Consume a url token", etc.

My strategy right now is to put the "consume an <x> token" sections
together, and then put the utility functions together.  I don't like
interspersing them very much, because some are used by multiple
states, so it's unclear what they should be next to.

> Editorial, very minor: You use sometimes "would start {a number,an
> identifier}" and sometimes "starts with {a number,an identifier}" for the
> same thing. This confused me for a bit when using CTRL+F. Maybe pick one and
> use it everywhere? Consistency is good.

I know I created the two variant linktexts for a reason, but I'll see
if I can reword to a consistent pattern without the prose being
awkward.

~TJ

Received on Friday, 31 May 2013 17:48:35 UTC