Re: [css3-syntax] Added an "an+b parsing" section, please review from Tab Atkins Jr. on 2013-02-01 (www-style@w3.org from February 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Fri, 1 Feb 2013 12:43:31 -0800
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDBbeVYeWkGM8ec8c5W+Ly=LJh8dEuxD+HLG=NRyYG6s=Q@mail.gmail.com>

On Thu, Jan 31, 2013 at 11:44 PM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> On 01/02/2013 03:30, Tab Atkins Jr. wrote:
>> I added a small algorithm for parsing an+b values at
>> <http://dev.w3.org/csswg/css3-syntax/#parse-anb-notation>.  It's just
>> a "turn everything back into a string and reparse" algorithm.
>>
>> Does it look good?  Alternately, I could do it by looking at the
>> original tokens, it's just a bit messier that way.  There are 6
>> different ways a valid an+b can be tokenized, and one of them involves
>> a dimension token with a unit matching /n[+-]\d+/.
>
>
> It’s very good to have this in Syntax. Thanks!
>
> "Turn tokens back to a string and reparse" is not pretty, but as you say
> parsing from tokens is worse. And more likely to have some corner cases
> wrong.

Luckily the official grammar in Selectors, while wrong, is at least
very simple and limited.  I think I've got all the token-based parsing
cases mapped out.  But still, reparsing from a string is easier.

> Is the an+b notation used in anything other than :nth-child() and related
> Selectors?

So far, no.  But we might use it elsewhere in the future.

> The various algorithms in §6. Parser Entry Points are assumed to parse from
> text. But for Selectors at least, an+b is in the arguments of an
> already-tokenized function so the input is a list of component values, not
> text. Although that does not make much difference as all the relevant tokens
> are preserved.

I tried to wordsmith so that they all really just assume a list of
tokens.  If you have suggestions about how to phrase the intro better,
I'd appreciate it.

You're right that an+b parsing is likely already going to run on a
list of component values, but as you point out, it doesn't matter for
the algorithm's purposes.  I think I might keep it somewhat vague
here, so that it's valid to invoke it either normally or with a normal
token stream.

> On to the algorithm itself:
>
> Dimension tokens should also append their unit to the string. The
> "representation" is only that of the numeric part.

Ooh, thanks.  Fixed.  Also, idents have a value, not a representation.
I should maybe fix these, as I've made this mistake before. :/
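
To illustrate the serialization step discussed above, here is a minimal
sketch in Python.  The Token type and its field names are hypothetical
(they are not taken from the spec text); the only point is that a
dimension contributes its numeric representation *plus* its unit, while
other tokens contribute their representation or value.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token shape, for illustration only."""
    type: str       # e.g. "number", "dimension", "ident", "delim", "whitespace"
    text: str       # representation (numeric tokens) or value (everything else)
    unit: str = ""  # only meaningful for dimension tokens


def serialize(tokens):
    """Turn a token list back into a string so it can be reparsed."""
    parts = []
    for tok in tokens:
        parts.append(tok.text)
        if tok.type == "dimension":
            # The representation covers only the numeric part,
            # so the unit has to be appended explicitly.
            parts.append(tok.unit)
    return "".join(parts)


# "2n-1" is a single dimension token: number part "2", unit "n-1".
one_token = [Token("dimension", "2", "n-1")]
# "2n+1" is a dimension "2n" followed by the signed number "+1".
two_tokens = [Token("dimension", "2", "n"), Token("number", "+1")]
```

Without the unit-appending branch, the first example would serialize to
just "2" and silently parse as a plain integer.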

> All whitespace tokens are ignored. This is not the case in the "nth" grammar
> from Selectors 3. In particular, no whitespace is allowed between a and its
> sign, nor between a and n. (Whitespace *is* allowed around the +/- sign
> after n, and around the whole an+b sequence.)
>
> nth
>   : S* [ ['-'|'+']? INTEGER? {N} [ S* ['-'|'+'] S* INTEGER ]? |
>          ['-'|'+']? INTEGER | {O}{D}{D} | {E}{V}{E}{N} ] S*
>   ;

Damn, you're right.  Hmm.  I was *really* trying to avoid having to
deal with spaces between the signs and the numbers.  I suppose I can
do whitespace-stripping late, so I can check the step and fail it if
there is any trailing whitespace.  I'll look into how I want to do
this.
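
For reference, the whitespace rules in the quoted "nth" grammar can be
sketched as a regular expression over the reserialized string.  This is
an illustration in Python, not the algorithm in the draft: the names
NTH_RE and parse_nth are made up here, and CSS escapes inside {N},
{O}{D}{D} and {E}{V}{E}{N} are ignored for brevity.

```python
import re

# S* in the grammar: optional CSS whitespace.
WS = r"[ \t\r\n\f]*"

NTH_RE = re.compile(
    rf"{WS}(?:"
    # ['-'|'+']? INTEGER? {N} [ S* ['-'|'+'] S* INTEGER ]?
    rf"(?P<a>[-+]?[0-9]*)n(?:{WS}(?P<sign>[-+]){WS}(?P<b>[0-9]+))?"
    # ['-'|'+']? INTEGER
    rf"|(?P<int>[-+]?[0-9]+)"
    # {O}{D}{D} | {E}{V}{E}{N}
    rf"|(?P<kw>odd|even)"
    rf"){WS}",
    re.IGNORECASE,
)


def parse_nth(text):
    """Return (a, b) for a valid an+b string, or None if it is invalid."""
    m = NTH_RE.fullmatch(text)
    if m is None:
        return None
    if m.group("kw"):
        # odd is 2n+1, even is 2n+0.
        return (2, 1) if m.group("kw").lower() == "odd" else (2, 0)
    if m.group("int") is not None:
        return (0, int(m.group("int")))
    a_text = m.group("a")
    # A bare "n", "+n" or "-n" means a step of 1 or -1.
    a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
    b = 0
    if m.group("b") is not None:
        b = int(m.group("sign") + m.group("b"))
    return (a, b)
```

With this pattern, parse_nth("-n + 6") is accepted because whitespace
is allowed around the sign after n, while parse_nth("+ 2n") and
parse_nth("2 n") are rejected because no whitespace is allowed between
a and its sign, or between a and n.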

> For odd/even, the phrase used is "If repr contains an ASCII case-insensitive
> match for …" It should be "is" instead of "contains". :nth-child(Some
> oddities) should not match. ("Contains" is used that way later in the
> algorithm.)

Fixed, thanks.
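
The difference between "contains" and "is" can be shown with a small
Python sketch.  The helper names are hypothetical, and the input is
assumed to be the reserialized string with surrounding whitespace
already stripped.

```python
import string

# ASCII-only lowercasing, so non-ASCII characters are never folded.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s):
    return s.translate(_ASCII_LOWER)


def contains_keyword(repr_, keyword):
    """The original wording: true for any string that includes the keyword."""
    return keyword in ascii_lower(repr_)


def is_keyword(repr_, keyword):
    """The corrected wording: true only when the whole string is the keyword."""
    return ascii_lower(repr_) == keyword
```

Under the "contains" wording, "Some oddities" would be treated as odd;
under the "is" wording it is not, while "ODD" still matches.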

~TJ

Received on Friday, 1 February 2013 20:44:17 UTC