Re: [css3-values] inaccurate statements about syntax/grammar from Tab Atkins Jr. on 2012-04-05 (www-style@w3.org from April 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Thu, 5 Apr 2012 11:28:33 -0700
To: Simon Sapin <simon.sapin@kozea.fr>
Cc: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
Message-ID: <CAAWBYDApPr5XLsEr_C6=tvZm7oqkVJ3WnF_1v_-3oUi__1_5Pg@mail.gmail.com>

On Thu, Apr 5, 2012 at 5:31 AM, Simon Sapin <simon.sapin@kozea.fr> wrote:
> I agree with Kenny that "component value" is ill-defined.
>
> According to section 2.1 of V&U, <'background-position'> is a single
> component but "top left" (which is a valid <'background-position'>) is made
> of two components. Which is it?

Hm, yeah, that is kind of inconsistent.  fantasai, we'll have to work
on this definition a bit more.  :/


> For implementing a parser, I found it useful to have an intermediate step
> between tokenization and parsing: turn a flat sequence of tokens into a
> "regrouped" tree of {} [] () pairs and atomic tokens. The algorithm is
> something like this:
>
> For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The
> whole sub-sequence is replaced by a single "grouped token" which contains a
> list of tokens of everything between the start and end tokens. That list is
> recursively "regrouped".
>
> Having this tree structure makes things like error recovery (ignore a whole
> at-rule and its {} body) or parsing of functional values much easier.
>
> All this is only an implementation detail, but could a similar concept be
> useful to define "component value" in the spec? rgb(0, 127, 255) would be a
> single component that contains 5 sub-components (assuming you count commas
> but not white space). Nested functional notation (like rgb() in a gradient)
> would form a tree of components.

Yes, that's a more useful definition of "token" for spec purposes,
once you rise above raw grammar concerns.  (Honestly, I kinda want to
just write an explicit parser a la HTML and have it emit tokens like
that.)
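
A minimal sketch of the regrouping step described above, in Python. The
token model is an assumption made for illustration only: each token is a
plain string, and a FUNCTION token is a name ending in "(" such as "rgb(".
The names closer_for and regroup are hypothetical, not from any spec or
implementation.

```python
# Assumed token model: plain strings; a FUNCTION token ends in "(".
CLOSERS = {"{": "}", "[": "]", "(": ")"}


def closer_for(token):
    """Return the closing token for an opening token, or None."""
    if token in CLOSERS:
        return CLOSERS[token]
    if len(token) > 1 and token.endswith("("):  # FUNCTION token
        return ")"
    return None


def regroup(tokens):
    """Turn a flat token list into a tree of (opener, children) groups."""
    tokens = iter(tokens)  # shared by every recursion level

    def group(end):
        result = []
        for token in tokens:
            if token == end:
                return result
            closer = closer_for(token)
            if closer is None:
                # Atomic token (a stray closer also lands here).
                result.append(token)
            else:
                # Recursively regroup everything up to the matching closer.
                result.append((token, group(closer)))
        return result  # unmatched opener: the group runs to end of input

    return group(None)
```

With this model, regroup(["rgb(", "0", ",", "127", ",", "255", ")"]) gives
one grouped token whose child list has five entries (three numbers and two
commas), matching the rgb() example in the quoted text.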


>> From what I can tell, the only way to parse "-2" is as an IDENT, not a
>> NUMBER, and the only way to parse "+2" is as a DELIM and a NUMBER.
>>
>> I'm either quite mistaken (in which case I'd appreciate a
>> clarification!) or this is a 2.1 error that's needed fixing forever,
>> and all implementations have just hacked around it.  If the latter,
>> then once we fix it (presumably to put an optional +/- in the
>> definition of the NUMBER token), your #1 will be addressed because
>> it'll be a single token, as will your following comments about signs
>> in <percentage> (which I've elided for brevity).
>
> As Kenny said, "-2" does not match [-]?{nmstart}{nmchar}* and is not an
> IDENT. It is a DELIM and a NUMBER, just like "+2".

Yeah, y'all are right.
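
For anyone who wants to check this quickly, here is a rough Python
rendering of the IDENT pattern quoted above. It is a simplification and
an assumption on my part: {nmstart} and {nmchar} are reduced to their
ASCII parts, with {nonascii} and {escape} left out. The name is_ident is
hypothetical.

```python
import re

# Simplified [-]?{nmstart}{nmchar}*: ASCII only, no {nonascii} or {escape}.
IDENT = re.compile(r"-?[_a-z][_a-z0-9-]*", re.IGNORECASE)


def is_ident(text):
    """True if the whole string matches the simplified IDENT pattern."""
    return IDENT.fullmatch(text) is not None
```

Under this pattern is_ident("-2") is false, because after the optional
"-" the next character has to be an {nmstart}, and a digit is not one.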


> Apparently, UAs are expected to somehow find consecutive DELIM and
> {NUMBER,DIMENSION,PERCENTAGE} tokens to handle signed numbers. The situation
> is not as bad as if "-2" was an IDENT, but I still think that the core
> grammar should be changed to include the optional sign in the {num} macro.
> This would make many details much easier to deal with, and the only
> effective change would be that "+/**/100px" would not be valid anymore. But
> does any web page rely on that?

If they do, they deserve any pain we cause them.  I agree that the
sign should be part of the NUMBER token.
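
To make the difference concrete, here is a toy Python tokenizer that can
be built with or without the optional sign in {num}. It is only a sketch
under stated assumptions: it knows just comments, NUMBER, PERCENTAGE,
DIMENSION and DELIM, treats a unit as a run of ASCII letters, and turns
everything else (including white space) into DELIM. The name
make_tokenizer is hypothetical.

```python
import re


def make_tokenizer(signed):
    """Build a toy tokenizer; signed=True puts [+-]? into {num}."""
    sign = r"[+-]?" if signed else ""
    # Decimal form first: Python alternation is ordered, not longest-match.
    num = sign + r"(?:[0-9]*\.[0-9]+|[0-9]+)"
    pattern = re.compile(
        r"(?P<COMMENT>/\*.*?\*/)"
        r"|(?P<DIMENSION>" + num + r"[a-z]+)"
        r"|(?P<PERCENTAGE>" + num + r"%)"
        r"|(?P<NUMBER>" + num + r")"
        r"|(?P<DELIM>.)",
        re.IGNORECASE | re.DOTALL,
    )

    def tokenize(text):
        # Comments are matched and then dropped, as in the core grammar.
        return [
            (m.lastgroup, m.group())
            for m in pattern.finditer(text)
            if m.lastgroup != "COMMENT"
        ]

    return tokenize
```

With signed=False, "-2" comes out as a DELIM followed by a NUMBER and the
value parser has to stitch them together. With signed=True it is a single
NUMBER. "+/**/100px" is a DELIM followed by a DIMENSION either way, so
once nothing stitches tokens together anymore it simply stops being a
signed length.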

~TJ

Received on Thursday, 5 April 2012 18:29:21 UTC