Re: [css3-values] inaccurate statements about syntax/grammar from Simon Sapin on 2012-04-05 (www-style@w3.org from April 2012)

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Thu, 05 Apr 2012 14:31:05 +0200
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>, WWW Style <www-style@w3.org>
Message-ID: <4F7D9089.2040605@kozea.fr>

On 04/04/2012 19:21, Tab Atkins Jr. wrote:
> On Wed, Apr 4, 2012 at 3:15 AM, Kang-Hao (Kenny) Lu
> <kennyluck@csail.mit.edu> wrote:
>> 2. 'content: attr(inherit);' is valid (testing on only WebKit and Firefox)
>> (Feel free to add more if you know anymore. Even if these exceptions are
>> too messy to put in the spec, it's probably not a bad idea if a complete
>> list is archived on www-style.)
>
> The 'inherit' keyword there isn't a component value - the attr()
> function is.  There's no restriction against 'inherit' as a function
> argument.

I agree with Kenny that "component value" is ill-defined.

According to section 2.1 of V&U, <'background-position'> is a single 
component but "top left" (which is a valid <'background-position'>) is 
made of two components. Which is it?

When implementing a parser, I found it useful to have an intermediate
step between tokenization and parsing: turn a flat sequence of tokens
into a "regrouped" tree of {} [] () pairs and atomic tokens. The
algorithm is something like this:

For {, [, ( and FUNCTION tokens, find the matching }, ] or ) token. The 
whole sub-sequence is replaced by a single "grouped token" which 
contains a list of tokens of everything between the start and end 
tokens. That list is recursively "regrouped".
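
The regrouping step described above can be sketched roughly as follows.
This is only an illustration, not code from any actual implementation:
the (type, value) tuple representation, the names regroup and _regroup,
and the choice to silently accept unmatched or unclosed brackets are all
assumptions made for the example.

```python
# Hypothetical token representation: flat tokens are (type, value) tuples,
# grouped tokens are (type, value, content) tuples.
CLOSERS = {"{": "}", "[": "]", "(": ")", "FUNCTION": ")"}


def regroup(tokens):
    """Turn a flat token sequence into a tree of grouped and atomic tokens."""
    return list(_regroup(iter(tokens), None))


def _regroup(tokens, closer):
    # `tokens` is a shared iterator: a nested call consumes everything up to
    # and including its own closing token, then the outer loop resumes.
    for type_, value in tokens:
        if type_ == closer:
            return  # end of the current group; the closer itself is dropped
        if type_ in CLOSERS:
            content = list(_regroup(tokens, CLOSERS[type_]))
            yield (type_, value, content)
        else:
            # Atomic token. Stray closers also end up here (assumption:
            # no error reporting in this sketch).
            yield (type_, value)


# Flat tokens for something like: a {b:f(1)}
flat = [
    ("IDENT", "a"), ("S", " "), ("{", "{"),
    ("IDENT", "b"), ("DELIM", ":"),
    ("FUNCTION", "f"), ("NUMBER", "1"), (")", ")"),
    ("}", "}"),
]
tree = regroup(flat)
```

Here tree has three top-level items, the last of which is the {} group
whose content in turn holds the f() group.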

Having this tree structure makes things like error recovery (ignoring a
whole at-rule and its {} body) or parsing of functional values much
easier.

All this is only an implementation detail, but could a similar concept 
be useful to define "component value" in the spec? rgb(0, 127, 255) 
would be a single component that contains 5 sub-components (assuming you 
count commas but not white space). Nested functional notation (like 
rgb() in a gradient) would form a tree of components.
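
To make the counting concrete, here is a small hand-built tree for the
rgb() example and for rgb() nested in a gradient. The Component class,
the helper names and the linear-gradient arguments are hypothetical,
chosen only to illustrate the idea; nothing here is spec text.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str    # e.g. "NUMBER", "DELIM", "S" (white space), "FUNCTION"
    value: str
    children: list = field(default_factory=list)


def sub_components(component):
    """Direct children, counting commas but not white space."""
    return [c for c in component.children if c.kind != "S"]


def depth(component):
    """Number of nesting levels, counting the component itself."""
    return 1 + max((depth(c) for c in component.children), default=0)


def num(text):
    return Component("NUMBER", text)


comma = Component("DELIM", ",")
space = Component("S", " ")

# rgb(0, 127, 255)
rgb = Component("FUNCTION", "rgb",
                [num("0"), comma, space, num("127"), comma, space, num("255")])

# linear-gradient(rgb(0, 127, 255), white)
gradient = Component("FUNCTION", "linear-gradient",
                     [rgb, comma, space, Component("IDENT", "white")])
```

With these definitions, len(sub_components(rgb)) is 5, and the gradient
is a single component whose first sub-component is the rgb() component.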

> From what I can tell, the only way to parse "-2" is as an IDENT, not a
> NUMBER, and the only way to parse "+2" is as a DELIM and a NUMBER.
>
> I'm either quite mistaken (in which case I'd appreciate a
> clarification!) or this is a 2.1 error that's needed fixing forever,
> and all implementations have just hacked around it.  If the latter,
> then once we fix it (presumably to put an optional +/- in the
> definition of the NUMBER token), your #1 will be addressed because
> it'll be a single token, as will your following comments about signs
> in <percentage> (which I've elided for brevity).

As Kenny said, "-2" does not match [-]?{nmstart}{nmchar}* and is not an
IDENT. It is a DELIM followed by a NUMBER, just like "+2".
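
This is easy to check against the CSS 2.1 macros. The sketch below is
deliberately simplified: it leaves out the {nonascii} and {escape}
alternatives of {nmstart} and {nmchar}, which do not matter for "-2".

```python
import re

# Simplified CSS 2.1 macros (assumption: {nonascii} and {escape} omitted).
NMSTART = r"[_a-z]"
NMCHAR = r"[_a-z0-9-]"
IDENT = re.compile(rf"-?{NMSTART}{NMCHAR}*", re.IGNORECASE)
NUM = re.compile(r"[0-9]*\.[0-9]+|[0-9]+")

# "-2": after the optional "-", "2" is not an {nmstart}, so no IDENT.
ident_matches_minus_2 = IDENT.fullmatch("-2") is not None

# {num} has no sign, so "-2" as a whole is not a NUMBER either...
num_matches_minus_2 = NUM.fullmatch("-2") is not None

# ...but the "2" on its own is, leaving "-" as a DELIM in front of it.
num_matches_2 = NUM.fullmatch("2") is not None

# For comparison, "-x2" is a perfectly good IDENT.
ident_matches_minus_x2 = IDENT.fullmatch("-x2") is not None
```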

Apparently, UAs are expected to somehow find consecutive DELIM and 
{NUMBER,DIMENSION,PERCENTAGE} tokens to handle signed numbers. The 
situation is not as bad as if "-2" was an IDENT, but I still think that 
the core grammar should be changed to include the optional sign in the 
{num} macro. This would make many details much easier to deal with, and 
the only effective change would be that "+/**/100px" would not be valid 
anymore. But does any web page rely on that?
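
To illustrate the effect of that change, here is a toy tokenizer with
the sign folded into {num}. It is a sketch under several assumptions:
only a handful of token types are handled, {nonascii} and {escape} are
ignored, and the pattern and function names are made up for the example.

```python
import re

# {num} with the proposed optional sign.
SIGNED_NUM = r"[-+]?(?:[0-9]*\.[0-9]+|[0-9]+)"
# Simplified {ident} (assumption: no {nonascii} or {escape}).
IDENT = r"-?[_a-z][_a-z0-9-]*"

TOKEN = re.compile(
    rf"(?P<COMMENT>/\*.*?\*/)"
    rf"|(?P<DIMENSION>{SIGNED_NUM}{IDENT})"
    rf"|(?P<PERCENTAGE>{SIGNED_NUM}%)"
    rf"|(?P<NUMBER>{SIGNED_NUM})"
    rf"|(?P<DELIM>.)",
    re.IGNORECASE | re.DOTALL,
)


def tokenize(css):
    """Return (type, text) pairs; anything unrecognized becomes a DELIM."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(css)]


signed = tokenize("+100px")
split_by_comment = tokenize("+/**/100px")
```

With the sign in {num}, "+100px" and "-2" each come out as a single
token, while "+/**/100px" is still a DELIM, a comment and an unsigned
DIMENSION, so a parser that only accepts the single signed token would
reject it.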

-- 
Simon Sapin

Received on Thursday, 5 April 2012 12:31:46 UTC