Re: [CSS21] Grammar Errors

[Recorded as issue 206 http://wiki.csswg.org/spec/css2.1#issue-206]

On Saturday 28 August 2010 18:09:27 Mark wrote:
> Hello,
> I'm implementing a CSS parser, and I've noticed some errors in the
> grammar that don't appear to be documented in the CSS 2.1
> errata. They are to do with the hexcolor definitions.

Note that it is probably a better idea to ignore Appendix G and instead 
implement the generic grammar from chapter 4. There will then be no 
hexcolor to worry about, and your parser will also handle level 3 
features.

Depending on what you want to do with the parser, you will probably need 
individual routines to check each known property anyway, because both 
'color: #777' and 'font: #777' are syntactically correct CSS, but the 
latter is not valid in level 2, where 'font' does not accept a color.
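
For illustration, here is a minimal sketch in Python of such per-property 
routines. It assumes the tokenizer hands each declaration value over as a 
list of (type, text) pairs such as ("HASH", "777"). That token shape, the 
function names and the CHECKS table are hypothetical. The 3-or-6-digit 
rule from section 4.3.6 is checked here, after tokenizing, not in the 
grammar:

```python
import string

# Hypothetical token shape: (type, text), e.g. ("HASH", "777") for "#777".
Token = tuple[str, str]


def is_hex_color(text: str) -> bool:
    """True if a HASH token's text (without the '#') is a CSS 2.1 hex
    color: exactly three or six hexadecimal digits, in either case."""
    return len(text) in (3, 6) and all(c in string.hexdigits for c in text)


def valid_color(value: list[Token]) -> bool:
    """Partial check for 'color': color keywords and rgb() are left out."""
    if len(value) != 1:
        return False
    kind, text = value[0]
    if kind == "HASH":
        return is_hex_color(text)
    return kind == "IDENT" and text.lower() == "inherit"


def font_not_ruled_out(value: list[Token]) -> bool:
    """Deliberately incomplete check for 'font': it only rejects values
    containing a HASH, since no part of the level 2 'font' shorthand
    accepts one. All other values pass through unchecked."""
    return all(kind != "HASH" for kind, _ in value)


# Hypothetical dispatch table from property name to checking routine.
CHECKS = {"color": valid_color, "font": font_not_ruled_out}


def check_declaration(prop: str, value: list[Token]) -> bool:
    """Unknown properties are rejected in this sketch."""
    check = CHECKS.get(prop.lower())
    return check is not None and check(value)
```

With this table, 'color: #777' passes, while 'font: #777' and 
'color: #abcd' are rejected, even though all three parse.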

> 
> In section 4.3.6 Colors
> The format of an RGB value in hexadecimal notation is a '#'
> immediately followed by either three or six hexadecimal characters.
> 
> 
> In Appendix G. Grammar of CSS 2.1
> 
> /*
>  * There is a constraint on the color that it must
>  * have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
>  * after the "#"; e.g., "#000" is OK, but "#abcd" is not.
>  */
> hexcolor
>  : HASH S*
>  ;
> 
> "#"{name}               {return HASH;}
> name            {nmchar}+
> nmchar          [_a-z0-9-]|{nonascii}|{escape}
> 
> 
> Now there are quite a few errors in the Appendix.
> 
> 1. The grammar is case insensitive, so the comment shows a redundant
> A-F.

Yes, but the comment is written for humans. The A-F is redundant, but 
not wrong, and it's probably safer to be a bit redundant in this case. 
Another option would have been to put the example in uppercase.

> 2. nmchar is not defined as [0-9a-f], it's much less
> restrictive allowing a whole host of non-hex characters to be
> present.
> 3. the definition for {name} appears to allow 1 or more hex digits
> (not the 3 or 6 specified elsewhere)
> 4. similar to comment 3, the other groups {nonascii}, {escape} will
> also have invalid lengths
> 
> So in summary, the grammar for hexcolor needs to be completely
> separated from the grammar for HASH as there is no sensible reuse
> possible here.

It's a limitation of the chosen notation. Without making the tokenizer 
context-dependent, we cannot have both a token for colors and a token 
for ID selectors. In the context of selectors, #fff is an ID selector, 
but in the context of certain declarations, it is a color.
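
A minimal sketch of that split, in Python: the tokenizer produces one 
context-free HASH token, and the parser decides what it means. The 
regular expression is an ASCII-only approximation of "#"{name} (the 
{nonascii} and {escape} alternatives of nmchar are left out). The 
function names and context labels are hypothetical:

```python
import re
import string

# ASCII-only approximation of "#"{name}, with name = {nmchar}+ and
# nmchar = [_a-z0-9-] matched case-insensitively, as in the grammar.
HASH_RE = re.compile(r"#([_a-z0-9-]+)", re.IGNORECASE)


def tokenize_hash(text: str, pos: int):
    """Return (("HASH", name), end_position), or None if no HASH at pos.
    The tokenizer does not know whether this is a color or an ID."""
    m = HASH_RE.match(text, pos)
    if m is None:
        return None
    return ("HASH", m.group(1)), m.end()


def interpret_hash(value: str, context: str):
    """The parser, which knows the context, assigns the meaning."""
    if context == "selector":
        return ("id-selector", value)
    if context == "color-value":
        if len(value) in (3, 6) and all(c in string.hexdigits for c in value):
            return ("color", value)
        raise ValueError(f"#{value} is a HASH token but not a hex color")
    raise ValueError(f"unknown context: {context}")
```

Note that "#abcd" still tokenizes as a HASH; it is only rejected when a 
color is expected.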

Also, CSS reserves the possibility that some future property accepts 
#foo12 as a value (e.g., to refer to an ID in the document). I can even 
imagine some weird property that accepts both colors and hashes: 
'id-to-color-map: #foo12 blue, #foo13 #fff, #foo14 red'. Such a thing 
doesn't exist in level 2, of course, but in the future the context for 
tokenizing colors might be hard to define...
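
To make the thought experiment concrete, here is a Python sketch of a 
value parser for that imaginary 'id-to-color-map' property. It assumes 
whitespace has already been dropped and tokens arrive as (type, text) 
pairs. Only the position within each pair decides whether a HASH is an 
ID or a color. Color keywords are accepted as IDENT without being 
checked against the keyword list:

```python
import string


def is_hex_color(text):
    return len(text) in (3, 6) and all(c in string.hexdigits for c in text)


def parse_id_to_color_map(tokens):
    """Parse 'ID color [, ID color]*' for the hypothetical property.
    tokens: list of (type, text) pairs, whitespace already removed."""
    pairs = []
    i = 0
    while True:
        if i + 1 >= len(tokens):
            raise ValueError("expected an ID followed by a color")
        (t1, ident), (t2, color) = tokens[i], tokens[i + 1]
        if t1 != "HASH":
            raise ValueError("expected an ID")
        # Second position: a HASH here must be a hex color.
        if t2 == "HASH" and is_hex_color(color):
            color = "#" + color
        elif t2 != "IDENT":
            raise ValueError("expected a color")
        pairs.append((ident, color))
        i += 2
        if i == len(tokens):
            return pairs
        if tokens[i][0] != "COMMA":
            raise ValueError("expected ','")
        i += 1
```

For the example above, the tokens (HASH foo12, IDENT blue, COMMA, 
HASH foo13, HASH fff, COMMA, HASH foo14, IDENT red) yield the pairs 
(foo12, blue), (foo13, #fff), (foo14, red). A value of "#fff #fff" 
reads the same text once as an ID and once as a color, which a 
context-free tokenizer could not decide on its own.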

> 
> 
> I can also point out that the grammar could be made more consistent
> if the trailing whitespace on 'hexcolor' and 'function' was moved
> into the term block as follows:-
> 
> term
>  : unary_operator?
>    [ NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* |
>      ANGLE S* | TIME S* | FREQ S* ]
>  | STRING S* | IDENT S* | URI S* | hexcolor S* | function S*
>
> function
>  : FUNCTION S* expr ')'
>
> hexcolor
>  : HASH

Consistency is in the eye of the beholder. :-) The rule that the grammar 
follows is that S tokens are, whenever possible, added after terminals 
rather than after non-terminals. That is why the S* appears in 
'hexcolor : HASH S*' and not after 'hexcolor' in 'term'.
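
Here is a minimal recursive-descent sketch in Python of what that rule 
means for an implementation. Only the routine that consumes a terminal 
skips the S tokens after it, so non-terminals such as hexcolor and 
function never deal with whitespace themselves. It covers only a small 
subset of 'term' (no unary operators, dimensions or URIs). The token 
shape and class name are assumptions:

```python
TERM_START = {"NUMBER", "STRING", "IDENT", "HASH", "FUNCTION"}
OPERATORS = {"/", ","}


class ValueParser:
    """Parses a list of (type, text) tokens; type names follow Appendix G."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return "EOF"

    def terminal(self, kind):
        # Consume one terminal, then the S* that follows it in the grammar.
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        while self.peek() == "S":
            self.pos += 1
        return token

    def expr(self):
        # expr : term [ operator? term ]*  with  operator : '/' S* | ',' S*
        # (operators are consumed but not kept in this sketch)
        terms = [self.term()]
        while self.peek() in OPERATORS | TERM_START:
            if self.peek() in OPERATORS:
                self.terminal(self.peek())
            terms.append(self.term())
        return terms

    def term(self):
        kind = self.peek()
        if kind == "HASH":
            return self.hexcolor()
        if kind == "FUNCTION":
            return self.function()
        if kind in ("NUMBER", "STRING", "IDENT"):
            return self.terminal(kind)  # e.g. IDENT S*
        raise SyntaxError(f"unexpected {kind}")

    def hexcolor(self):
        # hexcolor : HASH S*  (the S* belongs to the terminal HASH)
        return ("hexcolor", self.terminal("HASH"))

    def function(self):
        # function : FUNCTION S* expr ')' S*
        name = self.terminal("FUNCTION")
        args = self.expr()
        self.terminal(")")
        return ("function", name[1], args)
```

For the value 'rgb( 1 , 2 , 3 ) #fff', expr() returns a function node 
with three NUMBER arguments followed by a hexcolor node, and no routine 
other than terminal() ever looks at an S token.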

But, as I said, please consider implementing the generic grammar from 
chapter 4 instead, unless you have a very good reason to ignore level 3 
style sheets.



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Wednesday, 2 March 2011 19:09:49 UTC