Re: Issues with CSS21 grammar (CR 20070719)

I've tried to improve the grammar of appendix G and I think I now have a 
version that defines the same language as before, but is LL(1). It has 
no ambiguities and no nullable non-terminals (except for the start 
symbol: "stylesheet" can of course still be empty).
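The claim about nullable non-terminals can be checked with the usual fixed-point computation. The Python sketch below is only illustrative. The tuple encoding of the EBNF, the helper names (`t`, `seq`, `nullable_nonterminals`, ...) and the reduced "stylesheet" and "ruleset" rules are assumptions made for brevity and are not part of the proposal.

```python
# Hypothetical tuple encoding of the EBNF used in appendix G:
# ("t", TOKEN), ("n", rule), ("seq", ...), ("alt", ...),
# ("opt", x) for [x]?, ("star", x) for [x]*, ("plus", x) for [x]+.
def t(name): return ("t", name)
def n(name): return ("n", name)
def seq(*xs): return ("seq",) + xs
def alt(*xs): return ("alt",) + xs
def opt(x): return ("opt", x)
def star(x): return ("star", x)
def plus(x): return ("plus", x)


def is_nullable(e, null_nt):
    """Can expression e derive the empty string, given the nullable rules so far?"""
    kind = e[0]
    if kind == "t":
        return False
    if kind == "n":
        return e[1] in null_nt
    if kind in ("opt", "star"):
        return True
    if kind == "plus":
        return is_nullable(e[1], null_nt)
    if kind == "seq":
        return all(is_nullable(x, null_nt) for x in e[1:])
    return any(is_nullable(x, null_nt) for x in e[1:])  # "alt"


def nullable_nonterminals(rules):
    """Fixed-point iteration: keep adding rules until no new nullable rule appears."""
    null_nt = set()
    changed = True
    while changed:
        changed = False
        for name, body in rules.items():
            if name not in null_nt and is_nullable(body, null_nt):
                null_nt.add(name)
                changed = True
    return null_nt


S = t("S")
SUFFIX = alt(t("HASH"), n("class"), n("attrib"), n("pseudo"))
RULES = {
    # Simplified: @charset, @import, @media and @page are left out.
    "stylesheet": seq(star(alt(S, t("CDO"), t("CDC"))), star(n("ruleset"))),
    # Simplified: the declaration block is reduced to '{' S* '}' S*.
    "ruleset": seq(
        n("selector"), star(seq(t("','"), star(S), n("selector"))),
        t("'{'"), star(S), t("'}'"), star(S),
    ),
    "selector": seq(
        n("simple_selector"),
        opt(alt(
            seq(n("combinator"), n("selector")),
            seq(plus(S), opt(seq(opt(n("combinator")), n("selector")))),
        )),
    ),
    "simple_selector": alt(seq(n("element_name"), star(SUFFIX)), plus(SUFFIX)),
    "combinator": alt(seq(t("'+'"), star(S)), seq(t("'>'"), star(S))),
    "element_name": alt(t("IDENT"), t("'*'")),
    "class": seq(t("'.'"), t("IDENT")),
    "attrib": seq(
        t("'['"), star(S), t("IDENT"), star(S),
        opt(seq(alt(t("'='"), t("INCLUDES"), t("DASHMATCH")), star(S),
                alt(t("IDENT"), t("STRING")), star(S))),
        t("']'"),
    ),
    "pseudo": seq(t("':'"), alt(
        t("IDENT"),
        seq(t("FUNCTION"), star(S), opt(seq(t("IDENT"), star(S))), t("')'")),
    )),
    "medium": seq(t("IDENT"), star(S)),
}
```

With these rules, `nullable_nonterminals(RULES)` should return `{"stylesheet"}`: every other rule needs at least one token.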

Since my last edits in response to Yves's suggestions, I have only
changed "selector" and "combinator".
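To illustrate why the new "selector" rule needs only one token of lookahead, here is a minimal hand-written recursive-descent sketch in Python. It assumes a hypothetical token stream made of bare token types, simplifies "attrib" and "pseudo", and uses invented class and function names; it is not taken from any existing parser.

```python
# Tokens are represented only by their type; e.g. "div > p" is assumed to be
# lexed as ["IDENT", "S", "'>'", "S", "IDENT"].
SUFFIX_START = {"HASH", "'.'", "'['", "':'"}
SIMPLE_START = {"IDENT", "'*'"} | SUFFIX_START
COMBINATORS = {"'+'", "'>'"}


class ParseError(Exception):
    pass


class SelectorParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok}, got {self.peek()} at {self.pos}")
        self.pos += 1

    def skip_s(self):
        while self.peek() == "S":
            self.pos += 1

    # selector : simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?
    def selector(self):
        self.simple_selector()
        if self.peek() in COMBINATORS:
            self.combinator()
            self.selector()
        elif self.peek() == "S":
            self.skip_s()
            if self.peek() in COMBINATORS:
                self.combinator()
                self.selector()
            elif self.peek() in SIMPLE_START:
                self.selector()
            # otherwise: trailing white space, e.g. before ',' or '{'

    # combinator : '+' S* | '>' S*
    def combinator(self):
        self.pos += 1
        self.skip_s()

    # simple_selector : element_name [ HASH | class | attrib | pseudo ]*
    #                 | [ HASH | class | attrib | pseudo ]+
    def simple_selector(self):
        if self.peek() in ("IDENT", "'*'"):
            self.pos += 1
        elif self.peek() not in SUFFIX_START:
            raise ParseError(f"unexpected {self.peek()} at {self.pos}")
        while self.peek() in SUFFIX_START:
            self.suffix()

    def suffix(self):
        tok = self.peek()
        self.pos += 1
        if tok in ("'.'", "':'"):  # class, or pseudo simplified to ':' IDENT
            self.expect("IDENT")
        elif tok == "'['":         # attrib simplified to '[' S* IDENT S* ']'
            self.skip_s()
            self.expect("IDENT")
            self.skip_s()
            self.expect("']'")
        # HASH is complete on its own


def is_selector(tokens):
    parser = SelectorParser(tokens)
    try:
        parser.selector()
    except ParseError:
        return False
    return parser.peek() == "EOF"
```

For instance, `is_selector(["IDENT", "S", "'>'", "S", "IDENT"])` ("div > p") is true, while a token list that starts with a combinator is rejected. Every branch in `selector` is chosen by looking at `peek()` alone, which is what LL(1) means in practice.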

I'd like the grammar of appendix G to be as useful as possible, even
though I know not many programs can use it. (It may serve a validator,
but all other programs will have to accept the forward-compatible
grammar instead.)

I'd especially like Andrey and Yves to take a look.

I tested the grammar with an LL(1) parser generator and, after carefully
expanding the rules, with Yacc. It seems to work: neither tool reports
any ambiguities, and the resulting parsers accept my various test inputs.
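One of the checks an LL(1) parser generator performs is that the alternatives of each choice start with disjoint sets of tokens (their FIRST sets). The rough Python sketch below performs only that check; a full LL(1) test also compares FIRST and FOLLOW sets for optional and repeated parts, which is not done here. It reuses a hypothetical tuple encoding of the EBNF, a simplified subset of the selector rules, and one deliberately non-LL(1) rule (`BAD_RULES`) invented for contrast.

```python
from itertools import combinations

# Hypothetical tuple encoding of appendix G's EBNF as a tiny AST.
def t(name): return ("t", name)
def n(name): return ("n", name)
def seq(*xs): return ("seq",) + xs
def alt(*xs): return ("alt",) + xs
def opt(x): return ("opt", x)
def star(x): return ("star", x)
def plus(x): return ("plus", x)

def is_nullable(e, null_nt):
    kind = e[0]
    if kind == "t":
        return False
    if kind == "n":
        return e[1] in null_nt
    if kind in ("opt", "star"):
        return True
    if kind == "plus":
        return is_nullable(e[1], null_nt)
    if kind == "seq":
        return all(is_nullable(x, null_nt) for x in e[1:])
    return any(is_nullable(x, null_nt) for x in e[1:])  # "alt"

def first(e, first_nt, null_nt):
    """FIRST set of expression e: the tokens that can start it."""
    kind = e[0]
    if kind == "t":
        return {e[1]}
    if kind == "n":
        return set(first_nt[e[1]])
    if kind in ("opt", "star", "plus"):
        return first(e[1], first_nt, null_nt)
    if kind == "alt":
        return set().union(*(first(x, first_nt, null_nt) for x in e[1:]))
    result = set()  # "seq": items up to and including the first non-nullable one
    for x in e[1:]:
        result |= first(x, first_nt, null_nt)
        if not is_nullable(x, null_nt):
            break
    return result

def alternative_conflicts(rules):
    """Return the overlapping FIRST sets of alternatives in any '|' choice."""
    null_nt, first_nt = set(), {name: set() for name in rules}
    changed = True
    while changed:  # fixed point for NULLABLE and FIRST together
        changed = False
        for name, body in rules.items():
            if name not in null_nt and is_nullable(body, null_nt):
                null_nt.add(name)
                changed = True
            f = first(body, first_nt, null_nt)
            if f != first_nt[name]:
                first_nt[name] = f
                changed = True
    conflicts = []
    def walk(e):
        if e[0] == "alt":
            for a, b in combinations(e[1:], 2):
                overlap = first(a, first_nt, null_nt) & first(b, first_nt, null_nt)
                if overlap:
                    conflicts.append(overlap)
        if e[0] in ("seq", "alt", "opt", "star", "plus"):
            for x in e[1:]:
                walk(x)
    for body in rules.values():
        walk(body)
    return conflicts

S = t("S")
SUFFIX = alt(t("HASH"), n("class"), n("attrib"), n("pseudo"))
SELECTOR_RULES = {
    "selector": seq(
        n("simple_selector"),
        opt(alt(
            seq(n("combinator"), n("selector")),
            seq(plus(S), opt(seq(opt(n("combinator")), n("selector")))),
        )),
    ),
    "simple_selector": alt(seq(n("element_name"), star(SUFFIX)), plus(SUFFIX)),
    "combinator": alt(seq(t("'+'"), star(S)), seq(t("'>'"), star(S))),
    "element_name": alt(t("IDENT"), t("'*'")),
    "class": seq(t("'.'"), t("IDENT")),
    "attrib": seq(t("'['"), star(S), t("IDENT"), star(S), t("']'")),  # simplified
    "pseudo": seq(t("':'"), t("IDENT")),                              # simplified
}
# Hypothetical rule that is not LL(1): both branches start with IDENT.
BAD_RULES = {"bad": alt(seq(t("IDENT"), star(S)), seq(t("IDENT"), t("'('")))}
```

`alternative_conflicts(SELECTOR_RULES)` should come back empty, because the two branches after a simple selector start with a combinator or with S respectively, whereas `alternative_conflicts(BAD_RULES)` reports the shared IDENT.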



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
stylesheet
  : [ CHARSET_SYM STRING ';' ]?
    [S|CDO|CDC]* [ import [ CDO S* | CDC S* ]* ]*
    [ [ ruleset | media | page ] [ CDO S* | CDC S* ]* ]*
  ;
import
  : IMPORT_SYM S*
    [STRING|URI] S* [ medium [ ',' S* medium]* ]? ';' S*
  ;
media
  : MEDIA_SYM S* medium [ ',' S* medium ]* '{' S* ruleset* '}' S*
  ;
medium
  : IDENT S*
  ;
page
  : PAGE_SYM S* pseudo_page?
    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
  ;
pseudo_page
  : ':' IDENT S*
  ;
operator
  : '/' S* | ',' S*
  ;
combinator
  : '+' S*
  | '>' S*
  ;
unary_operator
  : '-' | '+'
  ;
property
  : IDENT S*
  ;
ruleset
  : selector [ ',' S* selector ]*
    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
  ;
selector
  : simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?
  ;
simple_selector
  : element_name [ HASH | class | attrib | pseudo ]*
  | [ HASH | class | attrib | pseudo ]+
  ;
class
  : '.' IDENT
  ;
element_name
  : IDENT | '*'
  ;
attrib
  : '[' S* IDENT S* [ [ '=' | INCLUDES | DASHMATCH ] S*
    [ IDENT | STRING ] S* ]? ']'
  ;
pseudo
  : ':' [ IDENT | FUNCTION S* [IDENT S*]? ')' ]
  ;
declaration
  : property ':' S* expr prio?
  ;
prio
  : IMPORTANT_SYM S*
  ;
expr
  : term [ operator? term ]*
  ;
term
  : unary_operator?
    [ NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* | ANGLE S* |
      TIME S* | FREQ S* ]
  | STRING S* | IDENT S* | URI S* | hexcolor | function
  ;
function
  : FUNCTION S* expr ')' S*
  ;
/*
 * There is a constraint on the color that it must
 * have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
 * after the "#"; e.g., "#000" is OK, but "#abcd" is not.
 */
hexcolor
  : HASH S*
  ;
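Since "hexcolor" accepts any HASH token, the 3-or-6-digit constraint in the comment above has to be checked outside the grammar. A small Python sketch, assuming the token text contains no escapes; the function name is hypothetical:

```python
import re

# The grammar lets any HASH token (e.g. "#abcd" or "#red") through as a
# hexcolor; the "3 or 6 hex digits" constraint is checked separately here.
HEXCOLOR_RE = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def is_valid_hexcolor(hash_token: str) -> bool:
    """True if the text of a HASH token satisfies the hexcolor constraint."""
    return HEXCOLOR_RE.fullmatch(hash_token) is not None
```

For example, `is_valid_hexcolor("#000")` is true and `is_valid_hexcolor("#abcd")` is false, as stated in the comment.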
%option case-insensitive

h               [0-9a-f]
nonascii        [\200-\377]
unicode         \\{h}{1,6}(\r\n|[ \t\r\n\f])?
escape          {unicode}|\\[^\r\n\f0-9a-f]
nmstart         [_a-z]|{nonascii}|{escape}
nmchar          [_a-z0-9-]|{nonascii}|{escape}
string1         \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
string2         \'([^\n\r\f\\']|\\{nl}|{escape})*\'
invalid1        \"([^\n\r\f\\"]|\\{nl}|{escape})*
invalid2        \'([^\n\r\f\\']|\\{nl}|{escape})*

comment         \/\*[^*]*\*+([^/*][^*]*\*+)*\/
ident           -?{nmstart}{nmchar}*
name            {nmchar}+
num             [0-9]+|[0-9]*"."[0-9]+
string          {string1}|{string2}
invalid         {invalid1}|{invalid2}
url             ([!#$%&*-~]|{nonascii}|{escape})*
s               [ \t\r\n\f]+
w               {s}?
nl              \n|\r\n|\r|\f

A               a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?
C               c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?
D               d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?
E               e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?
G               g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g
H               h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h
I               i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i
K               k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k
L               l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l
M               m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m
N               n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n
O               o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o
P               p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p
R               r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r
S               s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s
T               t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t
U               u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u
X               x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x
Z               z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z

%%

{s}                     {return S;}

\/\*[^*]*\*+([^/*][^*]*\*+)*\/          /* ignore comments */

"<!--"                  {return CDO;}
"-->"                   {return CDC;}
"~="                    {return INCLUDES;}
"|="                    {return DASHMATCH;}

{string}                {return STRING;}
{invalid}               {return INVALID; /* unclosed string */}

{ident}                 {return IDENT;}

"#"{name}               {return HASH;}

@{I}{M}{P}{O}{R}{T}     {return IMPORT_SYM;}
@{P}{A}{G}{E}           {return PAGE_SYM;}
@{M}{E}{D}{I}{A}        {return MEDIA_SYM;}
"@charset "             {return CHARSET_SYM;}

"!"({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}  {return IMPORTANT_SYM;}

{num}{E}{M}             {return EMS;}
{num}{E}{X}             {return EXS;}
{num}{P}{X}             {return LENGTH;}
{num}{C}{M}             {return LENGTH;}
{num}{M}{M}             {return LENGTH;}
{num}{I}{N}             {return LENGTH;}
{num}{P}{T}             {return LENGTH;}
{num}{P}{C}             {return LENGTH;}
{num}{D}{E}{G}          {return ANGLE;}
{num}{R}{A}{D}          {return ANGLE;}
{num}{G}{R}{A}{D}       {return ANGLE;}
{num}{M}{S}             {return TIME;}
{num}{S}                {return TIME;}
{num}{H}{Z}             {return FREQ;}
{num}{K}{H}{Z}          {return FREQ;}
{num}{ident}            {return DIMENSION;}

{num}%                  {return PERCENTAGE;}
{num}                   {return NUMBER;}

{U}{R}{L}"("{w}{string}{w}")"   {return URI;}
{U}{R}{L}"("{w}{url}{w}")"      {return URI;}

{ident}"("              {return FUNCTION;}

.                       {return *yytext;}
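Several of the rules above can match the same input: "12em", for instance, matches both {num}{E}{M} and {num}{ident}. Flex resolves this by taking the longest match and, when two rules match the same length, the rule listed first. The Python sketch below mimics that resolution for the numeric tokens only. It assumes ASCII input without escapes (the {A} to {Z} macros and {nonascii} are not modelled), and `next_token` is a hypothetical name. Because Python's `re` is a backtracking engine rather than a longest-match one, the decimal alternative of {num} is listed first so that "1.5" is not cut short to "1".

```python
import re

# Simplified, ASCII-only versions of {num} and {ident} (no escapes, no {nonascii}).
# Decimal alternative first so that Python's regex picks the longer number.
NUM = r"(?:[0-9]*\.[0-9]+|[0-9]+)"
IDENT = r"-?[_a-z][_a-z0-9-]*"

# The numeric rules, in the same order as in the flex file.
RULES = [
    (NUM + r"em", "EMS"),
    (NUM + r"ex", "EXS"),
    (NUM + r"(?:px|cm|mm|in|pt|pc)", "LENGTH"),
    (NUM + r"(?:deg|rad|grad)", "ANGLE"),
    (NUM + r"(?:ms|s)", "TIME"),
    (NUM + r"(?:hz|khz)", "FREQ"),
    (NUM + IDENT, "DIMENSION"),
    (NUM + r"%", "PERCENTAGE"),
    (NUM, "NUMBER"),
]
# %option case-insensitive corresponds to re.IGNORECASE.
COMPILED = [(re.compile(p, re.IGNORECASE), tok) for p, tok in RULES]


def next_token(text, pos=0):
    """Return (token type, lexeme) at pos using flex's disambiguation:
    the longest match wins, and on equal length the earlier rule wins."""
    best_type, best_len = None, 0
    for regex, tok in COMPILED:
        m = regex.match(text, pos)
        if m and m.end() - pos > best_len:  # strictly longer, so ties keep the earlier rule
            best_type, best_len = tok, m.end() - pos
    return best_type, text[pos:pos + best_len]
```

So "12em" becomes EMS, but "12ems" is longer as a DIMENSION and becomes one; "5kHz" ties between FREQ and DIMENSION, and FREQ wins because it is listed first.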

Received on Monday, 10 August 2009 16:03:44 UTC