Re: Issues with CSS21 grammar (CR 20070719) from Bert Bos on 2009-02-24 (www-style@w3.org from February 2009)

From: Bert Bos <bert@w3.org>
Date: Tue, 24 Feb 2009 21:52:23 +0100
To: Yves Lafon <ylafon@w3.org>, www-style@w3.org
Message-Id: <200902242152.24158.bert@w3.org>
On Friday 20 February 2009, Yves Lafon wrote:
> Dear CSSers,
>
> Here are some issues with the current grammar, as defined by
> http://www.w3.org/TR/CSS21/grammar.html dated 20070719,
> amended by the errata at
> http://www.w3.org/Style/css2-updates/CR-CSS21-20070719-errata.html
>
> I/ Collision in "import" definition.
>
> stylesheet
>
>    : [ CHARSET_SYM STRING ';' ]?
>
>      [S|CDO|CDC]* [ import [S|CDO|CDC]* ]*
>      [ [ ruleset | media | page ] [S|CDO|CDC]* ]*
>
> import
>
>    : IMPORT_SYM S*
>
>      [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';' S*
>
> The final S* of the import rule collides with the S in
> [ import [S|CDO|CDC]* ]*. The issue might be solved in two ways:
>
> 1/ In the 'stylesheet' rule:
>    [ import [S|CDO|CDC]* ]*
>    =>
>    [ import [[CDO|CDC] [S|CDO|CDC]*]? ]*
>
> 2/ In the 'import' rule:
>    [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';' S*
>    =>
>    [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';'

Yes, I know the grammar is ambiguous with respect to the S. But it 
doesn't actually matter whether you parse the S as belonging 
to "import" or to "stylesheet." The meaning is the same. So I didn't 
bother trying to clean it up.
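
A minimal sketch of why the ambiguity is harmless, assuming a cut-down
tokenizer that knows only S, IMPORT_SYM, STRING and ';' (the real CSS 2.1
scanner has many more tokens, and media lists are left out). The names
tokenize and parse_imports are hypothetical. The flag chooses which rule
consumes the S after ';'; the list of imports comes out the same either way.

```python
import re

# Simplified token set (an assumption): just enough for '@import "x";'.
TOKEN_RE = re.compile(
    r"(?P<S>[ \t\r\n\f]+)"
    r"|(?P<IMPORT_SYM>@import)"
    r'|(?P<STRING>"[^"]*"' r"|'[^']*')"
    r"|(?P<SEMI>;)"
)


def tokenize(text):
    """Split text into (kind, value) pairs; fail on anything unknown."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_imports(text, import_takes_trailing_s):
    """Return (imported names, rule that consumed each S after ';')."""
    tokens = tokenize(text)
    pos = 0

    def kind():
        return tokens[pos][0] if pos < len(tokens) else None

    def skip_s():
        # Consume S* and report how many S tokens were taken.
        nonlocal pos
        count = 0
        while kind() == "S":
            pos += 1
            count += 1
        return count

    def expect(expected):
        nonlocal pos
        if kind() != expected:
            raise ValueError(f"expected {expected}, found {kind()}")
        value = tokens[pos][1]
        pos += 1
        return value

    imports, owners = [], []
    skip_s()  # stylesheet: leading [S|CDO|CDC]* (CDO/CDC omitted here)
    while kind() == "IMPORT_SYM":
        expect("IMPORT_SYM")
        skip_s()
        imports.append(expect("STRING")[1:-1])  # strip the quotes
        skip_s()
        expect("SEMI")
        # The ambiguous spot: either rule may claim the S after ';'.
        owner = "import" if import_takes_trailing_s else "stylesheet"
        if skip_s():
            owners.append(owner)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {kind()}")
    return imports, owners


source = "@import \"a.css\"; @import 'b.css';\n"
by_import = parse_imports(source, import_takes_trailing_s=True)
by_stylesheet = parse_imports(source, import_takes_trailing_s=False)
print(by_import[0] == by_stylesheet[0])
```

Only the bookkeeping of which rule owned the whitespace differs between the
two calls; nothing downstream depends on it.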

If you think it is important enough, I prefer your solution 1. I applied 
a sort of metarule that helps with maintenance of the grammar: 
everywhere a token can be followed by a (semantically meaningless) S*, 
the S* follows the token right in that same grammar rule. That way I 
cannot forget it. Except that I did forget it for pseudo_page 
anyway :-(

>
> II / Collision in stylesheet and ruleset/media/page
> Same issue, the final S* of 'ruleset' 'media' and 'page' conflicts
> with [S|CDO|CDC]*
> To solve this one, the same two ways are possible:
>
> 1/ In the 'stylesheet' rule
>    [ [ ruleset | media | page ] [S|CDO|CDC]* ]*
>    =>
>    [ [ ruleset | media | page ] [[CDO|CDC] [S|CDO|CDC]*]? ]*
>
> 2/ In the 'media' rule
>    MEDIA_SYM S* medium [ COMMA S* medium ]* LBRACE S* ruleset* '}' S*
>    =>
>    MEDIA_SYM S* medium [ COMMA S* medium ]* LBRACE S* ruleset* S* '}'
>
>    In the 'page' rule
>    LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>    =>
>    LBRACE S* declaration [ ';' S* declaration ]* '}'
>
>    In the 'ruleset' rule
>    LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>    =>
>    LBRACE S* declaration [ ';' S* declaration ]* '}'

Yes, same thing.

>
> III / Error in the 'page' rule
>
>    PAGE_SYM S* pseudo_page? S*
> is problematic when pseudo_page is not present.
> The following solves that issue.
>    PAGE_SYM S* (pseudo_page S*)?

Same thing again. Though I'd prefer to fix it like this, if needed:

    page: PAGE_SYM S* pseudo_page? LBRACE...;
    pseudo_page: ':' IDENT S*;

>
> IV / Error in the 'pseudo' rule
>
> Same as in III/
>
> ':' [ IDENT | FUNCTION S* IDENT? S* ')' ]
> should read
> ':' [ IDENT | FUNCTION S* (IDENT S*)? ')' ]
>
> V / empty tokens
> To avoid empty tokens in the grammar, here are the proposed changes:
> 1/ in "operator"
> '/' S* | COMMA S* | /* empty */
> =>
> '/' S* | COMMA S*
> or even [ '/' | COMMA ] S*
> (Note, COMMA includes {w}, which is not the case for '/'. Same
> comment in the definition of 'unary_operator')
>
> in "expr"
> term [ operator term ]*
> =>
> term [ operator? term ]*

I have no problem with the change, but why are nullable non-terminals a 
problem? The algorithms to deal with them are well-known and included 
in every parser toolkit, aren't they?
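
For reference, a minimal sketch of the usual fixed-point computation of the
nullable non-terminals. It is not taken from any particular toolkit. The
grammar encoding is an assumption made for illustration: S*, prio? and the
[ operator term ]* loop are spelled out as the helper non-terminals s_star,
prio_opt and expr_tail, and TERM, PROPERTY and PRIO stand in for rules that
are not expanded here.

```python
def nullable_nonterminals(grammar):
    """Return the set of non-terminals that can derive the empty string.

    grammar maps each non-terminal to a list of alternatives; each
    alternative is a list of symbols. Symbols that are not keys of the
    grammar are treated as terminals.
    """
    nullable = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for head, alternatives in grammar.items():
            if head in nullable:
                continue
            # Nullable if some alternative consists only of nullable
            # symbols; an empty alternative qualifies trivially.
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical encoding of the fragment under discussion.
GRAMMAR = {
    "s_star": [[], ["S", "s_star"]],
    "operator": [["/", "s_star"], ["COMMA", "s_star"], []],
    "expr_tail": [[], ["operator", "TERM", "expr_tail"]],
    "expr": [["TERM", "expr_tail"]],
    "prio_opt": [[], ["PRIO"]],
    "declaration": [["PROPERTY", ":", "s_star", "expr", "prio_opt"], []],
}

print(sorted(nullable_nonterminals(GRAMMAR)))
```

In this encoding "operator" and "declaration" come out nullable and "expr"
does not, since it always needs at least one TERM.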

>
> 2/ in "declaration"
>   property ':' S* expr prio?
>
>    | /* empty */
>
> =>
>   property ':' S* expr prio?
>
> in "page"
> LBRACE S* declaration [ ';' S* declaration ]* '}' S*
> =>
> LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
>
> in "ruleset"
> LBRACE S* declaration [ ';' S* declaration ]* '}' S*
> =>
> LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
>
> (note that in 'page' and 'ruleset' the final S* might be dropped,
> depending on the resolution of the issue II above).



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Tuesday, 24 February 2009 20:53:01 UTC