Re: Issues with CSS21 grammar (CR 20070719)

On Tue, 24 Feb 2009, Bert Bos wrote:

> On Friday 20 February 2009, Yves Lafon wrote:
>> Dear CSSers,
>>
>> Here are some issues with the current grammar, as defined by
>> http://www.w3.org/TR/CSS21/grammar.html dated 20070719
>> (amended by the errata at
>> http://www.w3.org/Style/css2-updates/CR-CSS21-20070719-errata.html)
>>
>> I/ Collision in "import" definition.
>>
>> stylesheet
>>
>>    : [ CHARSET_SYM STRING ';' ]?
>>
>>      [S|CDO|CDC]* [ import [S|CDO|CDC]* ]*
>>      [ [ ruleset | media | page ] [S|CDO|CDC]* ]*
>>
>> import
>>
>>    : IMPORT_SYM S*
>>
>>      [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';' S*
>>
>> The final S* of the import rule collides with the S in
>> [ import [S|CDO|CDC]* ]*. The issue might be solved in two ways:
>>
>> 1/ In the 'stylesheet' rule:
>>    [ import [S|CDO|CDC]* ]*
>>    =>
>>    [ import [[CDO|CDC] [S|CDO|CDC]*]? ]*
>>
>> 2/ In the 'import' rule:
>>    [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';' S*
>>    =>
>>    [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';'
>
> Yes, I know the grammar is ambiguous with respect to the S. But it
> doesn't actually matter whether you parse the S as belonging
> to "import" or to "stylesheet." The meaning is the same. So I didn't
> bother trying to clean it up.

Well, it matters, especially as the spec claims that the grammar is
LALR(1), which is not the case. CSS3 is more honest: it claims only that
the grammar can be locally LALR(2). Even that does not seem to hold for
the definition as written (I need to verify that), but it was easily
rewritten to be LALR(2); the issue occurs in the namespace part.

> If you think it is important enough, I prefer your solution 1. I applied
> a sort of metarule that helps with maintenance of the grammar:
> everywhere a token can be followed by a (semantically meaningless) S*,
> the S* follows the token right in that same grammar rule. That way I
> cannot forget it. Except that I did forget it for pseudo_page
> anyway :-(

OK, I slightly prefer solution 2, but I sympathize with why you prefer 1,
so let's go with 1 :)
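
To make the collision in issue I concrete, here is a minimal Python sketch.
It is not taken from the spec: the function name and the token spellings are
assumptions chosen for illustration. It counts in how many ways the tokens
that follow the ';' of an @import can be split between the trailing S* of
'import' and the repetition that follows 'import' in 'stylesheet', first
with the current grammar and then with solution 1.

```python
WS_TOKENS = ("S", "CDO", "CDC")

def count_derivations(tokens, use_solution_1):
    """Count the ways to split the tokens after ';' between import's
    trailing S* and the repetition that follows 'import' in 'stylesheet'."""
    count = 0
    for k in range(len(tokens) + 1):
        head, tail = tokens[:k], tokens[k:]
        if any(t != "S" for t in head):
            break  # import's S* may only absorb S tokens
        if not all(t in WS_TOKENS for t in tail):
            continue  # the tail admits only S, CDO and CDC
        if use_solution_1 and tail and tail[0] == "S":
            continue  # solution 1: a non-empty tail must start with CDO or CDC
        count += 1
    return count

run = ["S", "S"]
print(count_derivations(run, use_solution_1=False))  # 3
print(count_derivations(run, use_solution_1=True))   # 1
```

With the current rules, two S tokens can be split in three ways, so a parser
with one token of lookahead cannot decide whether to keep them in 'import'
or hand them to 'stylesheet'. With solution 1 only one split remains.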

>>
>> II / Collision in stylesheet and ruleset/media/page
>> Same issue: the final S* of 'ruleset', 'media' and 'page' conflicts
>> with [S|CDO|CDC]*.
>> To solve this one, the same two ways are possible:
>>
>> 1/ In the 'stylesheet' rule
>>    [ [ ruleset | media | page ] [S|CDO|CDC]* ]*
>>    =>
>>    [ [ ruleset | media | page ] [[CDO|CDC] [S|CDO|CDC]*]? ]*
>>
>> 2/ In the 'media' rule
>>    MEDIA_SYM S* medium [ COMMA S* medium ]* LBRACE S* ruleset* '}' S*
>>    =>
>>    MEDIA_SYM S* medium [ COMMA S* medium ]* LBRACE S* ruleset* S* '}'
>>
>>    In the 'page' rule
>>    LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>>    =>
>>    LBRACE S* declaration [ ';' S* declaration ]* '}'
>>
>>    In the 'ruleset' rule
>>    LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>>    =>
>>    LBRACE S* declaration [ ';' S* declaration ]* '}'
>
> Yes, same thing.
>
>>
>> III / Error in the 'page' rule
>>
>>    PAGE_SYM S* pseudo_page? S*
>> is problematic when pseudo_page is not present.
>> The following solves that issue.
>>    PAGE_SYM S* (pseudo_page S*)?
>
> Same thing again. Though I'd prefer to fix it like this, if needed:
>
>    page: PAGE_SYM S* pseudo_page? LBRACE...;
>    pseudo_page: ':' IDENT S*;
>
>>
>> IV / Error in the 'pseudo' rule
>>
>> Same as in III/
>>
>> ':' [ IDENT | FUNCTION S* IDENT? S* ')' ]
>> should read
>> ':' [ IDENT | FUNCTION S* (IDENT S*)? ')' ]
>>
>> V / empty tokens
>> To avoid empty tokens in the grammar, here are the proposed changes:
>> 1/ in "operator"
>> '/' S* | COMMA S* | /* empty */
>> =>
>> '/' S* | COMMA S*
>> or even [ '/' | COMMA ] S*
>> (Note, COMMA includes {w}, which is not the case for '/'. Same
>> comment in the definition of 'unary_operator')
>>
>> in "expr"
>> term [ operator term ]*
>> =>
>> term [ operator? term ]*
>
> I have no problem with the change, but why are nullable non-terminals a
> problem? The algorithms to deal with them are well-known and included
> in every parser toolkit, aren't they?

Yes, but since there is an unambiguous way to avoid them, doing so guards
against someone writing 'operator*' in another document, for example
(which would lead to an infinite loop; every parser toolkit detects it,
but it is still bad grammar design).
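
As an illustration of the 'operator*' problem, here is a small hand-rolled
Python sketch. It is not how a parser generator works internally, and all
function names are assumptions made up for this example. Each rule returns
the new position on success and None on failure. The loop guard stands in
for the check a toolkit would perform.

```python
def nullable_operator(tokens, pos):
    """operator: '/' S* | COMMA S* | /* empty */ (current grammar)."""
    if pos < len(tokens) and tokens[pos] in ("/", "COMMA"):
        pos += 1
        while pos < len(tokens) and tokens[pos] == "S":
            pos += 1
    return pos  # the empty alternative always succeeds

def strict_operator(tokens, pos):
    """operator: '/' S* | COMMA S* (proposed, no empty alternative)."""
    if pos < len(tokens) and tokens[pos] in ("/", "COMMA"):
        pos += 1
        while pos < len(tokens) and tokens[pos] == "S":
            pos += 1
        return pos
    return None  # fail instead of matching nothing

def naive_star(rule, tokens, pos, max_iterations=1000):
    """rule*: apply the rule for as long as it succeeds."""
    for _ in range(max_iterations):
        new_pos = rule(tokens, pos)
        if new_pos is None:
            return pos
        pos = new_pos
    raise RuntimeError("rule* never fails: the rule matches the empty string")

tokens = ["/", "S", "COMMA", "IDENT"]
print(naive_star(strict_operator, tokens, 0))  # 3
try:
    naive_star(nullable_operator, tokens, 0)
except RuntimeError as err:
    print(err)
```

With the strict rule the repetition stops at IDENT. With the nullable rule it
keeps succeeding at the same position and only the guard ends the loop.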

>>
>> 2/ in "declaration"
>>   property ':' S* expr prio?
>>
>>    | /* empty */
>>
>> =>
>>   property ':' S* expr prio?
>>
>> in "page"
>> LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>> =>
>> LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
>>
>> in "ruleset"
>> LBRACE S* declaration [ ';' S* declaration ]* '}' S*
>> =>
>> LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
>>
>> (note that in 'page' and 'ruleset' the final S* might be dropped,
>> depending on the resolution of the issue II above).
>
>
>
> Bert
>

-- 
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves

Received on Wednesday, 25 February 2009 15:11:57 UTC