CSS 2.1 grammar

Forwarding a few questions on behalf of an unsubscribed friend,
 /Staffan


It seems there are some problems with the grammar in Appendix G
in CSS 2.1.

  1. According to chapter 4 in CSS 2.1, identifiers may
     now begin with '-'. However, the lexical scanner in
     Appendix G.2 has not been updated to reflect this change.

  2. Section 4.1.3 says about identifiers:
       "... they cannot start with a digit."
     Shouldn't this be:
       "... they cannot start with a digit, or a '-' followed by a digit."

  3. The grammar in Appendix G.1 states that function values
     may be prefixed with a unary operator, that is, '+' or '-'.
     But with the introduction of identifiers that start with '-',
     it will no longer be possible to prefix functions with '-'.
     Any attempt to do so will only cause the '-' to be considered
     part of the function name. In other words, this syntax:
       -myfunc(x)
     will be tokenized as:
       FUNCTION, IDENT, ')'
     rather than
       '-', FUNCTION, IDENT, ')'

     As far as I know there are no functions in CSS 2.1 that it would
     make sense to prefix with a minus sign, but apparently such functions
     (representing length values, for example) may be introduced in CSS3?
     (Note that it will not help to say that function names cannot start
      with '-', because the tokenizer will still see the '-' as part of
      the identifier, and the example above will yield:
        IDENT, '(', IDENT, ')'
      which will likely cause a parsing error.)

  4. Another change in the lexical scanner in CSS 2.1 is that some tokens
     are defined to include their preceding white space (for example LBRACE).
     This, together with the change that the production 'simple_selector'
     no longer ends with S*, means that, for example, the following syntax
     is no longer valid:
        P  /* A comment */  { color: red }
     because this will tokenize as:
        IDENT, S, LBRACE, S, IDENT, ':', S, IDENT, S, '}'
     and the grammar does not allow the first S. (According to section
     4.1.9, comments only occur between tokens, so the space before the
     comment will not be seen as part of the LBRACE token.)

  5. The token UNICODERANGE has been removed from the lexical scanner,
     and also from the grammar. However, the 'range' definition is still
     present in the lexical scanner, although nothing uses it any longer.
     Also, the UNICODE-RANGE token is still defined in section 4.1.1.
     Should it be in the grammar or not?

  6. In the lexical scanner in Appendix G.2, nmstart only allows lower case
     letters.
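
To make points 3 and 4 easier to reproduce, here is a minimal sketch of a
longest-match tokenizer in Python. It is not the Appendix G.2 scanner: the
macros are simplified (ASCII only, no escapes, upper-case letters allowed),
comments are simply dropped between tokens, and the names RULES and tokenize
are made up for this illustration. It only assumes that IDENT may start
with '-' and that LBRACE includes its preceding white space.

```python
import re

# Simplified stand-ins for the scanner macros (assumption: ASCII only, no escapes).
NMSTART = r"[_a-zA-Z]"
NMCHAR = r"[_a-zA-Z0-9-]"
IDENT = "-?" + NMSTART + NMCHAR + "*"  # leading '-' allowed, as in chapter 4
W = r"[ \t\r\n\f]*"

# (token name, pattern) pairs; on equal length the earlier rule wins, as in flex.
RULES = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in [
    ("COMMENT", r"/\*.*?\*/"),
    ("FUNCTION", IDENT + r"\("),
    ("IDENT", IDENT),
    ("LBRACE", W + r"\{"),  # includes preceding white space
    ("S", r"[ \t\r\n\f]+"),
    ("CHAR", r"."),         # any other single character
]]


def tokenize(text):
    """Return token names for text; single characters are returned as themselves."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        # Try every rule at this position and keep the longest match.
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name == "CHAR":
            tokens.append(text[pos:best_end])
        elif best_name != "COMMENT":  # comments are dropped between tokens
            tokens.append(best_name)
        pos = best_end
    return tokens


print(tokenize("-myfunc(x)"))
# ['FUNCTION', 'IDENT', ')']
print(tokenize("P  /* A comment */  { color: red }"))
# ['IDENT', 'S', 'LBRACE', 'S', 'IDENT', ':', 'S', 'IDENT', 'S', '}']
```

In the first call the '-' is absorbed into the FUNCTION token, and in the
second the white space before the comment comes out as a separate S token
ahead of LBRACE, which is the sequence described in point 4.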

Received on Friday, 4 June 2004 12:07:38 UTC