CSS3 selectors critique (WD-css3-selectors-20010126)

What follows is commentary on the CSS3 module Selectors 
<http://www.w3.org/TR/2001/WD-css3-selectors-20010126>.



1.1 Changes from CSS 2 
<http://www.w3.org/TR/css3-selectors/#changesFromCSS2>

"the list of basic definitions (selector, group of selectors, simple 
selector, ..) has been clarified"

The definitions have changed substantially.  This is not a mere 
clarification.  See the following item.



4. Selector syntax <http://www.w3.org/TR/css3-selectors/#selector-syntax>

"A sequence of simple selectors is a chain of simple selectors that are 
not separated by a combinator."

This sequence was called "simple selector" in CSS2.  The shifting 
terminology is confusing and unnecessary.

"It [a sequence of simple selectors] always begin with a type selector or 
a universal selector."

Typographical error: "begin" should be "begins".

According to the formal grammar of CSS3 selectors, the type selector and 
the universal selector are optional.  The formal grammar agrees in this 
respect with CSS2, while the prose breaks with CSS2 and with the 
formal grammar.

"A simple selector is either a type selector, universal selector, 
attribute selector, ID selector, content selector, pseudo-class."

English error: insert the word "or" before the word "pseudo-class".

This is a redefinition of the CSS2 term "simple selector".  If the CSS3 
selectors module instead used one of the terms "selector particle", 
"selector atom", or "simple selector fragment", we could retain the CSS2 
definition of "simple selector".  This, in turn, would eliminate the 
ungainly term "sequence of simple selectors".

"Combinators are: whitespace, '>', '+' and '~'."

The descendant combinator, noted here as whitespace, may be a series of 
one or more comments with no whitespace.

"The elements of the document tree represented by a selector are called 
subjects of the selector."

This can be taken to mean that for the selector "HTML BODY ADDRESS EM A", 
a document may have subjects of element type 'HTML', subjects of element 
type 'BODY', and so forth.  A better wording is, "The elements of the 
document tree which match a selector are called subjects of the selector."

"the subjects of a selector are always a subset of the elements 
represented by the rightmost sequence of simple selectors."

Change "rightmost" to "last".  This is a matter of 
internationalization (internationalisation), of accessibility, and also of 
plain good sense.  The CSS syntax is not tied to some visual presentation.

To be consistent with the previous item, change "represented" to "matched".



7. Pseudo-elements <http://www.w3.org/TR/css3-selectors/#pseudo-elements>

"For compatibility reasons with existing stylesheets, user agents must 
also accept the one-colon previous notation. This compatibility is not 
required for the new pseudo-elements introduced in CSS level 3."

Can this be expressed in the formal grammar?

Why was this laxness introduced?  Is there a problem with mandating 
acceptance of the single-colon notation?  Is there a problem with 
mandating rejection of the single-colon notation?



9. Calculating a selector's specificity 
<http://www.w3.org/TR/css3-selectors/#specificity>

The confusing explanation of specificity can be and should be avoided by 
refusing to treat specificity as a single number.  There is no universal 
numeric base for specificity, so a single-number specificity is 
meaningless outside of a given cascade.  Specificity in CSS is an ordered 
triplet, (a, b, c), and we should represent it as such.
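
A minimal Python sketch of the point, under stated assumptions: the 
triplet is read as (a = ID selectors, b = other attribute selectors and 
pseudo-classes, c = element names), and the helper `as_single_number` and 
the sample counts are hypothetical, chosen only for illustration.  Tuples 
compare lexicographically, so no base ever has to be chosen, whereas the 
single-number form gives different answers in different bases.

```python
def as_single_number(spec, base):
    """Concatenate a specificity triplet into one number in the given base.

    Illustrative helper only; the name is not taken from the draft.
    """
    a, b, c = spec
    return (a * base + b) * base + c


# Hypothetical selectors: one with a single ID, one with eleven
# attribute selectors or pseudo-classes and no ID.
one_id = (1, 0, 0)
eleven_attribs = (0, 11, 0)

# Lexicographic tuple comparison: the first component decides.
triplet_says_id_wins = one_id > eleven_attribs

# Single-number comparison depends on the base that was picked.
base10_says_id_wins = as_single_number(one_id, 10) > as_single_number(eleven_attribs, 10)
base16_says_id_wins = as_single_number(one_id, 16) > as_single_number(eleven_attribs, 16)

print(triplet_says_id_wins, base10_says_id_wins, base16_says_id_wins)
```

In base 10 the eleven attribute selectors overflow into the ID column and 
outrank the ID; in base 16 they do not.  The triplet comparison has no such 
dependence.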

The confusion is infectious; read Rich in Style's interpretation, 
<http://www.richinstyle.com/bugs/mozilla.html#errors> item 22.


 
10. The grammar of W3C selectors 
<http://www.w3.org/TR/css3-selectors/#w3cselgrammar>
10.1 Grammar <http://www.w3.org/TR/css3-selectors/#grammar>

"selectors_group
  : selector [ ',' S* selector ]*
  ;"

This does not permit whitespace between a selector and the comma that 
follows it.  The whitespace might be specified in the 'selectors_group' 
production or in the 'selector' production.
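
A rough Python sketch of the problem, using regular expressions in place 
of the grammar.  It assumes, purely for illustration, that a 'selector' is 
a bare identifier which consumes no trailing whitespace; the names 
`AS_DRAFTED`, `WITH_LEADING_S`, and `accepts` are hypothetical.  The second 
pattern shows one possible repair, with the whitespace placed in the 
'selectors_group' production.

```python
import re

# Hypothetical simplification: a selector is a bare identifier.
SELECTOR = r"[A-Za-z][A-Za-z0-9-]*"
# S* modelled as any run of CSS whitespace characters.
S_STAR = r"[ \t\r\n\f]*"

# selectors_group : selector [ ',' S* selector ]*
AS_DRAFTED = re.compile(rf"{SELECTOR}(?:,{S_STAR}{SELECTOR})*")

# One possible repair: selector [ S* ',' S* selector ]*
WITH_LEADING_S = re.compile(rf"{SELECTOR}(?:{S_STAR},{S_STAR}{SELECTOR})*")


def accepts(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


print(accepts(AS_DRAFTED, "H1, H2"))       # whitespace after the comma
print(accepts(AS_DRAFTED, "H1 , H2"))      # whitespace before the comma
print(accepts(WITH_LEADING_S, "H1 , H2"))  # same input, repaired production
```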

"simple_selector_sequence
  /* the universal selector is optional */
  : [ type_selector | universal ]?
        [ HASH | class | attrib | pseudoclass | negation ]+ |
    type_selector | universal
  ;"

Is the universal selector optional in the grammar?  Is the universal 
selector optional in the sense that its semantics are implied when one of 
[ HASH | class | attrib | pseudoclass | negation ] is present?

There is no definition of the 'negation' production.  It appears that the 
'pseudoclass' production, through the 'functional_pseudo' production, is 
attempting to encompass the work that would have belonged to the 
'negation' production.

However, the lack of a 'negation' production prevents the 'negation_arg' 
production from enforcing the prohibition on nested negation 
pseudo-classes.  This prohibition is stated in a comment in the 
'pseudoclass' production and in Section 6.6.7 "The negation pseudo-class".

The prohibition on the use of a pseudo-element selector as an argument to 
the negation pseudo-class is redundant.  The negation pseudo-class, by 
its definition, takes a CSS3 simple selector as an argument.  A 
pseudo-element selector is not a CSS3 simple selector and so must not be 
used.



10.2 Lexical scanner <http://www.w3.org/TR/css3-selectors/#lex>

"{integer]               {return INTEGER;}"

The closing square bracket should be a closing curly brace.

'"^="                    (return PREFIXMATCH;)
"$="                    (return SUFFIXMATCH;)
"*="                    (return SUBSTRINGMATCH;)'

The parentheses should be curly braces.

The 'expression' production is misnamed, introduces unnecessary token 
types, breaks compatibility with CSS2, and is possibly wrong.

A better name would avoid confusion with the CSS2 production 'expr', used 
for property values.  I offer the name 'positions'.

The sequence of an 'INTEGER' token without a minus sign followed by the 
'IDENT' token "n" would occur only if a comment intervened.  Otherwise 
(unless the general lexing has been revolutionized since CSS2) the result 
will be a single 'DIMENSION' token.
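
The effect can be sketched with a small longest-match tokenizer in 
Python.  This is a simplification under stated assumptions, not the CSS2 
scanner: the macros below omit escapes and non-ASCII characters, 'NUMBER' 
stands in for the integer token, and the names `TOKEN_RULES` and 
`tokenize` are hypothetical.

```python
import re

# Simplified stand-ins for the CSS2 {num} and {ident} macros.
NUM = r"[0-9]*\.[0-9]+|[0-9]+"
IDENT = r"[A-Za-z_][A-Za-z0-9_-]*"

TOKEN_RULES = [
    ("COMMENT", re.compile(r"/\*.*?\*/", re.S)),
    ("DIMENSION", re.compile(rf"(?:{NUM}){IDENT}")),
    ("NUMBER", re.compile(NUM)),
    ("IDENT", re.compile(IDENT)),
]


def tokenize(text):
    """Split text into (type, lexeme) pairs, preferring the longest match.

    Comments are recognized and then discarded.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, rule in TOKEN_RULES:
            m = rule.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        name, m = best
        if name != "COMMENT":
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


print(tokenize("2n"))      # one DIMENSION token
print(tokenize("2/**/n"))  # a NUMBER token, then an IDENT token
```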

I suggest the following production.

expression /* or 'positions' as the case may be */
  :  unary_operator? INTEGER [ '*' 'n' [ unary_operator INTEGER ]? ]? | 'n'
  ;
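
As a rough check of what this production admits, here is a Python sketch 
that renders it as a regular expression.  It assumes that 
'unary_operator' is '+' or '-' as in CSS2, that 'INTEGER' is an unsigned 
run of digits, and that no whitespace or comments occur between tokens; 
the names `SUGGESTED` and `matches_suggested` are hypothetical.

```python
import re

# unary_operator? INTEGER [ '*' 'n' [ unary_operator INTEGER ]? ]? | 'n'
SUGGESTED = re.compile(r"[+-]?[0-9]+(?:\*n(?:[+-][0-9]+)?)?|n")


def matches_suggested(text):
    """Return True if the whole text fits the suggested production."""
    return SUGGESTED.fullmatch(text) is not None


candidates_expected_in = ["3", "-3", "2*n", "2*n+1", "-2*n-1", "n"]
candidates_expected_out = ["2n", "2n+1", "n+1", "odd", "2*n1"]

accepted = [s for s in candidates_expected_in if matches_suggested(s)]
rejected = [s for s in candidates_expected_out if not matches_suggested(s)]

print(accepted)
print(rejected)
```

Forms such as "2n" fall outside the production because the multiplication 
is written explicitly with '*', which sidesteps the 'DIMENSION' token.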

I am wary of new token types and of grammars with alternatives or with 
optional components.  Alternatives in CSS seem to be little more than 
opportunities for misimplementation.  I favor cutting the 'odd' and 
'even' arguments for this reason.  Perhaps we should further restrict 
the production:

expression /* or 'positions' as the case may be */
  :  unary_operator? INTEGER '*' 'n' unary_operator INTEGER
  ;

Regarding the lexer as currently configured, why must the 'INTEGER' token 
accept a sign?  To maintain compatibility with CSS2, 'INTEGER' must omit 
the sign and relegate the sign to a separate token.

How would the lexer ever return a 'SIGNED_INTEGER' token starting with "-" 
or return an 'INTEGER' token?  The notation seems to show that 'NUMBER' 
tokens would eliminate such possibilities.  If this is only my 
misunderstanding of the Flex notation, an explanatory comment amidst the 
Flex code would help.

-- 
Etan Wexler

Received on Wednesday, 22 August 2001 17:04:39 UTC