WD-CSS21-20020802 section 4, "CSS 2.1 syntax and basic data types", substantive comments from Etan Wexler on 2002-11-14 (www-style@w3.org from November 2002)

From: Etan Wexler <ewexler@stickdog.com>
Date: Thu, 14 Nov 2002 06:40:12 -0500
To: www-style@w3.org, Bert Bos <bert@w3.org>, Tantek Çelik <tantekc@microsoft.com>, Ian Hickson <ian@hixie.ch>, Håkon Wium Lie <howcome@opera.com>
Message-Id: <v03102808b9f93b656e6e@[64.24.91.50]>

Following are substantive comments on section 4, "CSS 2.1 syntax and basic
data types" (<http://www.w3.org/TR/2002/WD-CSS21-20020802/selector.html>),
of the Cascading Style Sheets level 2.1 draft
(<http://www.w3.org/TR/2002/WD-CSS21-20020802>).



4.1.1 Tokenization


"All levels of CSS -- level 1, level 2, and any future levels -- use the
same core syntax."

Well, level 1 restricts itself to the ISO-8859-1 character repertoire.
That's a significant difference in syntax.  And then there are issues of
newline in strings and so on.


"nmstart    [_a-zA-Z]|{nonascii}|{escape}"

To return to a favorite subject, I would still like to repeal the so-called
correction that allowed unescaped underscore in identifiers.  This change,
which had some notable opposition, breaks backward compatibility with CSS1
and with any implementation of CSS2 predating the change.

Unescaped underscores, promoted on the basis that XML names may contain
underscores, are no more necessary than unescaped full stops, which XML
names may also contain.  We're doing fine without the latter; why do
we need the former?  Is "\_" truly an unbearable burden over and beyond "_"?

I assert that CSS2.1 can make a break from this change and can return to
the requirement of escaping underscores in identifiers.  And I'm pleading
for such a break.  We can pretend that the erratum was an April Fools' prank.


"unicode   \\[0-9a-f]{1,6}[ \n\r\t\f]?"

To accommodate CRLF line breaks, change to
"unicode   \\[0-9a-f]{1,6}(\r\n|[ \n\r\t\f])?".

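The difference between the two patterns can be checked mechanically.  What
follows is a minimal sketch in Python, not part of the draft: the names
DRAFT_UNICODE, PROPOSED_UNICODE, and consumed are my own, and the patterns
are transcribed into Python's regular-expression syntax (with a
non-capturing group) rather than the draft's Lex notation.

```python
import re

# Pattern as quoted from the draft: at most one trailing whitespace character.
DRAFT_UNICODE = re.compile(r"\\[0-9a-f]{1,6}[ \n\r\t\f]?")

# Pattern proposed above: a CR LF pair counts as a single terminator.
PROPOSED_UNICODE = re.compile(r"\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?")


def consumed(pattern, text):
    """Return the prefix of text that the pattern consumes, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


# Backslash, "26", CR, LF, "B": an escape terminated by a CRLF line break.
sample = "\\26\r\nB"

draft_result = consumed(DRAFT_UNICODE, sample)        # stops after the CR
proposed_result = consumed(PROPOSED_UNICODE, sample)  # swallows CR and LF
```

Under the draft pattern the Line Feed is left behind as a separate
character; under the proposed pattern the whole pair is consumed with the
escape.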

"COMMENT tokens do not occur in the grammar (to keep it readable)"

Readability is a canard.  Observe how painlessly COMMENT tokens are
explicitly added:

stylesheet  : [ CDO | CDC | b | statement ]+;
b           : [ S | COMMENT ]*;
statement   : ruleset | at-rule;
at-rule     : ATKEYWORD b any* [ block | ';' b ];
block       : '{' b [ any | block | ATKEYWORD b | ';' b ]* '}' b;
ruleset     : selector? '{' b declaration? [ ';' b declaration? ]* '}' b;
selector    : any+;
declaration : property ':' b value;
property    : IDENT b;
value       : [ any | block | ATKEYWORD b ]+;
any         : [ IDENT | NUMBER | PERCENTAGE | DIMENSION | STRING
              | DELIM | URI | HASH | UNICODE-RANGE | INCLUDES
              | FUNCTION any* ')' | DASHMATCH | '(' any* ')' | '[' any* ']'
] b;

I added a single short production and replaced "S*" with "b".  Is that
truly difficult to read?

In fact, adding comments explicitly lets us leave them out of certain
places in a level-specific grammar, meaning in turn that the core grammar
can be even smaller.  So if we were to have the following productions in
the CSS2.1 grammar ...

percentage    : [ '+' | '-' ]? NUMBER '%';
dimension     : [ '+' | '-' ]? NUMBER IDENT;
includes      : '~' '=';
function      : IDENT '(' any* ')';
dashmatch     : '|' '=';

... we could shorten the 'any' production as follows:

any : [ IDENT | NUMBER | STRING | URI
      | DELIM | HASH | UNICODE-RANGE
      | '(' any* ')' | '[' any* ']' ] b;




4.1.2 Keywords


"Other illegal examples:"
...
"font-family: "serif";"

But that's not illegal.  Rather, that assigns the font family named
"serif".  It's probably not what misguided authors intend, but it's not
illegal.



4.1.3 Characters and case


"In CSS 2.1, identifiers (including element names, classes, and IDs in
selectors) can contain only the characters [A-Za-z0-9] and ISO 10646
characters 161 and higher, plus the hyphen (-) and the underscore (_); they
cannot start with a hyphen or a digit."

Change to "In CSS 2.1, identifiers (including element names, classes, and
IDs in selectors) can contain, unescaped, only the characters [A-Za-z0-9]
and ISO 10646 characters 161 and higher, plus the hyphen-minus (-) and the
underscore (_); they cannot start with an unescaped hyphen-minus or an
unescaped digit [0-9]."


"They can also contain" ... "any ISO 10646 character as a numeric code"

This is false.  ISO 10646 may have characters assigned to codepoints up to
U+7FFFFFFF, whereas CSS2.1 deals in codepoints U+FFFFFF and below.


"Note that Unicode is code-by-code equivalent to ISO 10646"

This is true for now.  When ISO 10646 ventures beyond Plane 16, however,
this will cease to be true.
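
The ranges at issue in the two comments above can be compared directly.
This is only an arithmetic sketch in Python; the constant names and the
plane helper are mine, and the ISO 10646 bound is the one I cite above.

```python
# Largest value expressible with the six hexadecimal digits of a CSS escape.
MAX_CSS_ESCAPE = int("f" * 6, 16)

# Upper bound of the ISO 10646 code space as cited in the comment above.
MAX_ISO_10646 = 0x7FFFFFFF

# Last codepoint of Plane 16, the current upper limit of Unicode.
MAX_UNICODE = 0x10FFFF


def plane(codepoint):
    """Return the plane number (each plane holds 0x10000 codepoints)."""
    return codepoint >> 16


# Highest plane that a six-digit escape can reach.
highest_escapable_plane = plane(MAX_CSS_ESCAPE)
```

A six-digit escape covers all of Unicode as it stands, and planes well
beyond it, but falls far short of the full ISO 10646 code space.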


"If a digit or letter follows the hexadecimal number, the end of the number
needs to be made clear."

Considering "digit" as any character in Unicode general category "Nd" and
"letter" as any character in Unicode general categories "L*", this is
false.  Even restricting the terms to their meaning within the ASCII
repertoire, this is false.  The identifier \53top unambiguously corresponds
to "Stop" because "t", while a letter, is not a hexadecimal digit.  Change
the wording to "If a character in the range [0-9a-zA-Z] follows the
hexadecimal number, the end of the number needs to be made clear."
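
To illustrate, here is a rough sketch in Python of decoding hexadecimal
escapes.  It is not an implementation of the draft: the names HEX_ESCAPE
and decode_hex_escapes are hypothetical, the pattern assumes the CRLF
handling that I propose for the "unicode" macro, and escapes other than
hexadecimal ones are not handled.

```python
import re

# One to six hexadecimal digits, then an optional single terminator.
# Assumption: CR LF is accepted as one terminator, as proposed for "unicode".
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \n\r\t\f])?")


def decode_hex_escapes(text):
    """Replace each hexadecimal escape with the character it denotes.

    chr() raises ValueError above 0x10FFFF; this sketch does not guard
    against that.
    """
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# "t" is a letter but not a hexadecimal digit, so no terminator is needed.
stop = decode_hex_escapes("\\53top")

# "B" is a hexadecimal digit, so the space is what ends the number.
amp_b = decode_hex_escapes("\\26 B")

# Without the space, "26B" is read as a single codepoint, U+026B.
fused = decode_hex_escapes("\\26B")
```

Only a following character in [0-9a-fA-F] can change how the number is
read, which is narrower than "digit or letter".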


"with a space (or other whitespace character): "\26 B" ("&B"). In this
case, user agents should treat a "CR/LF" pair (13/10) as a single
whitespace character."

Change the first part to "with a space, with another whitespace character,
or with the sequence of 'Carriage Return' (13) followed by 'Line Feed'
(10):".  Eliminate the second sentence.


"Only one whitespace character is ignored after a hexadecimal escape."

This is false according to the preceding passage.  Change to "Only one
whitespace character or the sequence of 'Carriage Return' (13) followed by
'Line Feed' (10) is ignored after a hexadecimal escape."



4.1.4 Statements


"In this specification, the expressions "immediately before" or
"immediately after" mean with no intervening whitespace or comments."

The "Statements" section is an odd place to put this explanation.



4.1.8 Declarations and properties


"A declaration is either empty or consists of a property, followed by a
colon (:), followed by a value."

Add "name" after "property".


"A property is an identifier."

Add "name" after "property".

I must militate against the conflation of "property" and "property name".
The name is a CSS identifier, a series of characters.  The property is an
object attached to an element or to a pseudo-element and consists of a name
and a value.


"The second declaration on the second line contains an undefined property
'font-vendor'."

Change "contains" to "is of".



4.1.9 Comments


'Comments begin with the characters "/*" and end with the characters "*/".'

Add 'Comments may contain any characters but must not contain the sequence
"*/".'

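The restriction can be seen with a conventional regular expression for
comments of this form.  The sketch below is in Python; the pattern is a
common idiom that I am supplying for illustration, not text quoted from the
draft, and the names COMMENT and first_comment are mine.

```python
import re

# "/*", then anything, up to and including the first "*/".
COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")


def first_comment(text):
    """Return the comment at the start of text, or None if there is none."""
    match = COMMENT.match(text)
    return match.group(0) if match else None


# The comment ends at the first "*/"; the rest is ordinary style sheet text.
nested_attempt = first_comment("/* a */ b */")

# Asterisks inside the comment are fine as long as no "/" follows them.
starry = first_comment("/* a ** b */")
```

Whatever follows the first "*/" lies outside the comment, which is why a
comment cannot contain that sequence.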


4.2 Rules for handling parsing errors


Missing here are rules for handling entities that do not match even the
core syntax.  Should a CSS processor ignore such entities?  Should a CSS
processor accept the part of the entity before the first core error?
Answers are necessary for interoperability.

Also missing are rules for handling an at-rule where the keyword is
recognized but the following structure is not.  What, for example, should
or must a CSS1 processor do with media-specific '@import' at-rules?


"User agents must ignore a declaration with an unknown property."

Change "with" to "of".


"keywords cannot be quoted in CSS 2.1"

The addition of "2.1" is, frankly, frightening.  The implication is that,
in some future level of CSS, keywords may be quoted.  If the implication is
not desired, eliminate "2.1".  If the implication is desired, then we have
a topic deserving an entire and separate thread of discussion.



4.3.1 Integers and real numbers


'Both integers and real numbers may be preceded by a "-" or "+" to indicate
the sign.'

I have always assumed that whitespace may not intervene.  Is my assumption
correct?



4.3.2 Lengths


"After the '0' length, the unit identifier is optional."

Reading this strictly, I conclude that '0.0' is not a valid <length>.
Change to "After a zero length ('0' or an equal number such as '0.0'), the
unit identifier is optional."
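
The reading that I am asking for amounts to a numeric comparison rather
than a textual one.  A minimal sketch in Python follows; the function name
unit_is_optional and the NUMBER pattern are my own simplification of the
number syntax, not the draft's definitions.

```python
import re
from decimal import Decimal

# Simplified number syntax: optional sign, then digits or a decimal fraction.
NUMBER = re.compile(r"[+-]?(?:[0-9]+|[0-9]*\.[0-9]+)")


def unit_is_optional(number_text):
    """True if number_text is a number whose value equals zero."""
    if NUMBER.fullmatch(number_text) is None:
        return False
    return Decimal(number_text) == 0


# A strictly textual test for '0' would reject the second and third of these.
zero_forms = [unit_is_optional(s) for s in ("0", "0.0", "-0", "0.5")]
```

With a value-based test, '0.0' and '-0' are treated the same as '0'.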


"Pixel units are relative to the resolution"

Change "Pixel" to "'Px'".


"the user agent should rescale pixel values"

Change "pixel" to "'px'".


"in: inches -- 1 inch is equal to 2.54 centimeters"

Does the Working Group really wish to limit precision?


"In cases where the specified length cannot be supported, user agents must
approximate it in the actual value."

Change "specified" to "computed".



4.3.5 Colors


"Values outside the device gamut should be clipped"

Add "when assigning actual values".



4.3.6 Strings


"the following two selectors are exactly the same"

Change "exactly the same" to "entirely equivalent".



4.4 CSS document representation


"An HTTP "charset" parameter in a "Content-Type" field."

What about a MIME parameter for use in mail?



4.4.1 Referring to characters not represented in a character encoding


"If most of a document requires escaping"

Change "document" to "style sheet".

Received on Thursday, 14 November 2002 07:11:41 UTC