- From: Bert Bos <bert@w3.org>
- Date: Thu, 7 May 2009 21:14:03 +0200
- To: www-style@w3.org
(Hello Andrey, Your tone doesn't really inspire a response, but I know that exclamation marks often mask a lack of knowledge of a foreign language.)

It's quite possible that we've made mistakes, that's why we're asking for comments. People raised issues on the previous version and we tried to solve them.

There is no intention to change the syntax. There were people who asked for more rules for handling non-CSS input and although it is almost impossible to give rules without dictating a particular parsing algorithm, we gave it a try anyway. Those rules are new, but they only deal with input that isn't CSS, will never be CSS and for which there previously were no rules. If we inadvertently changed something in how *valid* CSS is handled, that's a bug.

The same is true, mutatis mutandis, for bolder/lighter. Maybe it looked like it was well-defined before, but in reality it wasn't. At least we received issues on it. It's possible that there are better ways to solve those issues...

On Tuesday 05 May 2009, Andrey Mikhalev wrote:
> 1. Appendix G. Grammar of CSS 2.1,
> G.2 Lexical scanner,
> following production was removed:
> {s}+\/\*[^*]*\*+([^/*][^*]*\*+)*\/ {unput(' '); /*replace by space*/}
>
> production essential for selector parsing: without it selectors
> like 'A /**/>B'
> became invalid (token sequence is "ident,s,greater,ident" instead
> of "ident,greater,ident")

This indeed looks like a mistake. But the unput() wasn't correct either, because it allowed 'A/**/B' without any combinator.

The change was made in response to an issue[1] that was raised on this mailing list. When we discussed it, we noticed that the grammar in appendix G and in the Selectors module were different. We thought the latter looked better and copied it. It seems now that the Selectors module wasn't correct either.

Here is my new attempt, in the form of a "unified diff" (i.e., lines that start with "-" are to be removed, lines with "+" are added, and lines with a space are unchanged).
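As an aside, the token-sequence difference described in the quoted report can be reproduced with a small sketch. This is Python, not flex, and it is not part of the spec: the function name `tokenize` and the flag `legacy_unput` are made up for illustration, only a handful of the scanner rules are modelled, and flex's "longest match, first rule wins ties" behaviour is imitated by hand.

```python
import re

# Patterns copied from the scanner; everything else here is a simplification.
COMMENT = r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"
S = r"[ \t\r\n\f]+"


def tokenize(text, legacy_unput):
    """Return token names for a tiny subset of the old CSS 2.1 scanner.

    legacy_unput=True models the removed rule
    {s}+<comment> {unput(' ');}; False models the scanner without it.
    """
    rules = []
    if legacy_unput:
        rules.append(("UNPUT", re.compile(S + COMMENT)))
    rules += [
        ("S", re.compile(S)),
        ("COMMENT", re.compile(COMMENT)),
        ("GREATER", re.compile(r"(?:" + S + r")?>")),  # {w}">"
        ("PLUS", re.compile(r"(?:" + S + r")?\+")),    # {w}"+"
        ("IDENT", re.compile(r"-?[_a-zA-Z][_a-zA-Z0-9-]*")),
    ]
    tokens = []
    while text:
        # Longest match wins; on a tie the earlier rule wins (as in flex).
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text)
            if match and match.end() > best_len:
                best_name, best_len = name, match.end()
        if best_name is None:
            tokens.append(text[0])  # catch-all, like the "." rule
            text = text[1:]
        elif best_name == "UNPUT":
            text = " " + text[best_len:]  # push a single space back
        elif best_name == "COMMENT":
            text = text[best_len:]        # comments are ignored
        else:
            tokens.append(best_name)
            text = text[best_len:]
    return tokens


print(tokenize("A /**/>B", legacy_unput=True))
print(tokenize("A /**/>B", legacy_unput=False))
```

With the removed rule modelled, the space and comment collapse into a single space that is then absorbed by `{w}">"`, giving IDENT, GREATER, IDENT. Without it, the space becomes a separate S token ahead of GREATER, which the old `combinator` production could not accept.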
There were also errors in the resolution of issue 104[2]. The first of the changes is meant to fix that. (Yves probably won't like this grammar, because it is again not LL(1), although I believe it is LALR(1).)

----------------------------------------------------------------------
 stylesheet
   : [ CHARSET_SYM STRING ';' ]?
-    [S|CDO|CDC]* [ import [ [CDO|CDC] [S|CDO|CDC] ]* ]*
-    [ [ ruleset | media | page ] [ [CDO|CDC] [S|CDO|CDC] ]* ]*
+    [S|CDO|CDC]* [ import [ CDO S* | CDC S* ]* ]*
+    [ [ ruleset | media | page ] [ CDO S* | CDC S* ]* ]*
   ;
 import
   : IMPORT_SYM S*
-    [STRING|URI] S* [ medium [ COMMA S* medium]* ]? ';' S*
+    [STRING|URI] S* [ medium [ ',' S* medium]* ]? ';' S*
   ;
 media
-  : MEDIA_SYM S* medium [ COMMA S* medium ]* LBRACE S* ruleset* '}' S*
+  : MEDIA_SYM S* medium [ ',' S* medium ]* '{' S* ruleset* '}' S*
   ;
 medium
   : IDENT S*
   ;
 page
   : PAGE_SYM S* pseudo_page?
-    LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
+    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
   ;
 pseudo_page
   : ':' IDENT S*
   ;
 operator
-  : '/' S* | COMMA S*
+  : '/' S* | ',' S*
   ;
 combinator
-  : PLUS S*
-  | GREATER S*
+  : S* '+' S*
+  | S* '>' S*
   | S+
   ;
 unary_operator
-  : '-' | PLUS
+  : '-' | '+'
   ;
 property
   : IDENT S*
   ;
 ruleset
-  : selector [ COMMA S* selector ]*
-    LBRACE S* declaration? [ ';' S* declaration? ]* '}' S*
+  : selector [ ',' S* selector ]*
+    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
   ;
 selector
-  : simple_selector [ combinator simple_selector ]*
+  : simple_selector [ combinator simple_selector ]* S*
   ;
 simple_selector
   : element_name [ HASH | class | attrib | pseudo ]*
   | [ HASH | class | attrib | pseudo ]+
   ;
 class
   : '.' IDENT
   ;
 element_name
   : IDENT | '*'
   ;
 attrib
   : '[' S* IDENT S* [ [ '=' | INCLUDES | DASHMATCH ] S*
     [ IDENT | STRING ] S* ]? ']'
   ;
 pseudo
   : ':' [ IDENT | FUNCTION S* [IDENT S*]? ')' ]
   ;
 declaration
   : property ':' S* expr prio?
   ;
 prio
   : IMPORTANT_SYM S*
   ;
 expr
   : term [ operator? term ]*
   ;
 term
   : unary_operator?
     [ NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* | ANGLE S* |
       TIME S* | FREQ S* ]
   | STRING S* | IDENT S* | URI S* | hexcolor | function
   ;
 function
   : FUNCTION S* expr ')' S*
   ;
 /*
  * There is a constraint on the color that it must
  * have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
  * after the "#"; e.g., "#000" is OK, but "#abcd" is not.
  */
 hexcolor
   : HASH S*
   ;
----------------------------------------------------------------------

And some lines can be removed from the tokenizer:

----------------------------------------------------------------------
 %option case-insensitive

 h               [0-9a-f]
 nonascii        [\200-\377]
 unicode         \\{h}{1,6}(\r\n|[ \t\r\n\f])?
 escape          {unicode}|\\[^\r\n\f0-9a-f]
 nmstart         [_a-z]|{nonascii}|{escape}
 nmchar          [_a-z0-9-]|{nonascii}|{escape}
 string1         \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
 string2         \'([^\n\r\f\\']|\\{nl}|{escape})*\'
 invalid1        \"([^\n\r\f\\"]|\\{nl}|{escape})*
 invalid2        \'([^\n\r\f\\']|\\{nl}|{escape})*

 comment         \/\*[^*]*\*+([^/*][^*]*\*+)*\/
 ident           -?{nmstart}{nmchar}*
 name            {nmchar}+
 num             [0-9]+|[0-9]*"."[0-9]+
 string          {string1}|{string2}
 invalid         {invalid1}|{invalid2}
 url             ([!#$%&*-~]|{nonascii}|{escape})*
 s               [ \t\r\n\f]+
 w               {s}?
 nl              \n|\r\n|\r|\f

 A               a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?
 C               c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?
 D               d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?
 E               e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?
 G               g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g
 H               h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h
 I               i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i
 K               k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k
 L               l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l
 M               m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m
 N               n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n
 O               o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o
 P               p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p
 R               r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r
 S               s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s
 T               t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t
 U               u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u
 X               x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x
 Z               z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z

 %%

 {s}                     {return S;}

 \/\*[^*]*\*+([^/*][^*]*\*+)*\/          /* ignore comments */

 "<!--"                  {return CDO;}
 "-->"                   {return CDC;}
 "~="                    {return INCLUDES;}
 "|="                    {return DASHMATCH;}

-{w}"{"                  {return LBRACE;}
-{w}"+"                  {return PLUS;}
-{w}">"                  {return GREATER;}
-{w}","                  {return COMMA;}
-
 {string}                {return STRING;}
 {invalid}               {return INVALID; /* unclosed string */}

 {ident}                 {return IDENT;}

 "#"{name}               {return HASH;}

 @{I}{M}{P}{O}{R}{T}     {return IMPORT_SYM;}
 @{P}{A}{G}{E}           {return PAGE_SYM;}
 @{M}{E}{D}{I}{A}        {return MEDIA_SYM;}
 "@charset "             {return CHARSET_SYM;}

 "!"({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}  {return IMPORTANT_SYM;}

 {num}{E}{M}             {return EMS;}
 {num}{E}{X}             {return EXS;}
 {num}{P}{X}             {return LENGTH;}
 {num}{C}{M}             {return LENGTH;}
 {num}{M}{M}             {return LENGTH;}
 {num}{I}{N}             {return LENGTH;}
 {num}{P}{T}             {return LENGTH;}
 {num}{P}{C}             {return LENGTH;}
 {num}{D}{E}{G}          {return ANGLE;}
 {num}{R}{A}{D}          {return ANGLE;}
 {num}{G}{R}{A}{D}       {return ANGLE;}
 {num}{M}{S}             {return TIME;}
 {num}{S}                {return TIME;}
 {num}{H}{Z}             {return FREQ;}
 {num}{K}{H}{Z}          {return FREQ;}
 {num}{ident}            {return DIMENSION;}

 {num}%                  {return PERCENTAGE;}
 {num}                   {return NUMBER;}

 {U}{R}{L}"("{w}{string}{w}")"   {return URI;}
 {U}{R}{L}"("{w}{url}{w}")"      {return URI;}

 {ident}"("              {return FUNCTION;}

 .                       {return *yytext;}
----------------------------------------------------------------------

[1] http://wiki.csswg.org/spec/css2.1#issue-5
[2] http://wiki.csswg.org/spec/css2.1#issue-104

>
> 2. 4 Syntax and basic data types,
> 4.2 Rules for handling parsing errors,
> Invalid at-keywords:
> User agents must ignore an invalid at-keyword together with
> everything following it, up to and including ...
> following sentence added:
> the end of the block (}) that contains the invalid at-keyword
>
> what you are talking about? _ignore_ _end of the block_?!!
>
> @media x { /*...*/ @invalid } /*... style here belongs to what?*/

I think we meant "up to the end of the block (}) that contains the invalid at-keyword" rather than "up to and including." The intention was precisely to make it clear that the "}" should *not* be ignored. Maybe change from

    User agents must ignore an invalid at-keyword together with everything following it, up to and including the next semicolon (;), the next block ({...}), or the end of the block (}) that contains the invalid at-keyword, whichever comes first.

to

    User agents must ignore an invalid at-keyword together with everything following it, up to the end of the block that contains the invalid at-keyword, or up to and including the next semicolon (;) or up to and including the next block ({...}), whichever comes first.

Or, more verbosely:

    User agents must ignore an invalid at-keyword together with everything following it, up to and including the next semicolon (;) or the next block ({...}), whichever comes first. If the invalid at-keyword occurs inside a block, and there is no semicolon or block between the at-keyword and the end of that block, then everything from the at-keyword up to the end of the block is ignored.
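The "more verbose" wording can be sketched as a small scanning function. This is only an illustration in Python under loud simplifying assumptions: it works on raw characters rather than tokens, and it ignores strings, comments, escapes and (), [] pairs, all of which a real user agent has to respect. The name `skip_invalid_at_keyword` is hypothetical.

```python
def skip_invalid_at_keyword(css: str, start: int) -> int:
    """Return the index of the first character that is NOT ignored.

    `start` is the position of the invalid at-keyword. Simplification
    (an assumption of this sketch): strings, comments and escapes are
    not recognised, only ';', '{' and '}' are looked at.
    """
    depth = 0  # nesting depth of blocks opened after the at-keyword
    i = start
    while i < len(css):
        ch = css[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # End of the enclosing block: stop BEFORE the '}',
                # so the brace itself is not ignored.
                return i
            depth -= 1
            if depth == 0:
                # A complete {...} block following the at-keyword
                # has been consumed, brace included.
                return i + 1
        elif ch == ";" and depth == 0:
            return i + 1  # up to and including the semicolon
        i += 1
    return len(css)  # ran off the end of the style sheet


css = "@media x { /*...*/ @invalid } h1 {color: green}"
end = skip_invalid_at_keyword(css, css.index("@invalid"))
print(repr(css[end:]))
```

For the example from the quoted question, the ignored span stops just before the "}", so that brace still closes the @media block and the h1 rule that follows is an ordinary top-level rule.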
>
> 3. 4 Syntax and basic data types,
> 4.2 Rules for handling parsing errors,
> following paragraph added:
> Malformed statements.
> User agents must handle unexpected tokens encountered while
> parsing a statement by reading until the end of the statement, while
> observing the rules for matching pairs of (), [], {}, "", and '', and
> correctly handling escapes.
> ...
>
> most evil idea, violate nearly everything in chapter 4, starting
> from formal core syntax.
> in short:
> 'unexpected token' in 'statement' cannot occur - since
> 'statement' is not a checkpoint (not a Real Thing, precisely).
> handling of parsing errors differs for selectors / declarations
> / at-rules.
> paragraph above redundant and introduce conflict with them.

The intention is that the rule for malformed declarations takes precedence over that for malformed statements, as it comes first in the spec. Thus an unexpected token in a declaration causes just the declaration to be ignored, not the whole statement.

The new rule about malformed statements is a generalization of that in 4.1.7 about errors in selectors: not only does an error in a selector cause a statement to be ignored, but so does an error that occurs after an at-keyword, e.g.:

    @media @error {...}

In fact, although it may not be very clear from the text (which is kept as short as possible), it is hopefully clear from the examples: if an unexpected token occurs anywhere where a statement *could* occur, then that token is ignored together with the next statement. E.g., the whole 1st line is ignored in this:

    } h2 {color: orange}
    h1 {color: green}

> 4. 15 Fonts,
> 15.6 Font boldness : the 'font-weight' property:
> 'bolder' selects the next weight that is assigned to a font
> that is darker than the inherited one.
> following sentence removed:
> If there is no such weight, it simply results in the next
> darker numerical value (and the font remains unchanged), unless the
> inherited value was '900' in which case the resulting weight is also
> '900'.
> [similar in 'lighter']
> following paragraph added:
> Note: A set of nested elements that mix 'bolder' and 'lighter'
> will give unpredictable results depending on the UA, OS, and font
> availability. This behavior will be more precisely defined in CSS3.
>
> - changing _defined_ behaviour to _undefined_ is not an
> improvement.
> - css3 reference nonsence.
> (imo: if someone tries to turn css2 specification into
> 'css3 todo list' - shoot, don't talk)
> - the weight metric is independent from font[family].
> as value of independent metric, 'bolder' SHOULD result next
> numerical value.
> futher - as a hack for non-perfect world - it MAY (or MAY NOT)
> yield to next available font's weight.
> what was unclean here? why you killing primary objectives of
> property/value, leaving only hack description?

Imagine four nested elements, from outside to inside they have

    font-weight: normal
    font-weight: bolder
    font-weight: bolder
    font-weight: lighter

The old spec said the computed value of the innermost is "one of the legal number values combined with one or more of the relative values (bolder or lighter)." But does that mean 400 + bolder + bolder + lighter or 400 + bolder or 400 + 1 * lighter + 2 * bolder? That makes a difference. Assume a font with weights 400 (normal) and 900 (extra bold). A UA that does the first will end up at 400, while a UA that does the second will choose 900.

The text about taking the next available weight or the next numerical value if there is no next weight available dates from the old CSS2 REC, and assumed that the computed value was a number. But it didn't define what happened for elements with more than one font family, so it's likely that taking the next numerical value isn't actually a good idea.

Maybe we will find a good solution before we progress CSS 2.1 to Recommendation. That's currently issue 111[3]. But maybe we won't, and will leave the algorithm undefined in CSS 2.1.
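The difference between the first two readings can be made concrete with a short Python sketch. It is not an algorithm from any spec: the function names are invented for illustration, and it assumes that when the font has no darker (or lighter) face the weight simply stays where it is.

```python
def step(weight, keyword, available):
    """Move to the next available weight in the given direction.

    Assumption of this sketch: if the font has no such weight,
    the value is left unchanged.
    """
    if keyword == "bolder":
        darker = [w for w in available if w > weight]
        return min(darker) if darker else weight
    lighter = [w for w in available if w < weight]
    return max(lighter) if lighter else weight


def apply_in_order(base, keywords, available):
    """Reading 1: resolve each relative keyword at its own element."""
    for keyword in keywords:
        base = step(base, keyword, available)
    return base


def cancel_then_apply(base, keywords, available):
    """Reading 2: let 'bolder' and 'lighter' cancel, then apply the rest."""
    net = keywords.count("bolder") - keywords.count("lighter")
    keyword = "bolder" if net > 0 else "lighter"
    for _ in range(abs(net)):
        base = step(base, keyword, available)
    return base


nested = ["bolder", "bolder", "lighter"]  # inside an element with 'normal'
faces = [400, 900]                        # the font from the example
print(apply_in_order(400, nested, faces))     # 400
print(cancel_then_apply(400, nested, faces))  # 900
```

Resolving step by step goes 400, 900, 900, 400, because the second 'bolder' has nowhere to go; cancelling first leaves a single 'bolder' and lands on 900.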
[3] http://wiki.csswg.org/spec/css2.1#issue-111

Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Thursday, 7 May 2009 19:14:43 UTC