[css-syntax] Comments on Parsing and following sections from Simon Sapin on 2013-05-27 (www-style@w3.org from May 2013)

From: Simon Sapin <simon.sapin@exyr.org>
Date: Mon, 27 May 2013 15:13:47 +0800
To: www-style list <www-style@w3.org>
Message-ID: <51A307AB.6060908@exyr.org>
§5. Parsing

     The items that can appear in the tree are a mixture of basic tokens
     and new objects:

I would remove "basic tokens" here. Some tokens are non-preserved and 
never appear in the parsed tree, and preserved tokens are already part 
of the definition of "component value".


     at-rule
     An at-rule has […] an optional value consisting of
     a simple {} block.

I’d like to rename this "value" to "block", so that we can refer to "the 
block of an at-rule". For example: "@import rules have no block."; 
"@font-face rules have a block that contains a list of declarations."


     preserved tokens
     Any token produced by the tokenizer except for 〈function〉s,
     〈{〉s, 〈(〉s, and 〈[〉s.

I’d like to add a note saying that 〈}〉, 〈)〉, 〈]〉, 〈bad-string〉, and 
〈bad-url〉 tokens as component values are always parse errors but are 
preserved by css-syntax to allow higher-level parsers such as Media 
Queries to have more fine-grained error handling than dropping a whole 
declaration or rule.
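
A minimal sketch of the distinction such a note would draw, assuming tokens are labelled by plain type strings. The names NON_PRESERVED, ALWAYS_ERROR and declaration_value_is_valid are hypothetical and defined by neither css-syntax nor any implementation:

```python
# Hypothetical token-type labels, used only for this sketch.
NON_PRESERVED = {"function", "{", "(", "["}
ALWAYS_ERROR = {"}", ")", "]", "bad-string", "bad-url"}


def is_preserved(token_type):
    """Every token type is preserved except those that open a function or block."""
    return token_type not in NON_PRESERVED


def is_parse_error_as_component_value(token_type):
    """These tokens are preserved, yet always an error when seen as a component value."""
    return token_type in ALWAYS_ERROR


def declaration_value_is_valid(token_types):
    """One possible coarse policy: reject the whole value on any such token.

    A higher-level parser (Media Queries, for example) could instead inspect
    the preserved error tokens and recover at a finer granularity.
    """
    return not any(is_parse_error_as_component_value(t) for t in token_types)
```

The point is that the two sets are disjoint: an error token still reaches the caller, which then chooses how much to throw away.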


§5.1. Parser Railroad Diagrams

     Railroad diagrams are more compact than a state-machine,
     but often easier to read than a regular expression.

The parser is not a state-machine anymore, but I’m not sure what to 
change this to.


§5.3. Parser Entry Points

Of course this is only editorial. Implementations are free to use 
another strategy (such as doing tokenization and parsing in one pass) as 
long as the overall behavior is the same.


     Dunno about "Parse a value" yet.
     I'll remove it if I don't figure out what to do with it.

It would be used by 'attr()' with a <type-or-unit> other than 'string'.


     "Parse a list of values" is for the contents of presentational
     attributes, which parse text into a single declaration's value.

Also selectors, MQs, and @supports conditions outside of stylesheets 
(e.g. in APIs or HTML.) But we probably don’t need an exhaustive list 
here, which would be hard to keep up-to-date.


     "Parse a comma-separated list of values" is similar,
     but for comma-separated lists.

I’m still not convinced this is useful to have in the Syntax spec. It is 
easy to re-define on top of "Parse a list of values", and never the only 
thing you want. It’s also the same as the '#' grammar multiplier defined 
in Values.
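
To illustrate how little is needed, here is a rough sketch that layers the comma-separated variant on top of a stand-in for "Parse a list of values". The (type, value) tuple representation and both function names are assumptions made for this example, not spec text:

```python
def parse_list_of_values(values):
    """Stand-in for "Parse a list of values".

    In this sketch the input is assumed to already be a list of component
    values, each a hypothetical (type, value) tuple, so it is just copied.
    """
    return list(values)


def parse_comma_separated_list_of_values(values):
    """Split the top-level component values into groups at each comma token."""
    groups = [[]]
    for component_value in parse_list_of_values(values):
        if component_value[0] == "comma":
            groups.append([])  # a comma starts a new group
        else:
            groups[-1].append(component_value)
    return groups
```

Because commas nested inside functions and blocks are already wrapped in their own component value, a flat split like this only ever sees top-level commas.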


     Are there any other things somewhere where some tech
     (that isn't straight CSS itself) needs to parse some text into CSS?

I can’t think of anything else that belongs in Syntax. Component values, 
declarations and rules are all that this spec defines.


     All of the algorithms defined in this spec may be called with
     either a list of tokens or of component values.

As mentioned before in this list, I think it’d be simpler to have 
everything (except "Consume a component value" itself) always work on 
component values and not tokens.
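
A simplified sketch of that layering, assuming tokens are plain strings and leaving out functions entirely. The dict shape used for blocks and the names below are hypothetical; this is not the algorithm as written in the draft:

```python
CLOSERS = {"{": "}", "(": ")", "[": "]"}


def consume_component_value(tokens, i):
    """Consume one component value starting at tokens[i].

    Returns (component_value, next_index). A block becomes a dict holding
    its opening token and its nested content; anything else is preserved
    as-is, including unmatched closing tokens.
    """
    token = tokens[i]
    if token in CLOSERS:
        closer = CLOSERS[token]
        content = []
        i += 1
        while i < len(tokens) and tokens[i] != closer:
            value, i = consume_component_value(tokens, i)
            content.append(value)
        return {"block": token, "content": content}, i + 1  # skip closer (or EOF)
    return token, i + 1


def to_component_values(tokens):
    """Convert a flat token list into component values once, up front."""
    result, i = [], 0
    while i < len(tokens):
        value, i = consume_component_value(tokens, i)
        result.append(value)
    return result
```

Once to_component_values has run, nothing downstream needs to know about tokens that open or close a block.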

Also I’m not sure that the distinction between "entry points" and 
"algorithms" brings much, and some entry points do little more than call 
an algorithm. I’d merge the two concepts.


§6.2. The <an+b> type

This section needs to define (possibly by reference) how this grammar 
works. In particular:

* A - character is not special like []'|? are, but part of a "symbol".
* Unquoted symbols represent <ident> tokens whose parsed value (after 
unescaping) is an ASCII-insensitive match for the symbol.
* Whitespace tokens are ignored (according to your emails on the subject.)

Maybe just refer to the Values spec?
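
For concreteness, a small sketch of the second and third points, assuming the ident value has already been unescaped by the tokenizer and that tokens are hypothetical (type, value) tuples. The function names are made up for this example:

```python
def ascii_lower(text):
    """Lowercase only A-Z, leaving every non-ASCII character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def ident_matches_symbol(ident_value, symbol):
    """ASCII case-insensitive match between an unescaped ident and a symbol."""
    return ascii_lower(ident_value) == ascii_lower(symbol)


def without_whitespace(tokens):
    """Drop whitespace tokens before matching against the grammar."""
    return [t for t in tokens if t[0] != "whitespace"]
```

ascii_lower is used rather than str.lower() because the latter also folds some non-ASCII characters (U+212A KELVIN SIGN becomes "k"), which would not be an ASCII-insensitive match.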

Also, changes from the Selectors 3 definition of an+b need to be in some 
"Changes" section.


§7.1. Defining Block Contents: the <declaration-list>, <rule-list>, and 
<stylesheet> productions

     Similarly, the <rule-list> production represents a list of rules […]

     Finally, the <stylesheet> production represents a list of rules.
     It is identical to <rule-list>, except that blocks using it default
     to accepting all rules.

I don’t see the point of having <stylesheet> in this spec. It’s really 
the same as <rule-list>, since neither of them really defines what rules 
are allowed in a given context. And "accepting all rules" is misleading at 
best. For example, an @top-left margin rule is only allowed inside 
@page, not at the top level of a stylesheet. Another spec should not have 
to exclude it explicitly.

Also, css-conditional already has a concept of "nested statements".


     For example, the ‘@font-face’ rule is defined to have no prelude […]

I think a definition of @font-face in Syntax 3 terms would still have to 
be a bit more formal. Its prelude must either be empty or contain only 
whitespace tokens. (All at-rules have a prelude.)
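
As a sketch of what that prose amounts to, assuming an at-rule is a dict with "name", "prelude" and "block" keys and tokens are (type, value) tuples (all of which are hypothetical shapes for this example):

```python
def prelude_is_effectively_empty(prelude):
    """True if the prelude is empty or contains only whitespace tokens."""
    return all(token[0] == "whitespace" for token in prelude)


def is_valid_font_face(rule):
    """Check the structural constraints on @font-face described above.

    The rule must be named font-face, have a block, and have a prelude that
    is empty or whitespace-only. The block's declarations are not checked.
    """
    return (
        rule["name"].lower() == "font-face"
        and rule["block"] is not None
        and prelude_is_effectively_empty(rule["prelude"])
    )
```

Since every at-rule has a prelude, the check is on the prelude's content rather than on whether it exists.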

Actually, I’m not convinced that a grammar is even useful in this case. 
You can just define @font-face with prose and a reference to 
<declaration-list>. Only in some cases might you want a grammar for the 
prelude and/or the value of a rule (for example, page selectors in @page.)


     Within a <declaration-list>, !important is automatically invalid
     on any descriptors.

If you really want to keep this statement, "descriptor" needs to be 
defined in this spec, but I’d rather not have that. I think that this 
spec should only speak of declarations. Whether a given declaration is 
for a property or a descriptor or whether !important is allowed should 
be out of scope and left to the respective specs using 
<declaration-list> or <declaration>.

The rest of this paragraph also seems out of scope. It probably belongs 
in the Cascade module.

-- 
Simon Sapin
Received on Monday, 27 May 2013 07:14:19 UTC