Re: [css-syntax] Comments on Parsing and following sections from Tab Atkins Jr. on 2013-05-29 (www-style@w3.org from May 2013)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 28 May 2013 18:42:37 -0700
To: Simon Sapin <simon.sapin@exyr.org>
Cc: www-style list <www-style@w3.org>
Message-ID: <CAAWBYDCxvDBNMZA6rGL7qp1tYZqwyHhnjGPwpyGAH6HA_z_bKQ@mail.gmail.com>

On Mon, May 27, 2013 at 12:13 AM, Simon Sapin <simon.sapin@exyr.org> wrote:
> §5. Parsing
>
>     The items that can appear in the tree are a mixture of basic tokens
>     and new objects:
>
> I would remove "basic tokens" here. Some tokens are non-preserved and never
> appear in the parsed tree, and preserved tokens are already part of the
> definition of "component value".
>
>
>     at-rule
>     An at-rule has […] an optional value consisting of
>     a simple {} block.
>
> I’d like to rename this "value" to "block", so that we can refer to "the
> block of an at-rule". For example: "@import rules have no block.";
> "@font-face rules have a block that contains a list of declarations."
>
>
>     preserved tokens
>     Any token produced by the tokenizer except for 〈function〉s,
>     〈{〉s, 〈(〉s, and 〈[〉s.
>
> I’d like to add a note saying that 〈}〉, 〈)〉, 〈]〉, 〈bad-string〉, and
> 〈bad-url〉 tokens as component values are always parse errors but are
> preserved by css-syntax to allow higher-level parsers such as Media Queries
> to have more fine-grained error handling than dropping a whole declaration
> or rule.

Done.
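
To illustrate why preserving these tokens is useful, here is a minimal
Python sketch. It models component values as plain (type, text) tuples,
which is an assumption made for the example and not how the spec
represents them; ERROR_TOKEN_TYPES and invalid_items are hypothetical
names. A higher-level parser could use something along these lines to
reject only the comma-separated item that contains a stray token,
rather than the whole declaration or rule.

```python
# Token types that are always parse errors when they show up as
# component values, yet are preserved by the parser.
ERROR_TOKEN_TYPES = {"}", ")", "]", "bad-string", "bad-url"}


def invalid_items(component_values):
    """Return the indexes of comma-separated items containing an error token.

    component_values is a flat list of (type, text) tuples; this shape is
    an assumption for illustration only.
    """
    bad = set()
    item = 0
    for token_type, _text in component_values:
        if token_type == "comma":
            item += 1  # a comma starts the next item
        elif token_type in ERROR_TOKEN_TYPES:
            bad.add(item)
    return sorted(bad)
```

The caller can then decide per item what to do with the invalid ones,
which is the fine-grained handling the note is about.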

> §5.1. Parser Railroad Diagrams
>
>     Railroad diagrams are more compact than a state-machine,
>     but often easier to read than a regular expression.
>
> The parser is not a state-machine anymore, but I’m not sure what to change
> this to.

Changed to "explicit parser".

> §5.3. Parser Entry Points
>
> Of course this is only editorial. Implementations are free to use another
> strategy (such as doing tokenization and parsing in one pass) as long as the
> overall behavior is the same.

That doesn't make it editorial; it just falls under the general "if we
can't tell the difference, you're fine" exception that *all* specs
implicitly follow.

>     Dunno about "Parse a value" yet.
>     I'll remove it if I don't figure out what to do with it.
>
> It would be used by 'attr()' with a <type-or-unit> other than 'string'.

Ah, good catch.  Added some text about this.

>     "Parse a list of values" is for the contents of presentational
>     attributes, which parse text into a single declaration's value.
>
> Also selectors, MQs, and @supports conditions outside of stylesheets (e.g.
> in APIs or HTML.) But we probably don’t need an exhaustive list here, which
> would be hard to keep up-to-date.

Right.

>     "Parse a comma-separated list of values" is similar,
>     but for comma-separated lists.
>
> I’m still not convinced this is useful to have in the Syntax spec. It is
> easy to re-define on top of "Parse a list of values", and never the only
> thing you want. It’s also the same as the '#' grammar multiplier defined in
> Values.

Yeah, removed.
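
For reference, layering this on top of "Parse a list of values" really
is short. A minimal Python sketch, assuming the list of values has
already been parsed and that each component value is modeled as a plain
string (both are simplifications for the example; split_on_commas is a
hypothetical name):

```python
def split_on_commas(values):
    """Split a flat list of component values into comma-separated groups.

    Each value is modeled as a string and a comma token as ","; this is
    an assumption for illustration, not the spec's data model.
    """
    groups = [[]]
    for value in values:
        if value == ",":
            groups.append([])  # a comma closes the current group
        else:
            groups[-1].append(value)
    return groups
```

Note that an empty input yields a single empty group; a consumer that
wants to treat that as "no items" has to check for it itself.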

>     Are there any other things somewhere where some tech
>     (that isn't straight CSS itself) needs to parse some text into CSS?
>
> I can’t think of anything else that belongs in Syntax. Component values,
> declarations and rules are all that this spec defines.

That's what I was thinking.  Removed the paragraph for now; if anyone
*does* need any more algorithms to hook, I can add them later.

>     All of the algorithms defined in this spec may be called with
>     either a list of tokens or of component values.
>
> As mentioned before in this list, I think it’d be simpler to have everything
> (except "Consume a component value" itself) always work on component values
> and not tokens.

Eh, the only difference is that in a few places I accept either a {
token or a {} block, and if I switched entirely to component values
I'd only have the {} block lines.  It's not a big deal.

> Also I’m not sure that the distinction between "entry points" and
> "algorithms" brings much, and some entry points do little more than call an
> algorithm. I’d merge the two concepts.

Entry points are for other specs to hook, and do additional work
beyond the parsing that the algorithms do (or at least, *can* do
additional work - some, as you note, do nothing more than call the
algorithm and return its value).  Merging them would be awkward.

> §6.2. The <an+b> type
>
> This section needs to define (possibly by reference) how this grammar works.
> In particular:
>
> * A - character is not special like []'|? are, but part of a "symbol".
> * Unquoted symbols represent <ident> tokens whose parsed value (after
> unescaping) is an ASCII-insensitive match for the symbol.
> * Whitespace tokens are ignored (according to your emails on the subject.)
>
> Maybe just refer to the Values spec?

Referenced the Values spec.  I'd forgotten that this spec doesn't have
a "Values" section like every other spec does.

> Also, changes for the Selectors 3 definition of an+b need to be in some
> "Changes" section.

Added to the parser's Changes section.

> §7.1. Defining Block Contents: the <declaration-list>, <rule-list>, and
> <stylesheet> productions
>
>     Similarly, the <rule-list> production represents a list of rules […]
>
>     Finally, the <stylesheet> production represents a list of rules.
>     It is identical to <rule-list>, except that blocks using it default
>     to accepting all rules.
>
> I don’t see the point of having <stylesheet> in this spec. It’s really the
> same as <rule-list> since none of them really define what rules are allowed
> in a given context. And "accepting all rules" is misleading at best. For
> example, an @top-left margin rule is only allowed inside @page, not at the
> stylesheet top-level. Another spec should not have to exclude it explicitly.

I added "that aren't otherwise limited to a particular context".

The value of it is that it removes some of the necessary boilerplate
prose, and clearly distinguishes between "only this limited set of
rules" and "all but this limited set of rules".

> Also, css-conditional already has a concept of "nested statements".

I'm not sure what this has to do with this section.

>     For example, the ‘@font-face’ rule is defined to have no prelude […]
>
> I think a definition of @font-face in Syntax 3 terms would still have to be
> a bit more formal. Its prelude must either be empty or contain only
> whitespace tokens. (All at-rules have a prelude.)

Whitespace is always ignored; a prelude containing only whitespace is
identical to a prelude containing nothing, as far as the grammar is
concerned.  (I've added a sentence about how whitespace is never
indicated in the grammar explicitly.)

You're right, though, about having an empty prelude versus having no
prelude.  Changed the wording there.
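
The whitespace point can be sketched in a few lines of Python. This
assumes a prelude is a list of (type, text) tuples, which is a
simplification for the example, and prelude_is_empty is a hypothetical
helper name:

```python
def prelude_is_empty(prelude):
    """True if the prelude contains nothing but whitespace tokens.

    An empty list and a whitespace-only list are treated the same,
    matching the idea that the grammar never sees whitespace.
    """
    return all(token_type == "whitespace" for token_type, _text in prelude)
```

Under this model, "@font-face{" and "@font-face  {" both have an empty
prelude.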

> Actually, I’m not convinced that a grammar is even useful in this case. You
> can just define @font-face with prose and a reference to <declaration-list>.
> Only in some cases you might want a grammar for the prelude and/or the value
> of a rule (for example page selectors in @page.)

I find consistency useful.  Yes, @font-face is trivial enough that it
might be definable without an explicit grammar, but the grammar makes
it really clear and obvious, in a simpler manner than prose can, I
think.

>     Within a <declaration-list>, !important is automatically invalid
>     on any descriptors.
>
> If you really want to keep this statement "descriptor" needs to be defined
> in this spec, but I’d rather not have that. I think that this spec should
> only speak of declarations. Whether a given declaration is for a property or
> a descriptor or whether !important is allowed should be out of scope and
> left to the respective specs using <declaration-list> or <declaration>.

Added some text to the definition of declaration about them being
categorized as "properties" or "descriptors".

The point of this section is to reduce boilerplate, and thus reduce
errors and omissions.  Right now, it's a maxim that descriptors and
non-cascading properties *never* accept !important, because it affects
the cascade.  If we can capture that by convention such that spec
authors don't have to remember to write it, that's a win.  If we ever
come up with a reason for descriptors or non-cascading properties to
accept !important, we can amend this section at that time.
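
As a rough sketch of what that convention buys a spec author, here is
how a consumer of a descriptor block might apply it in Python. The
Declaration class and drop_important function are hypothetical names
for the example, and treating "invalid" as "dropped from the list" is
an assumption based on how invalid declarations are normally ignored:

```python
from dataclasses import dataclass


@dataclass
class Declaration:
    name: str
    value: str
    important: bool = False  # True if the declaration carried !important


def drop_important(descriptors):
    """Keep only the descriptor declarations that did not use !important."""
    return [decl for decl in descriptors if not decl.important]
```

With the rule stated once in Syntax, each at-rule definition gets this
behavior without repeating the prose.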

> The rest of this paragraph also seems out of scope. It probably belongs in
> the Cascade module.

I'm not splitting a description of how to define rule grammars across
two specs, as that would be silly.

~TJ
Received on Wednesday, 29 May 2013 01:43:24 UTC