[csswg-drafts] [css-nesting-1] Can we relax the syntax further? (#7961) from Lea Verou via GitHub on 2022-10-26 (public-css-archive@w3.org from October 2022)

From: Lea Verou via GitHub <sysbot+gh@w3.org>
Date: Wed, 26 Oct 2022 18:58:39 +0000
To: public-css-archive@w3.org
Message-ID: <issues.opened-1424553657-1666810717-sysbot+gh@w3.org>
LeaVerou has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-nesting-1] Can we relax the syntax further? ==
In #7834 we resolved to relax the Nesting syntax to basically "`& ` is required to indicate a descendant selector only when the selector starts with an ident" (e.g. in `& h1` the `&` is necessary, but `& .foo` can become just `.foo` and `& > h1` can be just `> h1`).

I opened this issue to brainstorm about ways to relax the syntax even further and do away with the `&` for all descendants, without introducing infinite lookahead, so we can have our 🍰 and eat it too.

## The problem

If we do not require a `&` before descendant element selectors, then when followed by a pseudo-class they can look like declarations (which are `<ident>` `:` and then anything goes, including another `<ident>`) to the parser. The parser cannot know if it's dealing with a declaration or a nested rule, until it sees a `;` or `{`, which could come after an arbitrary number of tokens, hence unbounded lookahead.
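
To make the ambiguity concrete, here is a minimal Python sketch. The regex tokenizer and the `classify` helper are hypothetical and far cruder than a real CSS tokenizer (they ignore strings, comments, and nested blocks). It only shows that a declaration and a nested rule can share the same opening tokens, and that the deciding `;` or `{` can sit arbitrarily far away.

```python
import re

# Crude illustrative tokenizer (an assumption, not the CSS Syntax algorithm):
# runs of word characters/hyphens become one token, any other character is a
# token of its own, and whitespace is dropped.
TOKEN_RE = re.compile(r"\s+|[-\w]+|.", re.S)


def tokenize(source):
    return [t for t in TOKEN_RE.findall(source) if not t.isspace()]


def classify(source):
    """Return (kind, tokens_examined), deciding on the first ';' or '{'."""
    tokens = tokenize(source)
    for count, tok in enumerate(tokens, start=1):
        if tok == ";":
            return "declaration", count
        if tok == "{":
            return "rule", count
    # End of input also terminates a declaration.
    return "declaration", len(tokens)
```

Both `font:bold;` and `font:hover b {` begin with the tokens `font`, `:`, and an ident, so nothing in that prefix tells the two apart. The number of tokens `classify` has to examine grows with the length of the selector or value.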

## Non-starters

- No, we cannot require whitespace after `:` for declarations: minifiers currently remove it, so there is a lot of code out there with declarations that do not include whitespace after `:`.
- No, the parser cannot take the list of recognized properties or pseudo-classes into account when deciding whether it's dealing with a declaration or a rule. That would be a most unfortunate coupling, and wouldn't even completely solve the problem (e.g. `font` is both a valid property name and an HTML element).

## Brainstorming

One way this problem is bounded is that there are only **two** distinct possibilities: either you have a declaration, or a selector, and both of these involve finite lookahead. 

In CSS, tokenization is context-less, i.e. parsing a declaration or a rule creates the same tokens; only the higher-level structures are different.

Assuming parsing a declaration takes O(M) time and parsing a rule takes O(N) time, it would theoretically solve the problem to naively parse every rule-or-declaration twice (once as a declaration, once as a rule), and then throw away the structure we don't need. Clearly, that's a silly idea, because that would take O(M+N) time for every rule-or-declaration.

One optimization would be to parse as a declaration (there are far more declarations than nested rules), and keep the list of raw tokens around until the `;` or `{`. Then declarations continue to be parsed in O(M) time, and rules are parsed in O(M+N) time. The extra space needed is minimal, since we don't need to keep these tokens around after the current structure is parsed. 
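
A rough Python sketch of this "assume a declaration, keep the raw tokens" strategy follows. The `parse_item` function and its dictionary output are hypothetical, and the input is assumed to be an already-tokenized list of strings with whitespace removed. Real declaration and selector parsing are much more involved. The point is only that the buffer is needed until the first `;` or `{`, and that the buffered tokens are reinterpreted only when a `{` shows up.

```python
def parse_item(tokens):
    """Parse one declaration or nested-rule prelude from a list of tokens.

    Tokens are buffered as if they belonged to a declaration. If a '{' is
    seen first, the guess was wrong and the same buffered tokens are
    reinterpreted as a selector instead.
    """
    buffered = []
    for tok in tokens:
        if tok == "{":
            # Slow path: the buffer is re-read as a selector (the extra O(N) work).
            return {"kind": "rule", "selector": list(buffered)}
        if tok == ";":
            break
        buffered.append(tok)

    # Fast path: ';' or end of input confirms the declaration guess.
    if len(buffered) < 2 or buffered[1] != ":":
        return {"kind": "invalid", "tokens": buffered}
    return {"kind": "declaration", "name": buffered[0], "value": buffered[2:]}
```

For example, `parse_item(["color", ":", "red", ";"])` stays on the declaration path, while `parse_item(["strong", ":", "hover", "{"])` falls back to treating the buffered tokens as a selector.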

But also, as discussed in #7834, we can rule out the possibility of being a declaration very early for nearly every selector. The **only** exception is element selectors followed by a pseudo-class (e.g. `strong:hover`), which are fairly rare in nested stylesheets (you usually want to style the base selector as well, so it's usually `& { /* ... */ &:hover {...} }`).
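
The early check could look roughly like the Python sketch below. The `needs_buffering` helper and the simplified `IDENT_RE` pattern are assumptions for illustration: the pattern is not the full CSS `<ident>` grammar, and tokens are again assumed to be whitespace-free strings. Only input that starts with an ident followed by `:` stays ambiguous; everything else can be routed straight to the rule parser.

```python
import re

# Simplified stand-in for a CSS identifier (hypothetical, not the full grammar).
IDENT_RE = re.compile(r"(--|-?[A-Za-z_])[\w-]*$")


def needs_buffering(tokens):
    """True only if the first two tokens are <ident> ':', the ambiguous case."""
    return (
        len(tokens) >= 2
        and IDENT_RE.match(tokens[0]) is not None
        and tokens[1] == ":"
    )
```

Under these assumptions, `.foo {`, `> h1 {`, and `h1 span {` are all recognized as rules after at most two tokens, and only input shaped like `strong:hover {` or `color:red;` needs the buffered path.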

So in the end, declarations still take O(M) time, nearly all rules still take O(N) time, and *some, but very few* rules take O(M + N) time. 
And there's probably more room for optimizations.
I'd love to hear from implementers whether this is feasible, and whether I'm missing something. 

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/7961 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Wednesday, 26 October 2022 18:58:41 UTC