Re: [csswg-drafts] [css-syntax-3] Consume an ident-like token algorithm differs for function tokens that start with `url` (#8280)

This notion of characters "attributed" to a token doesn't exist in the spec, and it wasn't considered at all while writing the spec. If you're producing some tooling that cares about that, you're going beyond what's specced, and are on your own. ^_^

That said, the reason urls and non-urls parse differently in this way is to retain the limited character lookahead (CSS only requires 3 chars of lookahead to tokenize). When consuming a url(), I don't know whether to consume it as a normal function or in the special url-token way until I see whether the first argument is a string or not, but there can be an arbitrary amount of whitespace between the `(` and the character that tells me that. Since the *amount* of whitespace is intentionally not preserved by the parser as specced, only the *presence* of it, I can eagerly consume all but the last whitespace character, and only need to look two characters ahead to know, eventually, which branch to take.
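As a rough illustration, here's a minimal Python sketch of that decision step. It assumes input that has already been preprocessed (so the only whitespace code points are newline, tab, and space), and `url_branch` is a hypothetical helper name, not anything defined by the spec:

```python
# Hypothetical helper sketching the url( branch of css-syntax-3's
# "consume an ident-like token". Assumes already-preprocessed input.
WHITESPACE = {"\n", "\t", " "}
QUOTES = {'"', "'"}


def url_branch(src: str, pos: int) -> tuple[str, int]:
    """Decide how to tokenize after `url(`; `pos` is the index just past '('.

    Returns ("function", pos) or ("url", pos), where pos is the index at
    which tokenizing resumes.
    """

    def peek(k: int) -> str:
        # k-th code point ahead of pos, or "" at end of input
        return src[pos + k] if pos + k < len(src) else ""

    # While the next two code points are both whitespace, consume one.
    # This eats all but the last whitespace character with 2 chars of lookahead.
    while peek(0) in WHITESPACE and peek(1) in WHITESPACE:
        pos += 1

    # A quote, or one whitespace char then a quote, means a normal function
    # token; the remaining whitespace char is left for the next token.
    if peek(0) in QUOTES or (peek(0) in WHITESPACE and peek(1) in QUOTES):
        return ("function", pos)
    # Otherwise the caller would go on to consume a url token from here.
    return ("url", pos)
```

For `url(   "a.png")` (three spaces) the sketch returns `("function", 6)`: index 6 is the last of the three spaces, still unconsumed. For `url(  a.png)` it returns `("url", 5)`.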

This way, if I encounter a string, I can return the function token immediately while a single whitespace character is still unconsumed, which ensures that a whitespace token is correctly emitted next, the same as if I'd been able to predict immediately that it would be a normal function. Exactly when the N-1 preceding whitespace characters are consumed differs between the spec and that hypothetical correct early guess, but the difference isn't observable in the token stream produced by the specced parser.
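To make the "not observable" point concrete, here's a toy tokenizer in Python. It's hypothetical and heavily simplified (no escapes, numbers, comments, or bad-url recovery; assumes preprocessed input), but it applies the url( rule described above and collapses any whitespace run into one token:

```python
# Toy tokenizer (hypothetical, heavily simplified) for comparing token streams.
WS = {"\n", "\t", " "}
QUOTES = {'"', "'"}


def tokenize(src: str) -> list[tuple[str, ...]]:
    tokens: list[tuple[str, ...]] = []
    i, n = 0, len(src)

    def at(k: int) -> str:
        # Code point at index k, or "" past the end of input
        return src[k] if k < n else ""

    while i < n:
        c = src[i]
        if c in WS:
            # Collapse any run of whitespace: only its presence survives.
            while at(i) in WS:
                i += 1
            tokens.append(("whitespace",))
        elif c in QUOTES:
            j = src.index(c, i + 1)  # assumes a terminated, escape-free string
            tokens.append(("string", src[i + 1 : j]))
            i = j + 1
        elif c.isalpha():
            j = i
            while at(j).isalnum() or at(j) in {"-", "_"}:
                j += 1
            name = src[i:j]
            if at(j) != "(":
                tokens.append(("ident", name))
                i = j
            elif name.lower() != "url":
                tokens.append(("function", name))
                i = j + 1
            else:
                i = j + 1
                # Eat whitespace while the next two code points are whitespace.
                while at(i) in WS and at(i + 1) in WS:
                    i += 1
                if at(i) in QUOTES or (at(i) in WS and at(i + 1) in QUOTES):
                    # Normal function; one whitespace char is left unconsumed.
                    tokens.append(("function", name))
                else:
                    # Simplified url token: skip whitespace, read up to ')'.
                    while at(i) in WS:
                        i += 1
                    j = src.index(")", i)
                    tokens.append(("url", src[i:j].rstrip()))
                    i = j + 1
        elif c == ")":
            tokens.append((")",))
            i += 1
        else:
            tokens.append(("delim", c))
            i += 1
    return tokens
```

Whether there are one or three spaces after `url(`, and whether the name is `url` or `foo`, everything after the function token comes out the same: one whitespace token, then the string, then `)`.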

-- 
GitHub Notification of comment by tabatkins
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/8280#issuecomment-1370307642 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 3 January 2023 22:59:15 UTC