Re: [csswg-drafts] [css-syntax] question: about ident-like URL consumption (#5416) from Tab Atkins Jr. via GitHub on 2020-08-11 (public-css-archive@w3.org from August 2020)

From: Tab Atkins Jr. via GitHub <sysbot+gh@w3.org>
Date: Tue, 11 Aug 2020 15:57:12 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-672041799-1597161430-sysbot+gh@w3.org>

The slightly awkward algorithm ensures that if it ends up needing to emit a function-token, and there was whitespace between the `(` and the `"`, it leaves exactly *one* character of whitespace for the tokenizer to pick up on the next pass, so that pass can emit a whitespace token.

The tokenizer already collapses runs of adjacent whitespace into a single whitespace token, so the fact that I consumed a bunch of whitespace characters while producing the preceding token isn't observable. The benefit is that I don't need arbitrary lookahead from the `(` to discover whether, after any number of whitespace characters, I eventually run into a `"`; instead I only need to look two characters ahead.
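
As a rough illustration of that step (a sketch under my own assumptions, not the spec text): the function name `after_url_paren`, the `(kind, position)` return shape, and the reduced whitespace set are all hypothetical, and newline preprocessing and the actual url-token consumption are left out.

```python
WHITESPACE = {" ", "\t", "\n"}  # simplified; assumes newlines are already normalized
QUOTES = {'"', "'"}

def after_url_paren(src, pos):
    """Decide what to emit once `url(` has been consumed.

    `pos` is the index just past the `(`. Returns (kind, new_pos), where
    kind is "function" (emit a function-token) or "url" (go on to consume
    a url-token). Hypothetical helper for illustration only.
    """
    def peek(offset):
        i = pos + offset
        return src[i] if i < len(src) else ""  # "" stands in for EOF

    # Eat whitespace only while the next TWO characters are both whitespace.
    # This never looks further than two characters ahead, and it always
    # leaves one whitespace character behind if there was any.
    while peek(0) in WHITESPACE and peek(1) in WHITESPACE:
        pos += 1

    # A quote, or a single remaining whitespace followed by a quote, means
    # this is an ordinary function call with a string argument.
    if peek(0) in QUOTES or (peek(0) in WHITESPACE and peek(1) in QUOTES):
        return "function", pos
    return "url", pos
```

For `url(   "x")` this returns `("function", 6)`: two of the three spaces are consumed, and the one left at index 6 becomes the whitespace token on the next pass.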

(Overall, the tokenizer requires three characters of lookahead, and the parser requires one token of lookahead; keeping that minimal is good for the efficiency of implementations.)

-- 
GitHub Notification of comment by tabatkins
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/5416#issuecomment-672041799 using your GitHub account


Received on Tuesday, 11 August 2020 15:57:14 UTC