Re: [csswg-drafts] [tokenization-css-question] Questions about tokenization of CSS (#6944)

That sentence translates directly into code in C-like languages:

```js
while(isWhitespace(input.get(0)) && isWhitespace(input.get(1))) {
  input.consumeNext();
}
```

If you had code like `url(             "http://example.com")`, that's a valid url function and we need to match it. So I need to consume an arbitrary amount of whitespace between the `(` and the start of the url itself.

I don't use "consume as much whitespace as possible" because the parser always retains the *existence* of whitespace; in the above example it will output `FUNCTION-TOKEN("url") WS STRING-TOKEN("http://example.com") CLOSE-PAREN`.

I can't just preemptively consume all the whitespace and immediately output a WS token, either, because the "consume a token" algo only ever emits a single token at a time. So I have to leave at least one whitespace character behind, so the *next* call to the algorithm can find it and emit a WS token.
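
To make the "leave one whitespace character behind" behavior concrete, here is a minimal runnable sketch. The `Input` class, its `rest()` helper, and the `collapseLeadingWhitespace` name are made up for illustration and are not part of the spec; only the loop itself mirrors the snippet above.

```js
// Hypothetical input stream; the class and method names are illustrative only.
class Input {
  constructor(text) { this.text = text; this.pos = 0; }
  // Peek at the character `offset` places ahead without consuming it ("" past the end).
  get(offset) { return this.text[this.pos + offset] ?? ""; }
  // Consume and return the next character.
  consumeNext() { return this.text[this.pos++] ?? ""; }
  // Everything that has not been consumed yet.
  rest() { return this.text.slice(this.pos); }
}

// CSS whitespace after preprocessing: newline, tab, or space.
function isWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\n";
}

// Consume whitespace only while at least two whitespace characters remain,
// so exactly one is left for the next "consume a token" call to turn into WS.
function collapseLeadingWhitespace(input) {
  while (isWhitespace(input.get(0)) && isWhitespace(input.get(1))) {
    input.consumeNext();
  }
}

// The part of the example that follows `url(`.
const input = new Input('             "http://example.com")');
collapseLeadingWhitespace(input);
console.log(JSON.stringify(input.rest()));
```

After the loop, the remaining input starts with a single space followed by the quoted string, which is what lets the next two calls emit a WS token and then a string token.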

> However, if I understand it correctly, point 2 in 4.3.6 seems redundant

It's not. As I said above, I purposely leave behind one space, but if I'm just emitting a url token, I don't need it. So I go ahead and consume it. Technically I could just have it check if the next character is whitespace and consume it, but "consume as much whitespace as possible" is clearer in its intent and prevents any bugs if I somehow call into this algorithm with more than one space character left behind. I never want a URL to actually start with whitespace.
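
A rough sketch of why the defensive version is harmless: skipping *all* leading whitespace gives the same result whether one space or several were left behind. The function names here are hypothetical, and the URL reader is heavily simplified (no escapes, no quoted strings, no bad-url handling), so treat it as an illustration rather than the spec's algorithm.

```js
// CSS whitespace after preprocessing: newline, tab, or space.
function isWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\n";
}

// "Consume as much whitespace as possible": return the index of the first
// non-whitespace character at or after `pos`.
function skipWhitespace(text, pos) {
  while (pos < text.length && isWhitespace(text[pos])) pos++;
  return pos;
}

// Simplified, hypothetical unquoted-URL reader. `pos` points just past `url(`.
// Assumption: the input contains no escapes, quotes, or invalid characters.
function consumeUnquotedUrl(text, pos) {
  pos = skipWhitespace(text, pos); // the URL value never starts with whitespace
  let value = "";
  while (pos < text.length && text[pos] !== ")" && !isWhitespace(text[pos])) {
    value += text[pos++];
  }
  return value;
}

// Same value whether one space or many precede the URL.
console.log(consumeUnquotedUrl(" foo.png)", 0));
console.log(consumeUnquotedUrl("     foo.png)", 0));
```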

-- 
GitHub Notification of comment by tabatkins
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/6944#issuecomment-1012478636 using your GitHub account



Received on Thursday, 13 January 2022 20:10:42 UTC