[csswg-drafts] [css-syntax] The tokenizer input should probably be a stream of scalar values, not codepoints

tabatkins has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-syntax] The tokenizer input should probably be a stream of scalar values, not codepoints ==
In <https://github.com/WICG/construct-stylesheets/pull/61#discussion_r232381468>, Boris points out that the Syntax spec defines the input to the tokenizer as a stream of code points, and wonders whether I actually mean scalar values there. (That is, all code points except surrogates, U+D800 through U+DFFF.)

I think at the time I wrote this, USVString didn't yet exist, and the distinction between the two wasn't really present in specs. But if I were writing it today, I'm pretty sure I'd use scalar values.

In particular, note that [you can't produce a surrogate code point from an escape](https://drafts.csswg.org/css-syntax/#consume-escaped-code-point), which suggests that I assumed *no* surrogates would show up in the stream.
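As a rough illustration of that rule, here is a minimal sketch of the range check applied to the hex digits of an escape. The helper names (`isSurrogateCodePoint`, `escapedHexToCodePoint`) are hypothetical, and the sketch covers only the hex-digit branch of the algorithm, not the EOF or literal-character branches or the optional trailing whitespace.

```javascript
// Hypothetical helpers, not names from the spec.
// Surrogates occupy U+D800 through U+DFFF.
function isSurrogateCodePoint(cp) {
  return cp >= 0xd800 && cp <= 0xdfff;
}

// Maps the 1 to 6 hex digits of a CSS escape to a code point.
// Zero, surrogates, and values above U+10FFFF all become U+FFFD.
function escapedHexToCodePoint(hexDigits) {
  if (!/^[0-9a-fA-F]{1,6}$/.test(hexDigits)) {
    throw new RangeError("expected 1 to 6 hex digits");
  }
  const value = parseInt(hexDigits, 16);
  if (value === 0 || isSurrogateCodePoint(value) || value > 0x10ffff) {
    return 0xfffd;
  }
  return value;
}

console.log(escapedHexToCodePoint("d800").toString(16)); // fffd
console.log(escapedHexToCodePoint("66").toString(16)); // 66
```

So `\d800` written as an escape in a stylesheet can never yield U+D800; a surrogate can only enter the stream as a raw code point in the input.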

So, I think I should switch the spec over to referring to scalar values, and have a conversion step for going from code points to scalar values (probably just converting surrogates to U+FFFD? I'll look at impls and see what's up).
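One possible shape for such a conversion step, sketched in JavaScript. This assumes the "replace with U+FFFD" option floated above; the function name `toScalarValues` is made up for illustration, and nothing here is spec text.

```javascript
// Hypothetical preprocessing step: replace every unpaired surrogate
// in a JavaScript string with U+FFFD, leaving valid pairs intact.
function toScalarValues(input) {
  let out = "";
  // String iteration yields whole code points: a valid surrogate pair
  // arrives as one two-unit string, a lone surrogate as a one-unit string.
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    out += cp >= 0xd800 && cp <= 0xdfff ? "\ufffd" : ch;
  }
  return out;
}

const converted = toScalarValues(".fo\ud800o");
console.log([...converted].map((c) => c.codePointAt(0).toString(16)));
// [ '2e', '66', '6f', 'fffd', '6f' ]
```

Under this approach the tokenizer itself never sees a surrogate, so the rest of the parsing algorithm would not need to special-case them.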

[Test case](http://software.hixie.ch/utilities/js/live-dom-viewer/saved/6360):
```html
<!DOCTYPE html>
<style></style>
<script>
document.querySelector("style").textContent = ".fo\ud800o { color: blue; }";
w([...document.styleSheets[0].cssRules[0].selectorText].map(x=>x.codePointAt(0).toString(16)));
</script>
```

Looks like Chrome retains the character as U+D800, while Firefox censors it to U+FFFD. Perhaps this is just related to which definition each uses for CSSOMString?

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3307 using your GitHub account

Received on Friday, 9 November 2018 20:44:59 UTC