Re: [csswg-drafts] [css-syntax] The tokenizer input should probably be a stream of scalar values, not codepoints (#3307)

Chatted with @bfgeek today and traced our code a bit. The objection to USVString in CSSOM is separate from the acceptance of this filtering in the CSS parser, so we're fine with the spec change here in Syntax, but not with simplifying CSSOMString to USVString in CSSOM.

Basically, IDL's requirements for USVString are strict, and require codepoint filtering immediately upon the value entering the system. Switching the OM to USVString would mean a *separate* pass over each string to replace surrogates, after which the parser would run over the string and do its own substitutions as it worked.

On the other hand, just adding surrogates to the list of filtered codepoints can be done at the same time as the other filterings, on-demand during parsing, so there are no additional passes over the strings (just a few more comparisons).
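
To make the difference concrete, here's a rough sketch of the two shapes in Python. This is not our actual code; every name in it (`to_usv_string`, `filter_stream`, `two_pass`, `one_pass`) is made up for illustration. The non-surrogate substitutions shown (CR, CRLF, and FF to LF; NULL to U+FFFD) are my reading of the Syntax preprocessing step, and the input is modeled as a Python `str` of codepoints in which any surrogate codepoint is treated as unpaired.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(ch: str) -> bool:
    """True if ch is a codepoint in the surrogate range U+D800..U+DFFF."""
    return "\ud800" <= ch <= "\udfff"


def to_usv_string(s: str) -> str:
    """Hypothetical stand-in for a USVString conversion: an eager,
    separate pass over the whole string that replaces surrogates."""
    return "".join(REPLACEMENT if is_surrogate(ch) else ch for ch in s)


def filter_stream(s: str, replace_surrogates: bool):
    """Hypothetical on-demand filter: yields codepoints one at a time as
    the parser consumes them, applying substitutions along the way."""
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        if ch == "\r":
            # CR and CRLF both collapse to a single LF.
            if i + 1 < n and s[i + 1] == "\n":
                i += 1
            yield "\n"
        elif ch == "\f":
            yield "\n"
        elif ch == "\0":
            yield REPLACEMENT
        elif replace_surrogates and is_surrogate(ch):
            # The one extra comparison per codepoint; no extra pass.
            yield REPLACEMENT
        else:
            yield ch
        i += 1


def two_pass(s: str) -> str:
    """USVString at the OM boundary, then the parser's own filtering."""
    return "".join(filter_stream(to_usv_string(s), replace_surrogates=False))


def one_pass(s: str) -> str:
    """Surrogates added to the parser's existing list of filtered codepoints."""
    return "".join(filter_stream(s, replace_surrogates=True))
```

Both functions produce the same output for the same input; only the number of walks over the string differs.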

So yes, while the *author-observable effects* of parser-filtering or just switching to USVString are identical, the internal effects are different unless we do some custom contortions of our IDL code, which we'd rather not do, at least not right now.

-- 
GitHub Notification of comment by tabatkins
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/3307#issuecomment-457332610 using your GitHub account

Received on Thursday, 24 January 2019 19:50:33 UTC