Re: [whatwg/webidl] “Unicode character” should likely say “Unicode scalar value” in intro to lexical grammar (Issue #1080) from Domenic Denicola on 2022-01-03 (public-webapps-github@w3.org from January 2022)

From: Domenic Denicola <notifications@github.com>
Date: Mon, 03 Jan 2022 10:58:43 -0800
To: whatwg/webidl <webidl@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/webidl/issues/1080/1004290474@github.com>

> It seems this should probably say USVs. This can be inferred sorta because the string literal interpretation algorithm appears to assume that source text consumed by string is already known to be exclusively USVs.

Great find. My first instinct was that it doesn't really matter since all Web IDL constructs are ASCII. But I guess this part of the spec does indeed assume USVs.

Note that unlike, say, CSS or HTML, Web IDL doesn't really have an "entry point" for parsing, and certainly not one that's web-observable. So the choice here is really a statement about which Web IDL files are valid, I guess? I.e., we're saying that if some Web IDL-consuming software gets a sequence of bytes which, when decoded*, contains unpaired surrogates, then that software should report a parse error.

\* "Decoded": not necessarily UTF-8, as we don't (and IMO shouldn't) state anywhere that `.webidl` files must be UTF-8 encoded!
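
To make the "unpaired surrogates mean a parse error" reading concrete, here is a minimal sketch of the check a Web IDL-consuming tool might run on already-decoded source text. The names `WebIDLParseError` and `require_scalar_values` are hypothetical, not taken from the spec or from any existing tool. The sketch relies on Python's `str` being a sequence of code points, so any code point in the range U+D800 to U+DFFF that survives decoding is by definition unpaired. How the bytes were decoded is deliberately left to the caller, since no encoding is mandated.

```python
class WebIDLParseError(ValueError):
    """Hypothetical error type for source text that is not valid Web IDL."""


def require_scalar_values(source: str) -> str:
    """Return `source` unchanged if it contains only Unicode scalar values.

    Raises WebIDLParseError at the first surrogate code point. In a Python
    str, a surrogate code point can only be an unpaired one, because a
    well-formed pair is represented as a single astral code point.
    """
    for offset, ch in enumerate(source):
        code_point = ord(ch)
        if 0xD800 <= code_point <= 0xDFFF:
            raise WebIDLParseError(
                f"unpaired surrogate U+{code_point:04X} at offset {offset}"
            )
    return source


if __name__ == "__main__":
    # ASCII-only IDL and astral characters are both fine.
    require_scalar_values("interface Foo {};")
    require_scalar_values('"\U0001F600"')

    # A lone surrogate inside a string literal is rejected.
    try:
        require_scalar_values('"\ud800"')
    except WebIDLParseError as error:
        print(error)
```

A tool would run this once on the whole file before tokenizing, so the later string-literal handling can assume scalar values throughout.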

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/webidl/issues/1080#issuecomment-1004290474

Received on Monday, 3 January 2022 18:58:55 UTC