- From: Ben Smith <binji@google.com>
- Date: Tue, 14 May 2019 13:05:37 -0700
- To: "Phillips, Addison" <addison@lab126.com>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
- Message-ID: <CA+M=bSPo5g6hRadaoOEKUyj_42TCOoT3cCxDuBiQ4zc3EK3_Sw@mail.gmail.com>
*From:* Phillips, Addison <addison@lab126.com>
*Date:* Mon, May 13, 2019 at 5:38 PM
*To:* Eric Prud'hommeaux
*Cc:* public-i18n-core@w3.org, binji@google.com

> > Supplementary characters (that is, those beyond the BMP) are not an issue.
> > However, isolated (that is, *unpaired*) surrogate code units are permitted in
> > JavaScript strings. The question is how to deal with them (not allowing them
> > would be fine by me--for security they are often replaced by U+FFFD). So
> > the question is whether you're permitted to have a string like
> > "\uD800 ABCDEFG \uD800\uDC00\uD800" (which starts and ends with an unpaired
> > surrogate, but has a valid surrogate pair in the middle).
> >
> > I'd say that
> > [[
> > Names are sequences of characters, which are scalar values as defined by
> > Unicode (Section 2.4).
> > ]]
> > says no, but I can't lay my hands on tests to make sure implementations barf
> > on it. (Part of the problem is that WASM tests input conditions are
> > synthesized in a browser so it may be difficult to create such a string on
> > some platforms.)
> >
> > If they are synthesized using JavaScript, it should be as simple as:
> >
> > String.fromCharCode(0xd800, 0xd800, 0xd00); // three isolated surrogates

Hi Addison,

The WebAssembly binary format encodes all strings as UTF-8, so surrogates are an error. See the definition of name here <http://webassembly.github.io/spec/core/binary/values.html#names>. This definition is used when specifying the import names <http://webassembly.github.io/spec/core/binary/modules.html#binary-importsec>, export names <http://webassembly.github.io/spec/core/binary/modules.html#export-section>, custom section names <http://webassembly.github.io/spec/core/binary/modules.html#binary-customsec>, and names found in the "name" custom section <http://webassembly.github.io/spec/core/appendix/custom.html#name-section>.
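
The point that a surrogate cannot appear in a UTF-8 encoded name can be checked from JavaScript. The following is a minimal sketch, not a conformance test: it assumes a runtime with the standard `TextEncoder` and `TextDecoder` globals (a browser or a recent Node.js), and the helper `isValidUtf8` is a hypothetical name introduced here for illustration.

```javascript
// The example string from the quoted message: unpaired surrogates
// around one valid surrogate pair (U+10000).
const s = "\uD800 ABCDEFG \uD800\uDC00\uD800";

// TextEncoder converts its input to scalar values first, so each
// unpaired surrogate is replaced by U+FFFD (bytes EF BF BD).
const bytes = new TextEncoder().encode(s);

// Hypothetical helper: strict UTF-8 validation. A TextDecoder created
// with { fatal: true } throws on malformed input instead of substituting.
function isValidUtf8(input) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(input);
    return true;
  } catch (e) {
    return false;
  }
}

// ED A0 80 is what a naive encoder would emit for U+D800; it is not UTF-8.
const surrogateBytes = Uint8Array.of(0xed, 0xa0, 0x80);
// F0 90 80 80 is the UTF-8 encoding of U+10000 (the pair \uD800\uDC00).
const supplementaryBytes = Uint8Array.of(0xf0, 0x90, 0x80, 0x80);

console.log(bytes.length);                    // 19
console.log(isValidUtf8(surrogateBytes));     // false
console.log(isValidUtf8(supplementaryBytes)); // true
```

So a name containing an unpaired surrogate has no valid byte representation in the binary format, while supplementary characters encode without trouble.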
The WebAssembly JS API returns strings read from the binary format in the ModuleExportDescriptor <http://webassembly.github.io/spec/js-api/index.html#dictdef-moduleexportdescriptor> and ModuleImportDescriptor <http://webassembly.github.io/spec/js-api/index.html#dictdef-moduleimportdescriptor> dictionaries. Those are represented as USVStrings.

There is one function that uses DOMString, WebAssembly.Module.customSections <http://webassembly.github.io/spec/js-api/index.html#dom-module-customsections>. It decodes the custom section names (which are required to be UTF-8) and compares them to the DOMString value directly.

> > Addison
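
The customSections comparison described above can be sketched as follows. This is an illustration under stated assumptions: the module bytes are hand-assembled for this example (a header plus one custom section named "abc" with a one-byte payload), and the behaviour shown for the lone-surrogate query is what direct string comparison implies, since a name decoded from valid UTF-8 can never contain an unpaired surrogate.

```javascript
// Hand-assembled module (an assumption for this sketch):
// magic "\0asm", version 1, then one custom section.
const moduleBytes = Uint8Array.of(
  0x00, 0x61, 0x73, 0x6d, // magic
  0x01, 0x00, 0x00, 0x00, // version
  0x00,                   // section id 0 = custom section
  0x05,                   // section size: 4 name bytes + 1 payload byte
  0x03, 0x61, 0x62, 0x63, // name: length 3, "abc" in UTF-8
  0x2a                    // payload
);

const mod = new WebAssembly.Module(moduleBytes);

// The argument is a DOMString, so any JavaScript string is accepted as is.
const found = WebAssembly.Module.customSections(mod, "abc");

// A lone surrogate is a legal DOMString, but it cannot equal any decoded
// section name, so the result is expected to be an empty array.
const notFound = WebAssembly.Module.customSections(mod, "\uD800");

console.log(found.length, new Uint8Array(found[0])[0]);
console.log(notFound.length);
```

The query does not need to be rejected up front: it simply never matches.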
Received on Tuesday, 14 May 2019 20:08:22 UTC