Re: agenda+ Fwd: Re: language for unicode string [I18N-ACTION-800]

*From: *Phillips, Addison <addison@lab126.com>
*Date: *Mon, May 13, 2019 at 5:38 PM
*To: *Eric Prud'hommeaux
*Cc: *public-i18n-core@w3.org, binji@google.com

> >
> > > Supplementary characters (that is, those beyond the BMP) are not an issue.
> > However, isolated (that is, *unpaired*) surrogate code units are permitted in
> > JavaScript strings. The question is how to deal with them (not allowing them
> > would be fine by me--for security they are often replaced by U+FFFD). So
> > the question is whether you're permitted to have a string like "\uD800
> > ABCDEFG \uD800\uDC00\uD800" (which starts and ends with an unpaired
> > surrogate, but has a valid surrogate pair in the middle).
> >
> > I'd say that
> > [[
> > Names are sequences of characters, which are scalar values as defined by
> > Unicode (Section 2.4).
> > ]]
> > says no, but I can't lay my hands on tests to make sure implementations barf
> > on it. (Part of the problem is that WASM tests input conditions are
> > synthesized in a browser so it may be difficult to create such a string on some
> > platforms.)
> >
>
> If they are synthesized using JavaScript, it should be as simple as:
>
>      String.fromCharCode(0xd800, 0xd800, 0xd800); // three isolated surrogates
>

Hi Addison,

The WebAssembly binary format encodes all strings as UTF-8, which cannot
represent surrogate code points, so surrogates (paired or not) are an error
there. See the definition of name here
<http://webassembly.github.io/spec/core/binary/values.html#names>. This
definition is used when specifying the import names
<http://webassembly.github.io/spec/core/binary/modules.html#binary-importsec>,
export names
<http://webassembly.github.io/spec/core/binary/modules.html#export-section>,
custom section names
<http://webassembly.github.io/spec/core/binary/modules.html#binary-customsec>,
and names found in the "name" custom section
<http://webassembly.github.io/spec/core/appendix/custom.html#name-section>.
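
To illustrate the UTF-8 point, here is a minimal sketch using the standard
TextEncoder and TextDecoder globals. It is not taken from the WebAssembly
test suite; the helper name isValidUtf8 and the variable names are made up
for this example. It only shows that a lone surrogate has no UTF-8 encoding
of its own, which is why it cannot appear in a name in the binary format.

```javascript
// TextEncoder takes a USVString, so a lone surrogate is replaced by U+FFFD
// before encoding (bytes EF BF BD).
const encodedLone = Array.from(new TextEncoder().encode("\uD800"));

// A valid surrogate pair is one supplementary character (U+10000) and
// encodes as a single 4-byte sequence.
const encodedPair = Array.from(new TextEncoder().encode("\uD800\uDC00"));

// Hypothetical helper: true if the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes the decoder throw instead of substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
    return true;
  } catch (e) {
    return false;
  }
}

console.log(encodedLone);                            // [ 239, 191, 189 ]
console.log(encodedPair);                            // [ 240, 144, 128, 128 ]
// ED A0 80 is what U+D800 would look like if encoded directly; it is rejected.
console.log(isValidUtf8([0xed, 0xa0, 0x80]));        // false
console.log(isValidUtf8([0xf0, 0x90, 0x80, 0x80]));  // true
```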

The WebAssembly JS API returns strings read from the binary format in the
ModuleExportDescriptor
<http://webassembly.github.io/spec/js-api/index.html#dictdef-moduleexportdescriptor>
and
ModuleImportDescriptor
<http://webassembly.github.io/spec/js-api/index.html#dictdef-moduleimportdescriptor>
dictionaries.
Those are represented as USVStrings.
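
For reference, a USVString is a string with no unpaired surrogates. The
sketch below is a hand-written approximation of the DOMString-to-USVString
conversion (unpaired surrogates become U+FFFD); the function name
toUSVString is made up for this example and is not part of the WebAssembly
JS API. It is applied to the sample string from earlier in the thread.

```javascript
// Hypothetical helper: replace every unpaired surrogate code unit with U+FFFD.
function toUSVString(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: valid only if a low surrogate follows immediately.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += s[i] + s[i + 1];
        i++; // skip the low half of the pair
      } else {
        out += "\uFFFD";
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out += "\uFFFD";
    } else {
      out += s[i];
    }
  }
  return out;
}

const sample = "\uD800 ABCDEFG \uD800\uDC00\uD800";
const cleaned = toUSVString(sample);
// The leading and trailing lone surrogates are replaced; the pair survives.
console.log(cleaned === "\uFFFD ABCDEFG \uD800\uDC00\uFFFD"); // true
```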

There is one function that takes a DOMString, WebAssembly.Module.customSections
<http://webassembly.github.io/spec/js-api/index.html#dom-module-customsections>.
It decodes each custom section name (which is required to be UTF-8)
and compares the result to the DOMString value directly.
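
A rough sketch of what that means in practice, assuming an environment with
the WebAssembly JS API (a browser or Node.js). The module bytes are
hand-assembled for this example: just the header plus one custom section
named "a". Since a decoded name can never contain a lone surrogate, a
DOMString argument that contains one should simply match nothing.

```javascript
// Hand-assembled module: header plus a single custom section.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x00,                   // section id 0: custom section
  0x03,                   // section size: 3 bytes follow
  0x01, 0x61,             // name: length 1, UTF-8 bytes for "a"
  0x2a,                   // payload: one arbitrary byte (42)
]);

const wasmModule = new WebAssembly.Module(moduleBytes);

// Matching name: one section is returned, as an ArrayBuffer of its payload.
const matching = WebAssembly.Module.customSections(wasmModule, "a");

// A lone surrogate is a legal DOMString, but no decoded name can equal it.
const loneSurrogate = WebAssembly.Module.customSections(wasmModule, "\uD800");

console.log(matching.length);                    // 1
console.log(new Uint8Array(matching[0])[0]);     // 42
console.log(loneSurrogate.length);               // 0
```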


>
> Addison
>

Received on Tuesday, 14 May 2019 20:08:22 UTC