- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 19 Apr 2019 18:28:50 +0200
- To: public-i18n@w3.org
On Tue, Apr 16, 2019 at 07:17:19PM +0200, Eric Prud'hommeaux wrote:
> WebAssembly is basically a VM spec. All communication happens through
> Javascript (at least, that's all we're standardizing). Javascript
> invokes WebAssembly functions via a symbol table which maps a UTF-8
> string to an address. These strings have no interpretation beyond a
> sequence of Unicode scalar values. For instance, there's no Unicode
> Normalization, no parsing as case-foldable domain names, etc. Is there
> a state-approved way to say that?
>
> Because it's a VM, it may be called upon to manipulate e.g. human
> names, currency. In short, the subject matter may entail i18n
> requirements but that WebAssembly doesn't know anything about the
> subject matter and imposes no i18n requirements on it. My expectation
> is that it would be more confusing to mention that fact than to simply
> leave it out. Thoughts?
>
> If EcmaScript had sections for I18N and Security Considerations, I
> could just copy them. Can anyone think of something else I could copy
> from?

In case it helps, here are the answers to the Internationalization
techniques[4]:

* places where characters are used in WASM are specifically not
  natural language:
  + symbol imports
  + symbol exports
  + name section (mapping from index to symbol)
* all of these allow all legal UTF-8, including U+0 (UTF-16 surrogate
  pairs specifically not allowed)

The only section of I18N techniques that applies to WASM is the section
on Characters (which applies to JS-API's use of codepoints in symbol
names):

* Defining a Reference Processing Model: WASM uses exact string
  comparison at the codepoint level, with no normalization.
* Including and excluding character ranges: no excluded character
  ranges
* Using the Private Use Area: WASM symbols may use private use areas.
* Choosing character encodings: UTF-8. In JS-API, these are interpreted
  as character sequences which have equivalents in Javascript's native
  string format (relevant tests[5])
* Identifying character encodings: only one is allowed.
* Designing character escapes: the WASM text format includes escapes
  necessary to be unambiguous in that grammar.
* Storing text: no text is stored except as symbols or as the WASM text
  format.
* Specifying sort and search functionality: no search or sort
* Converting to a Common Unicode Form: no normalization
* Handling Case Folding: no case folding
* Defining 'string': no strings, just length-delimited codepoint
  sequences (U+0 is permitted)
* Indexing strings: no strings
* Referring to Unicode characters: no references
* Referencing the Unicode Standard: follows
  https://www.w3.org/TR/charmod/#sec-RefUnicode [6]

References

4. https://www.w3.org/International/techniques/developing-specs?collapse
5. https://github.com/WebAssembly/spec/blob/master/test/core/names.wast
6. https://www.w3.org/TR/charmod/#sec-RefUnicode

-- 
-eric

office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
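
To make the reference processing model in the answers above concrete,
here is a minimal sketch in Python. It is an illustration only, not
WebAssembly or JS-API code: the helper names decode_name and
same_symbol are hypothetical, and Python's strict UTF-8 decoder stands
in for "all legal UTF-8, surrogates not allowed".

```python
import unicodedata


def decode_name(raw: bytes) -> str:
    """Decode a length-delimited byte sequence as strict UTF-8.

    Hypothetical helper: Python's strict decoder accepts U+0000 and
    rejects encoded surrogates, which matches the constraints listed
    in the message above.
    """
    return raw.decode("utf-8", errors="strict")


def same_symbol(a: bytes, b: bytes) -> bool:
    """Exact code point comparison: no normalization, no case folding."""
    return decode_name(a) == decode_name(b)


composed = "\u00e9".encode("utf-8")     # e-acute as a single code point
decomposed = "e\u0301".encode("utf-8")  # 'e' followed by combining acute

# Canonically equivalent under NFC, yet distinct symbols here.
nfc_equal = unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
distinct_without_normalization = not same_symbol(composed, decomposed)

# Case differences are also significant.
case_sensitive = not same_symbol(b"Foo", b"foo")

# U+0000 is a permitted code point inside a name.
nul_ok = decode_name(b"a\x00b") == "a\x00b"

# An encoded UTF-16 surrogate (ED A0 80 is U+D800) is not legal UTF-8.
try:
    decode_name(b"\xed\xa0\x80")
    surrogate_rejected = False
except UnicodeDecodeError:
    surrogate_rejected = True
```

Under these assumptions, two names denote the same symbol only when
their code point sequences are identical, so a producer that wants
canonically equivalent names to match has to normalize them itself
before they reach the symbol table.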
Received on Friday, 19 April 2019 16:28:55 UTC