- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 19 Apr 2019 18:28:50 +0200
- To: public-i18n@w3.org
On Tue, Apr 16, 2019 at 07:17:19PM +0200, Eric Prud'hommeaux wrote:
> WebAssembly is basically a VM spec. All communication happens through
> Javascript (at least, that's all we're standardizing). Javascript
> invokes WebAssembly functions via a symbol table which maps a UTF-8
> string to an address. These strings have no interpretation beyond a
> sequence of Unicode scalar values. For instance, there's no Unicode
> Normalization, no parsing as case-foldable domain names, etc. Is there
> a state-approved way to say that?
>
> Because it's a VM, it may be called upon to manipulate e.g. human
> names, currency. In short, the subject matter may entail i18n
> requirements, but WebAssembly doesn't know anything about the
> subject matter and imposes no i18n requirements on it. My expectation
> is that it would be more confusing to mention that fact than to simply
> leave it out. Thoughts?
>
> If EcmaScript had sections for I18N and Security Considerations, I
> could just copy them. Can anyone think of something else I could copy
> from?
In case it helps, here are the answers to the Internationalization
techniques [4]:
* places where characters are used in WASM are specifically not
  natural language:
  + symbol imports
  + symbol exports
  + name section (mapping from index to symbol)
* all of these allow all legal UTF-8, including U+0000 (surrogate
  code points, U+D800 to U+DFFF, are specifically not allowed)
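
To make the "all legal UTF-8" point concrete, here is a minimal sketch in
Python. The helper name is_valid_wasm_name is hypothetical and not part of
any WASM tooling; it assumes that "legal" means exactly what Python's strict
UTF-8 decoder accepts, i.e. well-formed UTF-8 that decodes to Unicode scalar
values:

```python
def is_valid_wasm_name(raw: bytes) -> bool:
    """Hypothetical helper: True if raw is well-formed UTF-8."""
    try:
        # Strict decoding rejects encoded surrogates and overlong forms.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+0000 is an ordinary scalar value, so it may appear inside a name.
assert is_valid_wasm_name(b"a\x00b")

# Non-ASCII names are fine as long as the bytes are well-formed.
assert is_valid_wasm_name("na\u00efve".encode("utf-8"))

# ED A0 80 would be U+D800 (a surrogate); that is not legal UTF-8.
assert not is_valid_wasm_name(b"\xed\xa0\x80")
```

Nothing beyond well-formedness is checked here, which matches the claim
that no character ranges are excluded.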
The only section of I18N techniques that applies to WASM is the section
on Characters (which applies to the JS API's use of code points in
symbol names):
* Defining a Reference Processing Model: WASM uses exact string
  comparison at the code point level, with no normalization.
* Including and excluding character ranges: no excluded character
  ranges
* Using the Private Use Area: WASM symbols may use private use areas.
* Choosing character encodings: UTF-8. In the JS API, these are
  interpreted as character sequences which have equivalents in
  Javascript's native string format (relevant tests [5])
* Identifying character encodings: only one is allowed.
* Designing character escapes: the WASM text format includes escapes
  necessary to be unambiguous in that grammar.
* Storing text: no text is stored except as symbols or as the WASM
  text format.
* Specifying sort and search functionality: no search or sort
* Converting to a Common Unicode Form: no normalization
* Handling Case Folding: no case folding
* Defining 'string': no strings, just length-delimited code point
  sequences (U+0000 is permitted)
* Indexing strings: no strings
* Referring to Unicode characters: no references
* Referencing the Unicode Standard: follows [6]
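
As an illustration of the comparison rules above (exact code point
matching, no normalization, no case folding), here is a small Python
sketch. The function same_symbol and the sample names are made up for this
example; the sketch relies only on the fact that Python compares str values
code point by code point:

```python
import unicodedata


def same_symbol(a: str, b: str) -> bool:
    """Hypothetical helper: exact, code-point-level name comparison."""
    return a == b  # no normalization, no case folding


composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# The two spellings look alike but are different symbols.
assert not same_symbol(composed, decomposed)

# They would only match if someone normalized them first.
assert unicodedata.normalize("NFC", decomposed) == composed

# Case matters, and an embedded U+0000 is just another code point.
assert not same_symbol("Main", "main")
assert same_symbol("a\x00b", "a\x00b")
```

A consumer that wants normalization-insensitive lookup would have to apply
it itself before exporting or importing names.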
References
4. https://www.w3.org/International/techniques/developing-specs?collapse
5. https://github.com/WebAssembly/spec/blob/master/test/core/names.wast
6. https://www.w3.org/TR/charmod/#sec-RefUnicode
--
-eric
office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 19 April 2019 16:28:55 UTC