- From: Phillips, Addison <addison@lab126.com>
- Date: Mon, 13 May 2019 23:51:01 +0000
- To: "Eric Prud'hommeaux" <eric@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "binji@google.com" <binji@google.com>
> > Good. You might review [3].
>
> I'd always assumed that was for exotic operations like `lowercase()`, but I see
> a term called Default Normalization Step
> <https://www.w3.org/TR/charmod-norm/#DefaultNormalizationStep>
> which I read as "do nothing". I assume it
> will cause more confusion to mention this than to elide it. Reasonable?

That's reasonable and what I would do. The point in charmod-norm is to have you positively decide not to normalize.

> > > * Choosing character encodings: UTF-8. In JS-API, these are
> > > interpreted as character sequences which have equivalents in
> > > Javascript's native string format ([5]relevant tests)
> >
> > Do you have a specific pointer. The "hot spot" in here is that Javascript's
> > definition [4] of String is still effectively "UCS-2 friendly". That is, it allows
> > unpaired surrogate code points. These are not valid in UTF-8, although the
> > encoding/decoding of isolated surrogates is straightforward. So some care
> > has to be used here when specifying serialization/deserialization.
>
> <https://github.com/WebAssembly/spec/blob/master/test/core/names.wast#L1007>
> has scads of stuff outside BMP, e.g ˺˼𔗏⁾₎❩❫⟯﴿︶﹚)⦆
> ❳❵⟧⟩⟫⟭⦈⦊⦖⸣⸥︘︸︺︼︾﹀﹂﹄﹈﹜﹞]}」»’”›❯. (Can I claim kilo-scads?)

That isn't my point though. Unicode jargon is exceedingly exacting and I apologize in advance for not adding the necessary clarifiers. Supplementary characters (that is, those beyond the BMP) are not an issue. However, isolated (that is, *unpaired*) surrogate code units are permitted in JavaScript strings. The question is how to deal with them (not allowing them would be fine by me--for security they are often replaced by U+FFFD).

So the question is whether you're permitted to have a string like "\uD800 ABCDEFG \uD800\uDC00\uD800" (which starts and ends with an unpaired surrogate, but has a valid surrogate pair in the middle).

Addison
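
To make the example string concrete, here is a minimal sketch of the "replace with U+FFFD" option mentioned above. The helper name `replaceLoneSurrogates` is hypothetical and not taken from the WebAssembly spec or any library. It illustrates one possible policy only, not what JS-API should specify.

```typescript
// Hypothetical helper (an assumption for illustration, not a spec'd API):
// replace every unpaired surrogate code unit with U+FFFD and leave valid
// surrogate pairs (high followed by low) untouched.
function replaceLoneSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid pair: copy both code units and skip past the low surrogate.
        out += input[i] + input[i + 1];
        i++;
        continue;
      }
    }
    // Any surrogate reaching this point is unpaired.
    out += isHigh || isLow ? "\uFFFD" : input[i];
  }
  return out;
}

// The string from the message: unpaired surrogates at both ends,
// one valid pair (U+10000) in the middle.
const sample = "\uD800 ABCDEFG \uD800\uDC00\uD800";
const cleaned = replaceLoneSurrogates(sample);
```

With this policy the two unpaired code units become U+FFFD and the pair in the middle survives, so the result can be serialized as valid UTF-8. Rejecting such strings outright would be the stricter alternative.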
Received on Monday, 13 May 2019 23:51:30 UTC