- From: Takeshi Kanai via GitHub <sysbot+gh@w3.org>
- Date: Thu, 12 Nov 2015 07:01:17 +0000
- To: public-annotation@w3.org
Here are the test results for the String functions newly introduced in ES6.

`var yoshinoya = "𠮷野屋";`

The string consists of three characters. The first character lies outside the Unicode BMP, which means it cannot be represented in 16 bits; in UTF-16 it takes a surrogate pair.

`var identical = yoshinoya === String.fromCodePoint(0x20BB7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`

`var identical = yoshinoya === String.fromCharCode(0xd842, 0xdfb7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`

I guess fromCodePoint() is a function that splits each argument above 0xFFFF into two surrogate code units and passes the results to fromCharCode(). The String object it produces is then a sequence of code units, regardless of how it was created.

`yoshinoya.codePointAt(0).toString(16); /// 20bb7`

`yoshinoya.charCodeAt(0).toString(16); /// d842`

Looks good.

`yoshinoya.codePointAt(1).toString(16); /// dfb7 !!!`

`yoshinoya.charCodeAt(1).toString(16); /// dfb7`

Not good. I was expecting codePointAt() to index by code point, but it appears to still index by code unit.

Regarding edit distance, I think codePointAt() would work for it, but it calls for custom indexing that shifts the index whenever the obtained code falls in a specific range, such as the low surrogates.

-- 
GitHub Notif of comment by tkanai
See https://github.com/w3c/findtext/issues/4#issuecomment-156019050
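As a minimal sketch of the custom indexing mentioned above: the ES6 string iterator (`for...of`) steps through a string by code point, so it can be used to look up a character by code-point index or to convert such an index to a code-unit offset. The helper names `codePointAtIndex` and `codeUnitOffset` are hypothetical, made up for this illustration; they are not part of ES6 or of any proposal in this thread.

```javascript
// Hypothetical helper (not a built-in): return the code point at a
// code-point-based index, or undefined if the index is out of range.
function codePointAtIndex(str, index) {
  let i = 0;
  // for...of iterates by code point, so a surrogate pair counts once.
  for (const ch of str) {
    if (i === index) return ch.codePointAt(0);
    i++;
  }
  return undefined;
}

// Hypothetical helper (not a built-in): convert a code-point index into
// the code-unit offset that charCodeAt()/codePointAt() expect.
// Returns -1 if the index is past the end of the string.
function codeUnitOffset(str, cpIndex) {
  let units = 0;
  let i = 0;
  for (const ch of str) {
    if (i === cpIndex) return units;
    units += ch.length; // 2 for a surrogate pair, 1 otherwise
    i++;
  }
  return i === cpIndex ? units : -1;
}

// Same string as in the tests above, built from its code points.
const yoshinoya = String.fromCodePoint(0x20BB7, 0x91ce, 0x5c4b);

console.log(yoshinoya.length);                             // 4 (code units)
console.log(Array.from(yoshinoya).length);                 // 3 (code points)
console.log(codePointAtIndex(yoshinoya, 0).toString(16));  // 20bb7
console.log(codePointAtIndex(yoshinoya, 1).toString(16));  // 91ce
console.log(codeUnitOffset(yoshinoya, 1));                 // 2
```

Both helpers walk the string from the start, so each lookup is linear in the index; for an edit-distance computation it would probably be simpler to convert the string once with `Array.from(str)` and index the resulting array.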
Received on Thursday, 12 November 2015 07:01:20 UTC