- From: Takeshi Kanai <notifications@github.com>
- Date: Wed, 11 Nov 2015 23:01:18 -0800
- To: w3c/findtext <findtext@noreply.github.com>
- Message-ID: <w3c/findtext/issues/4/156019050@github.com>
Here are the test results for the String functions newly introduced in ES6.

`var yoshinoya = "𠮷野屋";`

The string consists of three letters. The first letter is outside the Unicode BMP, which means it cannot be represented in 16 bits.

`var identical = yoshinoya === String.fromCodePoint(0x20BB7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`
`var identical = yoshinoya === String.fromCharCode(0xd842, 0xdfb7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`

I guess fromCodePoint() is a function which splits each arg above 0xffff into two surrogate code units and passes the results to fromCharCode(). Either way, the result is a code-unit-based String object, regardless of which function created it.

`yoshinoya.codePointAt(0).toString(16); /// 20bb7`
`yoshinoya.charCodeAt(0).toString(16); /// d842`

Looks good.

`yoshinoya.codePointAt(1).toString(16); /// dfb7 !!!`
`yoshinoya.charCodeAt(1).toString(16); /// dfb7`

Not good. I was expecting codePointAt() to index by code point. It appears that it still indexes by code unit.

Regarding edit distance, I think codePointAt() would work for it, but it calls for custom indexing that shifts the index whenever the obtained code falls in a specific range, such as the low-surrogate range.

---
Reply to this email directly or view it on GitHub:
https://github.com/w3c/findtext/issues/4#issuecomment-156019050
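The custom indexing mentioned in the message might look like the sketch below. The helper name `codePointAtIndex` is hypothetical and not part of ES6. It also advances by two code units whenever codePointAt() returns a value above 0xffff, rather than testing for the low-surrogate range, which is one possible way to get the same shift.

```javascript
// Hypothetical helper (not from the original message, not an ES6 built-in):
// returns the code point at a code-point-based index by walking the string
// and skipping the second code unit of every surrogate pair.
function codePointAtIndex(str, cpIndex) {
  var unitIndex = 0;
  for (var i = 0; i < cpIndex; i++) {
    if (unitIndex >= str.length) {
      return undefined; // index is past the end of the string
    }
    var cp = str.codePointAt(unitIndex);
    // A code point above 0xffff occupies two code units.
    unitIndex += cp > 0xffff ? 2 : 1;
  }
  return str.codePointAt(unitIndex);
}

var yoshinoya = "𠮷野屋";
codePointAtIndex(yoshinoya, 0).toString(16); /// 20bb7
codePointAtIndex(yoshinoya, 1).toString(16); /// 91ce
codePointAtIndex(yoshinoya, 2).toString(16); /// 5c4b
yoshinoya.length;                            /// 4 (code units)
Array.from(yoshinoya).length;                /// 3 (code points)
```

Each lookup walks the string from the start, so an edit-distance loop would probably be better off converting the string once with Array.from() and indexing the resulting array.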
Received on Thursday, 12 November 2015 07:02:27 UTC