- From: Randall Leeds <randall@bleeds.info>
- Date: Thu, 12 Nov 2015 07:10:16 +0000
- To: Takeshi Kanai via GitHub <sysbot+gh@w3.org>, public-annotation@w3.org
- Message-ID: <CAAL6JQi7J68wfqxPKHw381Fz4nPrBSU2FXA=eshScdOYkk-Rqg@mail.gmail.com>
You should be able to use Array.from(str).length to get the number of symbols (code points), I think. I'm not sure whether calling .normalize() on the string first would change the result.

On Wed, Nov 11, 2015, 23:06 Takeshi Kanai via GitHub <sysbot+gh@w3.org> wrote:

> Here are the test results for the String functions newly introduced in ES6.
>
> `var yoshinoya = "𠮷野屋";`
> The string consists of three letters. The first letter is outside the
> Unicode BMP, which means it cannot be represented in 16 bits.
>
> `var identical = yoshinoya === String.fromCodePoint(0x20BB7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`
> `var identical = yoshinoya === String.fromCharCode(0xd842, 0xdfb7, 0x91ce, 0x5c4b) ? "yes" : "no"; /// yes`
> I guess fromCodePoint() is a function which splits each arg (> 0xffff)
> in two and passes the results to fromCharCode(). It then generates a
> String object on a code-unit basis, regardless of where the values
> came from.
>
> `yoshinoya.codePointAt(0).toString(16); /// 20bb7`
> `yoshinoya.charCodeAt(0).toString(16); /// d842`
> Looks good.
>
> `yoshinoya.codePointAt(1).toString(16); /// dfb7 !!!`
> `yoshinoya.charCodeAt(1).toString(16); /// dfb7`
>
> Not good. I was expecting code-point-based indexing for codePointAt().
> It appears to me that it still uses code-unit-based indexing.
>
> Regarding edit distance, I think codePointAt() would work for it,
> but it calls for custom indexing that shifts the index whenever the
> obtained code falls in a specific range, such as the low surrogates.
>
> --
> GitHub Notif of comment by tkanai
> See https://github.com/w3c/findtext/issues/4#issuecomment-156019050
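
To make the points above concrete, here is a minimal JavaScript sketch, assuming an ES6-capable engine. The helper names codePointLength and codePointAtIndex are hypothetical and not part of any standard API. They only illustrate one way to count and index by code point instead of by code unit, and to see when .normalize() changes the count.

```javascript
// Sketch only: codePointLength and codePointAtIndex are hypothetical helpers,
// not standard String methods.

// "𠮷野屋", written with escapes so the non-BMP character survives any encoding.
const yoshinoya = "\u{20BB7}\u91ce\u5c4b";

// Array.from() iterates a string by code point, so a surrogate pair counts once.
function codePointLength(str) {
  return Array.from(str).length;
}

// Return the n-th code point (not the n-th code unit), or undefined if out of range.
function codePointAtIndex(str, n) {
  const symbols = Array.from(str);
  return n < symbols.length ? symbols[n].codePointAt(0) : undefined;
}

// Code-unit length vs. code-point length.
const unitLength = yoshinoya.length;            // 4
const symbolLength = codePointLength(yoshinoya); // 3

// Built-in codePointAt(1) lands on the low surrogate; the helper skips past the pair.
const builtin = yoshinoya.codePointAt(1).toString(16);         // "dfb7"
const byCodePoint = codePointAtIndex(yoshinoya, 1).toString(16); // "91ce"

// normalize() matters when combining marks are present:
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT
const beforeNfc = codePointLength(decomposed);                 // 2
const afterNfc = codePointLength(decomposed.normalize("NFC")); // 1
```

For this particular string, normalization would not change the count, since none of its characters form combining sequences; the difference only shows up for input like the decomposed "é" above. Building the whole array on every lookup is linear in the string length, so for edit-distance work it is probably better to call Array.from() once and index into the resulting array.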
Received on Thursday, 12 November 2015 07:10:55 UTC