Re: [svgwg] Consider adding a switch to control how Addressable Characters are treated from Amelia Bellamy-Royds via GitHub on 2016-09-16 (public-svg-issues@w3.org from September 2016)

From: Amelia Bellamy-Royds via GitHub <sysbot+gh@w3.org>
Date: Fri, 16 Sep 2016 21:24:09 +0000
To: public-svg-issues@w3.org
Message-ID: <issue_comment.created-247712688-1474061048-sysbot+gh@w3.org>

I agree with @fsoder.  Switching to UTF-8 code units isn't an
improvement over UTF-16 code units.  It just means that even more
logical characters would get split across multiple "addressable
characters".
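To make the difference concrete, here is a minimal sketch that counts the same string three ways. The helper name `countUnits` is hypothetical, and the sketch assumes an ES2015+ environment with the standard `TextEncoder` (any current browser, or Node.js 11+):

```javascript
// Hypothetical helper: count one string as UTF-8 units, UTF-16 units, and code points.
function countUnits(str) {
  return {
    // UTF-8 code units (bytes); TextEncoder always encodes to UTF-8.
    utf8: new TextEncoder().encode(str).length,
    // UTF-16 code units; this is what String.prototype.length reports.
    utf16: str.length,
    // Unicode code points; the ES2015 string iterator yields one code point at a time.
    codePoints: Array.from(str).length,
  };
}

// "a", "é" (U+00E9), and "😀" (U+1F600, outside the Basic Multilingual Plane)
const sample = "a\u00E9\u{1F600}";
console.log(countUnits(sample)); // { utf8: 7, utf16: 4, codePoints: 3 }
```

Only the code-point count matches the three characters a reader sees here; UTF-16 splits the emoji into two units, and UTF-8 splits both "é" and the emoji even further.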

We want to be able to count actual Unicode code points, regardless of 
how many bytes it takes to represent them.  This is consistent with 
the new ECMAScript 6 (ES2015)
[`codePointAt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt) and
[`fromCodePoint`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCodePoint)
String methods.  (These complement the older `charCodeAt` and
`fromCharCode` methods, which work only with individual UTF-16 code
units, so a character outside the Basic Multilingual Plane has to be
handled as a pair of surrogates.)
Unfortunately, ECMAScript's Unicode support doesn't yet seem to offer a
direct way of finding the number of code points in a string or
substring; the closest option is to iterate over the string, since the
ES6 string iterator yields one code point at a time.  Even the position
argument of `codePointAt` is still a UTF-16 code-unit index.
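Because `codePointAt` takes a UTF-16 index, a script that wants the nth code point has to translate the index first. A minimal sketch of that workaround follows; the helper `utf16IndexOf` is hypothetical (not part of ECMAScript or SVG), and it simply walks the string with the ES2015 string iterator:

```javascript
// Hypothetical helper: convert a code-point index into the UTF-16 code-unit
// index that String.prototype.codePointAt() expects.
// Returns str.length for the position just past the last code point,
// and -1 if cpIndex is beyond that.
function utf16IndexOf(str, cpIndex) {
  let utf16Index = 0;
  let count = 0;
  for (const ch of str) {        // the string iterator yields whole code points
    if (count === cpIndex) return utf16Index;
    utf16Index += ch.length;     // 1 for a BMP character, 2 for a surrogate pair
    count++;
  }
  return cpIndex === count ? utf16Index : -1;
}

const s = "\u{1F600}x\u{1F601}";   // three code points, five UTF-16 code units
console.log(s.codePointAt(1));                  // 56832 (0xDE00, a lone low surrogate)
console.log(s.codePointAt(utf16IndexOf(s, 1))); // 120 (0x78, "x")
console.log(s.codePointAt(utf16IndexOf(s, 2))); // 128513 (0x1F601)
```

The lookup is linear in the length of the string, which is the cost of the missing built-in: nothing in ES2015 lets you jump straight to the nth code point.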

-- 
GitHub Notification of comment by AmeliaBR
Please view or discuss this issue at 
https://github.com/w3c/svgwg/issues/280#issuecomment-247712688 using 
your GitHub account

Received on Friday, 16 September 2016 21:24:24 UTC