- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 20 Aug 2013 17:21:35 +0300
- To: whatwg@lists.whatwg.org
2013-08-20 17:09, Anne van Kesteren wrote:

> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa <rniwa@apple.com> wrote:
>> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?
>
> In a way I guess that's nice, but it also seems confusing that given
>
> data:text/html,<input type=text maxlength=1>
>
> pasting in U+0041 U+030A would give a string that's longer than 1 from
> JavaScript's perspective.

Oh, right, this is an issue different from the non-BMP issue I discussed in my reply. This is even clearer in my opinion, since U+0041 U+030A is clearly two Unicode characters, not one, even though it is expected to be rendered as “Å” and even though U+00C5 is canonically equivalent to U+0041 U+030A.

> I don't think there's any place in the
> platform where we measure string length other than by number of code
> units at the moment.

Besides, if “character” means something else than Unicode character (Unicode code point assigned to a character) or, as a different concept, Unicode code unit, then the question would arise what it means. For example, would a letter followed by 42 combining marks still be one character? (Such monstrosities are actually used, in an attempt to create “funny” effects.)

Yucca
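
The distinctions drawn above can be checked in a JavaScript console. The following is only a rough sketch: the helper names codeUnitLength and codePointLength are made up for illustration, and U+1D11E and the letter "a" with 42 copies of U+0301 are arbitrary sample inputs not taken from the thread. It relies only on standard String and Array methods.

```javascript
// Sample strings (the last two are assumptions chosen for illustration).
const decomposed = "\u0041\u030A"; // LATIN CAPITAL LETTER A + COMBINING RING ABOVE
const precomposed = "\u00C5";      // LATIN CAPITAL LETTER A WITH RING ABOVE
const nonBmp = "\u{1D11E}";        // MUSICAL SYMBOL G CLEF, outside the BMP
const stacked = "a" + "\u0301".repeat(42); // a letter followed by 42 combining marks

// Hypothetical helper: UTF-16 code units, which is what String.length reports.
function codeUnitLength(s) {
  return s.length;
}

// Hypothetical helper: Unicode code points (string iteration is by code point).
function codePointLength(s) {
  return Array.from(s).length;
}

// U+0041 U+030A is two code units and two code points, although it renders as one glyph.
console.log(codeUnitLength(decomposed), codePointLength(decomposed)); // 2 2

// The canonically equivalent precomposed form is a single code unit.
console.log(codeUnitLength(precomposed)); // 1
console.log(decomposed.normalize("NFC") === precomposed); // true

// The separate non-BMP issue: one code point, but two code units.
console.log(codeUnitLength(nonBmp), codePointLength(nonBmp)); // 2 1

// The "monstrosity" case: 43 code units by the current measure.
console.log(codeUnitLength(stacked)); // 43
```

With code-unit counting, maxlength=1 would accept U+00C5 but not U+0041 U+030A, and neither code-unit nor code-point counting treats the stacked sequence as a single character.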
Received on Tuesday, 20 August 2013 14:22:01 UTC