- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Sun, 19 Feb 2012 16:25:31 -0800
- To: Cameron McCormack <cam@mcc.id.au>
- Cc: Brendan Eich <brendan@mozilla.com>, "public-script-coord@w3.org" <public-script-coord@w3.org>, Anne van Kesteren <annevk@opera.com>, mranney@voxer.com, es-discuss <es-discuss@mozilla.org>
- Message-ID: <CAJ2xs_EiRVZmLJHTpx8yZ8R6sh85TqPmxqb9vAQzcvVCyZ5W8w@mail.gmail.com>
First, it would be great to get full Unicode support in JS. I know that's been a problem for us at Google.

Secondly, while I agree with Addison that the approach that Java took is workable, it does cause problems. Ideally someone would be able to loop (a very common construct) with:

    for (codepoint cp : someString) {
      doSomethingWith(cp);
    }

In Java, you have to do:

    int cp;
    for (int i = 0; i < someString.length(); i += Character.charCount(cp)) {
      cp = someString.codePointAt(i);
      doSomethingWith(cp);
    }

There are good reasons why Java did what it did, basically for compatibility. But if there is some way that JS can work around those, that'd be great.

Third, there's some confusion about the Unicode terminology. Here's a quick clarification:

code point: a number from 0 to 0x10FFFF.

character: a code point that is assigned. E.g., 0x61 represents 'a' and is a character. 0x378 is a code point, but not (yet) a character.

code unit: an encoding 'chunk'.
  UTF-8 represents a code point as 1-4 8-bit code units.
  UTF-16 represents a code point as 1 or 2 16-bit code units.
  UTF-32 represents a code point as 1 32-bit code unit.

------------------------------
Mark <https://plus.google.com/114199149796022210033>

Il meglio è l'inimico del bene

On Sun, Feb 19, 2012 at 16:00, Cameron McCormack <cam@mcc.id.au> wrote:

> Brendan Eich:
>
> > To hope to make this sideshow beneficial to all the cc: list, what do
> > DOM specs use to talk about uint16 units vs. code points?
>
> I say "code unit" as a shorter way of saying "16 bit unsigned integer code
> unit"
>
> http://dev.w3.org/2006/webapi/WebIDL/#dfn-code-unit
>
> (which DOM4 also links to) and then just "code point" to refer to 21 bit
> numbers that might correspond to a Unicode character, which you can see
> used in
>
> http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
>
> _______________________________________________
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
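
To make the terminology in the message concrete, here is a minimal Java sketch of the index-based code point loop and of the code-unit counts per encoding. The class name CodePointDemo, the helper names codePoints, utf8Units and utf16Units, and the sample string are all hypothetical, chosen only for illustration; the library calls (String.codePointAt, Character.charCount, Character.toChars, String.getBytes) are standard Java.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original message.
class CodePointDemo {

    // Walks the string by code point using the explicit index loop:
    // the index advances by 1 or 2 UTF-16 code units per code point.
    public static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp);
        }
        return result;
    }

    // Number of 8-bit code units UTF-8 needs for one code point
    // (intended for valid, non-surrogate code points).
    public static int utf8Units(int cp) {
        String s = new String(Character.toChars(cp));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit code units UTF-16 needs for one code point.
    public static int utf16Units(int cp) {
        return Character.charCount(cp);
    }

    public static void main(String[] args) {
        // Sample text (an assumption): 'a', U+00E9, U+20AC, and U+1D11E,
        // the last written as a surrogate pair.
        String s = "a\u00E9\u20AC\uD834\uDD1E";
        System.out.println("UTF-16 code units: " + s.length());
        for (int cp : codePoints(s)) {
            // UTF-32 always uses exactly one 32-bit code unit per code point.
            System.out.printf("U+%04X utf8=%d utf16=%d utf32=1%n",
                    cp, utf8Units(cp), utf16Units(cp));
        }
    }
}
```

For the sample string, s.length() reports 5 code units while the loop visits 4 code points, which is exactly the mismatch the index arithmetic has to account for. Since Java 8, String.codePoints() returns an IntStream of code points, which comes closer to the for-each form wished for in the message, though it postdates this thread.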
Received on Monday, 20 February 2012 00:26:00 UTC