- From: L. David Baron <dbaron@dbaron.org>
- Date: Wed, 1 Nov 2006 20:05:27 -0800
- To: www-international@w3.org
- Message-ID: <20061102040527.GA18048@ridley.dbaron.org>
On Wednesday 2006-11-01 12:03 -0500, Hugh Cayless wrote:
> Does anyone know what the status is of support for plane-1 unicode
> characters in ECMAScript? There seems to be no concept of characters
> greater than \uxxxx in any of the implementations I've tried
> (Firefox, Safari, IE) and nothing in the ECMAScript 3 spec. The
> email address of the maintainer for the Javascript 2.0 proposal on
> Mozilla's pages bounced, so I'm not sure where to go next. I'd like
> to know if there's any likelihood of support for characters in the
> range \uxxxxxx anytime in the near future.

The ECMA spec (ECMA-262 edition 3 [1]) defines strings as 16-bit
units, normally expected to contain UTF-16 text (4.3.16), but then
refers to these 16-bit units as characters (7.8.4, 15.5), which makes
things a little ambiguous. However, I think the intent of the spec,
and the way it's generally implemented, is that string operations like
String.prototype.substring, String.prototype.charAt, etc., all operate
using indices into the 16-bit UTF-16 units.

You can probably get non-BMP characters into JavaScript strings by
using the appropriate high and low surrogates used in UTF-16 encoding.
If your goal is to eventually have the string end up in an HTML
document, it's likely to work. If you want to do string operations on
the string in JavaScript and expect your character not to be split in
half, it might not be so great.

For what it's worth, there is ongoing work [2] on ECMA-262 edition 4.
But I don't know if there's any work on changing the 16-bitness of
strings.

-David

[1] http://www.ecma-international.org/publications/standards/Ecma-262.htm
[2] http://lambda-the-ultimate.org/node/1543

--
L. David Baron <URL: http://dbaron.org/ >
Technical Lead, Layout & CSS, Mozilla Corporation
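
To make the surrogate-pair approach mentioned above concrete, here is a
minimal sketch. The helper name toSurrogatePair is hypothetical (it is
not part of the spec or of any implementation), and U+10400 is just an
arbitrary plane-1 code point picked for illustration. It assumes
strings are indexed by 16-bit units, as described above.

```javascript
// Hypothetical helper: build a string holding one non-BMP character
// by encoding its code point as a UTF-16 high/low surrogate pair.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;        // 20-bit value for planes 1-16
  var high = 0xD800 + (offset >> 10);      // top 10 bits -> high surrogate
  var low = 0xDC00 + (offset & 0x3FF);     // bottom 10 bits -> low surrogate
  return String.fromCharCode(high, low);
}

// U+10400 is an arbitrary example code point in plane 1.
var s = toSurrogatePair(0x10400);

// One character to a reader, but two 16-bit units to the string operations.
var unitCount = s.length;

// Taking the first "character" by index splits the pair and leaves
// only the high surrogate.
var firstHalf = s.substring(0, 1);
```

Here s is the same string as the literal "\uD801\uDC00", unitCount is
2, and firstHalf holds only the unit 0xD801, which is the "split in
half" case. Code that needs to slice such strings would have to check
for surrogates itself before choosing an index.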
Received on Thursday, 2 November 2006 04:05:45 UTC