Re: plane-1 support in (java|ecma)script

On Wednesday 2006-11-01 12:03 -0500, Hugh Cayless wrote:
> Does anyone know what the status is of support for plane-1 unicode  
> characters in ECMAScript?  There seems to be no concept of characters  
> greater than \uxxxx in any of the implementations I've tried  
> (Firefox, Safari, IE) and nothing in the ECMAScript 3 spec.  The  
> email address of the maintainer for the Javascript 2.0 proposal on  
> Mozilla's pages bounced, so I'm not sure where to go next.  I'd like  
> to know if there's any likelihood of support for characters in the  
> range \uxxxxxx anytime in the near future.

The ECMA spec (ECMA-262 edition 3 [1]) defines a string value as a
sequence of 16-bit units, normally expected to contain UTF-16 text
(4.3.16), but then refers to these 16-bit units as characters
(7.8.4, 15.5), which makes things a little ambiguous.  However, I
think the intent of the spec, and the way it's generally implemented,
is that string operations like String.prototype.substring,
String.prototype.charAt, etc., all operate using indices into the
16-bit UTF-16 code units, so a character outside the BMP occupies
two index positions.
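
As a rough illustration of that indexing (my own sketch, not
something taken from the spec), here is what the usual string
operations report for U+1D11E MUSICAL SYMBOL G CLEF when it is
written as its two UTF-16 surrogates.  The variable names are just
for illustration.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) written as its UTF-16 surrogate pair.
var clef = "\uD834\uDD1E";

// length counts 16-bit units, not characters.
var unitCount = clef.length;          // 2

// charCodeAt and charAt index by 16-bit unit, so each call sees
// only one half of the pair.
var firstUnit = clef.charCodeAt(0);   // 0xD834, the high surrogate
var secondUnit = clef.charCodeAt(1);  // 0xDD1E, the low surrogate
var half = clef.charAt(0);            // one-unit string holding a lone surrogate
```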

You can probably get non-BMP characters into JavaScript strings by
writing the high and low surrogates of their UTF-16 encoding as a
pair of \uXXXX escapes.  If your goal is to eventually have the
string end up in an HTML document, it's likely to work.  If you want
to do string operations on the string in JavaScript and expect your
character not to be split in half, it might not be so great, since
those operations can cut between the two surrogates.
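
A minimal sketch of both halves of that, assuming nothing beyond
String.fromCharCode and substring: fromSupplementary below is a
hypothetical helper of my own (not a built-in) that applies the
standard UTF-16 surrogate arithmetic to a code point above U+FFFF,
and the substring calls show how a cut at the wrong index splits the
character.

```javascript
// Hypothetical helper (not a built-in): build a two-unit string for a
// code point in the range U+10000..U+10FFFF.
function fromSupplementary(codePoint) {
  var offset = codePoint - 0x10000;      // 20-bit value
  var high = 0xD800 + (offset >> 10);    // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return String.fromCharCode(high, low);
}

// Four 16-bit units: "a", high surrogate, low surrogate, "b".
var s = "a" + fromSupplementary(0x1D11E) + "b";

// Index 2 falls between the two surrogates, so this cut splits the
// character in half.
var left = s.substring(0, 2);    // "a" followed by a lone high surrogate
var right = s.substring(2);      // lone low surrogate followed by "b"
```

Concatenating left and right gives back the original string, so the
damage only shows up if the two pieces are used separately.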

For what it's worth, there is ongoing work [2] on ECMA-262 edition
4.  But I don't know if there's any work on changing the 16-bitness
of strings.

-David

[1] http://www.ecma-international.org/publications/standards/Ecma-262.htm
[2] http://lambda-the-ultimate.org/node/1543

-- 
L. David Baron                                <URL: http://dbaron.org/ >
           Technical Lead, Layout & CSS, Mozilla Corporation

Received on Thursday, 2 November 2006 04:05:45 UTC