Re: plane-1 support in (java|ecma)script

From: Mark Davis <mark.davis@icu-project.org>
Date: Wed, 1 Nov 2006 20:55:45 -0800
Message-ID: <30b660a20611012055x765fc9fby6ae52e9b4a02bf50@mail.gmail.com>
To: www-international@w3.org
Cc: "Markus Scherer" <markus.icu@gmail.com>

Java strings work OK despite the 16-bit limitation, as long as they are
supplemented by certain utilities. Those are easy to supply for JavaScript
also. The main limitation in JS is that all of the testing for character
properties is via regular expressions, and those don't support supplementary
characters (and it is never clear which version of Unicode they support
anyway).
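
As a rough illustration of the kind of utilities meant here, the following is a
minimal sketch, not code from this thread or from any library. The helper names
(stringFromCodePoint, codePointAtIndex, codePointLength) are hypothetical; the
sketch assumes only that strings are sequences of 16-bit UTF-16 units and uses
the standard String.fromCharCode and charCodeAt calls.

```javascript
// Hypothetical helpers; the names are illustrative assumptions.

// Build a string from a code point, using a surrogate pair above U+FFFF.
function stringFromCodePoint(cp) {
  if (cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("Invalid code point: " + cp);
  }
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);   // high (lead) surrogate
  var low = 0xDC00 + (offset & 0x3FF);  // low (trail) surrogate
  return String.fromCharCode(high, low);
}

// Read the code point that starts at 16-bit index i.
// An unpaired surrogate is returned as-is.
function codePointAtIndex(s, i) {
  var first = s.charCodeAt(i);
  if (first >= 0xD800 && first <= 0xDBFF && i + 1 < s.length) {
    var second = s.charCodeAt(i + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
    }
  }
  return first;
}

// Count code points rather than 16-bit units.
function codePointLength(s) {
  var n = 0;
  for (var i = 0; i < s.length; i++) {
    if (codePointAtIndex(s, i) > 0xFFFF) {
      i++; // skip the low surrogate of the pair just read
    }
    n++;
  }
  return n;
}
```

For example, stringFromCodePoint(0x1D11E) yields the two units \uD834\uDD1E, so
the built-in length of that string is 2 while codePointLength reports 1. Helpers
like these cover indexing and counting only; they do nothing for character
property tests done through regular expressions.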

Mark

On 11/1/06, L. David Baron <dbaron@dbaron.org> wrote:
>
> On Wednesday 2006-11-01 12:03 -0500, Hugh Cayless wrote:
> > Does anyone know what the status is of support for plane-1 unicode
> > characters in ECMAScript?  There seems to be no concept of characters
> > greater than \uxxxx in any of the implementations I've tried
> > (Firefox, Safari, IE) and nothing in the ECMAScript 3 spec.  The
> > email address of the maintainer for the Javascript 2.0 proposal on
> > Mozilla's pages bounced, so I'm not sure where to go next.  I'd like
> > to know if there's any likelihood of support for characters in the
> > range \uxxxxxx anytime in the near future.
>
> The ECMA spec (ECMA-262 edition 3 [1]) defines strings as 16-bit
> units, normally expected to contain UTF-16 text (4.3.16), but then
> refers to these 16-bit units as characters (7.8.4, 15.5), which
> makes things a little ambiguous.  However, I think the intent of the
> spec, and the way it's generally implemented, is that string
> operations like String.prototype.substring, String.prototype.charAt,
> etc., all operate using indices into the 16-bit UTF-16 units.
>
> You can probably get non-BMP characters into JavaScript strings by
> using the appropriate high and low surrogates used in UTF-16
> encoding.  If your goal is to eventually have the string end up in
> an HTML document, it's likely to work.  If you want to do string
> operations on the string in JavaScript and expect your character not
> to be split in half, it might not be so great.
>
> For what it's worth, there is ongoing work [2] on ECMA-262 edition
> 4.  But I don't know if there's any work on changing the 16-bitness
> of strings.
>
> -David
>
> [1] http://www.ecma-international.org/publications/standards/Ecma-262.htm
> [2] http://lambda-the-ultimate.org/node/1543
>
> --
> L. David Baron                                <URL: http://dbaron.org/ >
>            Technical Lead, Layout & CSS, Mozilla Corporation
Received on Thursday, 2 November 2006 04:58:36 GMT
