Re: Changes to DOM3 Events Key Identifiers from John Cowan on 2009-10-31 (www-dom@w3.org from October to December 2009)

From: John Cowan <cowan@ccil.org>
Date: Fri, 30 Oct 2009 20:47:38 -0400
To: "Phillips, Addison" <addison@amazon.com>
Cc: John Cowan <cowan@ccil.org>, Doug Schepers <schepers@w3.org>, Mark Davis <mark@macchiato.com>, "www-dom@w3.org" <www-dom@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <20091031004738.GF19026@mercury.ccil.org>

Phillips, Addison scripsit:

> ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16)

If only.

JavaScript and JSON strings aren't sequences of characters, they are
sequences of 16-bit unsigned integers.  If you happen to want to interpret
them as UTF-16, you are free to do so, but there is not and never will
be any guarantee that all strings are well-formed UTF-16.  What's more,
the built-in JSON serializer provided by ECMAScript 5th edition does
not generate escape sequences for isolated surrogate codepoints, so that
some strings will be written out in CESU-8 rather than UTF-8.
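
A minimal sketch of the point about ill-formed strings, in TypeScript.
The helper name isWellFormedUTF16 and the sample values are assumptions
made for illustration; they are not built-ins and do not come from any
specification. The sketch only shows that a string holding an isolated
surrogate is a perfectly legal string value, and that checking for
well-formed UTF-16 is something the caller has to do.

```typescript
// Hypothetical helper (not a built-in): returns true only if every
// surrogate code unit in `s` belongs to a high/low pair, i.e. if `s`
// can be interpreted as well-formed UTF-16.
function isWellFormedUTF16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i); // a 16-bit unsigned integer
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low surrogate we just validated
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

const paired = "\ud83d\ude00"; // U+1F600 written as a surrogate pair
const lone = "a\ud800b";       // isolated high surrogate, still a legal string

console.log(paired.length, isWellFormedUTF16(paired)); // 2 true
console.log(lone.length, isWellFormedUTF16(lone));     // 3 false
```

Both values are accepted by the language without complaint; only the
explicit check tells them apart.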

Worse yet, the JSON RFC is self-contradictory, with the result that it's
not even clear that CESU-8-encoded JSON is illegal.

-- 
Let's face it: software is crap. Feature-laden and bloated, written under
tremendous time-pressure, often by incapable coders, using dangerous
languages and inadequate tools, trying to connect to heaps of broken or
obsolete protocols, implemented equally insufficiently, running on
unpredictable hardware -- we are all more than used to brokenness.
                   --Felix Winkelmann

Received on Saturday, 31 October 2009 00:48:12 UTC