> If you happen to want to interpret them as UTF-16, you are free to do so, but there is not and never will be any guarantee that all strings are well-formed UTF-16.

You never have that guarantee, any more than you have the guarantee that a source purporting to be UTF-8 is in fact well formed. All conscientious recipients need to check the data -- *if* they are sensitive to ill-formed text.

Luckily, the impact of ill-formed UTF-16 is vastly less than that of ill-formed UTF-8.

Mark

On Fri, Oct 30, 2009 at 17:47, John Cowan <cowan@ccil.org> wrote:

> Phillips, Addison scripsit:
>
> > ECMAScript's "firm commitment" to a 16-bit character model (i.e. UTF-16)
>
> If only.
>
> JavaScript and JSON strings aren't sequences of characters, they are
> sequences of 16-bit unsigned integers. If you happen to want to interpret
> them as UTF-16, you are free to do so, but there is not and never will
> be any guarantee that all strings are well-formed UTF-16. What's more,
> the built-in JSON serializer provided by ECMAScript 5th edition does
> not generate escape sequences for isolated surrogate codepoints, so that
> some strings will be written out in CESU-8 rather than UTF-8.
>
> Worse yet, the JSON RFC is self-contradictory, with the result that it's
> not even clear that CESU-8-encoded JSON is illegal.
>
> --
> Let's face it: software is crap. Feature-laden and bloated, written under
> tremendous time-pressure, often by incapable coders, using dangerous
> languages and inadequate tools, trying to connect to heaps of broken or
> obsolete protocols, implemented equally insufficiently, running on
> unpredictable hardware -- we are all more than used to brokenness.
> --Felix Winkelmann

Received on Saturday, 31 October 2009 03:39:46 GMT
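
For readers who want to see what "checking the data" could look like, here is a minimal sketch, not anything taken from the thread. It assumes the string has already been obtained as a sequence of 16-bit unsigned integers (as John describes JavaScript and JSON strings), and it reports the position of the first unpaired surrogate. The function name `first_ill_formed_index` is hypothetical.

```python
from typing import Iterable, Optional


def first_ill_formed_index(units: Iterable[int]) -> Optional[int]:
    """Return the index of the first unpaired surrogate code unit,
    or None if the sequence is well-formed UTF-16.

    Hypothetical helper for illustration; `units` is assumed to be a
    sequence of 16-bit unsigned integers.
    """
    pending_high: Optional[int] = None  # index of a high surrogate awaiting its partner
    for i, unit in enumerate(units):
        if not 0 <= unit <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit at index {i}: {unit!r}")
        if pending_high is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                pending_high = None  # high + low surrogate form a valid pair
                continue
            return pending_high  # high surrogate not followed by a low surrogate
        if 0xD800 <= unit <= 0xDBFF:
            pending_high = i
        elif 0xDC00 <= unit <= 0xDFFF:
            return i  # low surrogate with no preceding high surrogate
    return pending_high  # non-None only if the input ends on a high surrogate
```

A recipient that is sensitive to ill-formed text might call this on incoming data and reject or repair the string when the result is not `None`; one that is not sensitive can skip the check entirely, which is the point of the "*if*" above.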