- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 21 Nov 2013 18:41:08 +0100
- To: Allen Wirfs-Brock <allen@wirfs-brock.com>
- Cc: JSON WG <json@ietf.org>, www-tag <www-tag@w3.org>, es-discuss <es-discuss@mozilla.org>, IETF Discussion <ietf@ietf.org>, Henri Sivonen <hsivonen@hsivonen.fi>
* Allen Wirfs-Brock wrote:
>On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:
>> On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
>> <allen@wirfs-brock.com> wrote:
>>> Just to be clear about this. My tests directly tested JavaScript built-in
>>> JSON parsers WRT to BOM support in three major browsers. The tests directly
>>> invoked the built-in JSON.parse functions and directly passed to them a
>>> source strings that was explicitly constructed to contain a BOM code point .
>> It would be surprising if JSON.parse() accepted a BOM, since it
>> doesn't take bytes as input.
>
>ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
>ECMAScript strings are sequences of 16-bit values. JSON.parse (and most
>other ECMAScript functions) interpret those values as Unicode code
>units. The value U+FEFF can appear at any position within a string.
>When defining a string as an ECMAScript literal, a sequence like \ufeff
>is an escape sequence that means place the code unit value 0xfeff into
>the string at this position in the sequence. Also note that the actual
>strings passed below to JSON.parse contain the actual code point value
>U+FEFF not the escape sequence that was used to express it. To include
>the actual escape sequence characters in the string it would have to be
>expressed as '\\ufeff'.

A byte order mark indicates the order of bytes in a sequence of bytes. An
ECMAScript string is not a sequence of bytes and therefore it cannot have
a byte order mark inside it. Your test is not for BOM support but for an
egregious semantic error in the implementation of JSON.parse.

  http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09

That is a similar test. It makes Firefox see UTF-8 BOMs in ECMAScript
strings. Firefox is not supposed to look for UTF-8 BOMs in ECMAScript
strings because ECMAScript strings are not sequences of bytes at that
level of reasoning.
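The distinction drawn in this exchange, between a byte order mark (a byte-level artifact) and the code unit 0xFEFF inside an ECMAScript string, can be sketched in JavaScript. This is an illustration, not a reproduction of Allen's or Björn's tests; it assumes the WHATWG Encoding Standard APIs `TextEncoder` and `TextDecoder`, which were not part of the discussion but are available in current browsers and Node.js. The last step relies on the ECMAScript JSON grammar, which treats only tab, LF, CR, and space as whitespace.

```javascript
// Sketch: the string level (where JSON.parse operates) vs the byte level
// (where a BOM lives). TextEncoder/TextDecoder are assumed to be available.

// The literal escape \uFEFF puts ONE 16-bit code unit, 0xFEFF, into the string.
const withFeff = "\uFEFF{}";
console.log(withFeff.length, withFeff.charCodeAt(0).toString(16)); // 3 feff

// Doubling the backslash instead yields the six characters \ u F E F F.
const escapeText = "\\uFEFF";
console.log(escapeText.length); // 6

// Only after encoding to bytes does U+FEFF become the UTF-8 BOM EF BB BF ...
const bytes = new TextEncoder().encode(withFeff);
console.log(Array.from(bytes.subarray(0, 3), (b) => b.toString(16)));

// ... and it is the bytes-to-string decoder that consumes a leading BOM.
const decoded = new TextDecoder("utf-8").decode(bytes);
console.log(decoded); // {}

// JSON.parse sees U+FEFF as an ordinary character, not JSON whitespace,
// so a conforming engine is expected to reject this input.
let rejected = false;
try {
  JSON.parse(withFeff);
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```

The design point is that BOM handling belongs to whatever turns bytes into a string; once text is a string, U+FEFF is just another character.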
Is there any chance, by the way, to change `JSON.stringify` so it does not output strings that cannot be encoded using UTF-8? Specifically, `JSON.stringify(JSON.parse("\"\uD800\""))` would need to escape the unpaired surrogate instead of emitting it literally.

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Phone: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
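Björn's closing question about `JSON.stringify` and unpaired surrogates can be illustrated with a post-processing step. `escapeLoneSurrogates` is a hypothetical helper written for this sketch, not a standard API. It assumes its input is JSON text produced by `JSON.stringify`, where a surrogate code unit can only appear inside a string literal, so rewriting it as a `\uXXXX` escape keeps the text valid JSON. The regular expression uses lookbehind, so it assumes an ES2018-capable engine. Engines that already escape lone surrogates in `JSON.stringify` output leave the helper with nothing to do.

```javascript
// Hypothetical helper (not a standard API): rewrite every unpaired UTF-16
// surrogate in JSON text as a \uXXXX escape, so the result is well-formed
// Unicode and survives UTF-8 encoding. Assumes input from JSON.stringify.
function escapeLoneSurrogates(jsonText) {
  return jsonText.replace(
    // A high surrogate not followed by a low one, or a low surrogate not
    // preceded by a high one. No "u" flag, so matching is per code unit.
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    (unit) => "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Björn's example: a one-code-unit string holding the lone surrogate 0xD800.
const lone = JSON.parse("\"\uD800\"");
const serialized = escapeLoneSurrogates(JSON.stringify(lone));
console.log(serialized); // "\ud800"

// Why the raw form is a problem: UTF-8 cannot represent a lone surrogate, so
// TextEncoder substitutes U+FFFD (bytes EF BF BD) and the value is lost.
console.log(Array.from(new TextEncoder().encode(lone), (b) => b.toString(16)));

// The escaped text, by contrast, round-trips to the original code unit.
console.log(JSON.parse(serialized) === lone); // true
```

Properly paired surrogates (for example U+1F600 as `\uD83D\uDE00`) are left untouched, so ordinary non-BMP text is emitted literally as before.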
Received on Thursday, 21 November 2013 17:41:43 UTC