Re: [Json] BOMs from Allen Wirfs-Brock on 2013-11-21 (www-tag@w3.org from November 2013)

From: Allen Wirfs-Brock <allen@wirfs-brock.com>
Date: Thu, 21 Nov 2013 09:01:01 -0800
To: Henri Sivonen <hsivonen@hsivonen.fi>
Cc: John Cowan <cowan@mercury.ccil.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>, www-tag <www-tag@w3.org>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, es-discuss <es-discuss@mozilla.org>, Anne van Kesteren <annevk@annevk.nl>, "t.p." <daedulus@btconnect.com>, IETF Discussion <ietf@ietf.org>
Message-Id: <50CFBDEE-53A5-4159-93C4-348CF31EC8EF@wirfs-brock.com>

On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:

> On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
> <allen@wirfs-brock.com> wrote:
>> Just to be clear about this.  My tests directly tested JavaScript built-in
>> JSON parsers WRT BOM support in three major browsers.  The tests directly
>> invoked the built-in JSON.parse functions and directly passed to them a
>> source string that was explicitly constructed to contain a BOM code point.
>> This was done to ensure that all transport layers (and any transcodings
>> they might perform) were bypassed and that we were actually testing the real
>> built-in JSON parse functions.
> 
> It would be surprising if JSON.parse() accepted a BOM, since it
> doesn't take bytes as input.

ECMAScript's JSON.parse accepts an ECMAScript string value as its input.  ECMAScript strings are sequences of 16-bit values.  JSON.parse (and most other ECMAScript functions) interprets those values as Unicode code units.  The value U+FEFF can appear at any position within a string. When defining a string as an ECMAScript literal, a sequence like \ufeff is an escape sequence that means "place the code unit value 0xFEFF into the string at this position in the sequence". Also note that the actual strings passed below to JSON.parse contain the actual code unit value U+FEFF, not the escape sequence that was used to express it.  To include the actual escape sequence characters in the string, it would have to be expressed as '\\ufeff'.
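
A minimal sketch of the literal-versus-escape distinction described above. The variable names are only illustrative and the snippet assumes nothing beyond standard ECMAScript string methods:

```javascript
// '\ufeff' in source text is an escape: the resulting string holds one code unit.
const bom = '\ufeff';
console.log(bom.length);                      // 1
console.log(bom.charCodeAt(0) === 0xFEFF);    // true

// Doubling the backslash keeps the six escape-sequence characters themselves.
const escaped = '\\ufeff';
console.log(escaped.length);                  // 6
console.log(escaped.charCodeAt(0) === 0x5C);  // true (the backslash)
```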

JSON.parse('\ufeff ["XYZ"]');  //note outer quotes delimit an ECMAScript string, the inner quotes are a JSON string.  

throws a runtime SyntaxError exception because the JSON grammar does not allow U+FEFF to appear at that position.

JSON.parse('["\ufeffXYZ"]');

operates without error and returns an Array containing a four-element ECMAScript string.  This works because the JSON grammar allows any code unit except for " and \ and the ASCII control characters to appear literally in a JSON string.
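
The two cases above can be checked side by side with something like the following. This is only a sketch; tryParse is a hypothetical helper introduced here for illustration, not part of any API:

```javascript
// Hypothetical helper: report either the parsed value or the name of the error thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// U+FEFF before the JSON text: not permitted by the JSON grammar.
const leading = tryParse('\ufeff ["XYZ"]');
console.log(leading.ok, leading.error);       // false "SyntaxError"

// U+FEFF inside a JSON string: just another code unit of the string value.
const inside = tryParse('["\ufeffXYZ"]');
console.log(inside.ok);                       // true
console.log(inside.value[0].length);          // 4
console.log(inside.value[0].charCodeAt(0) === 0xFEFF); // true
```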

> 
> However, XHR's responseType = "json" exercises browsers in a way where
> the input is bytes from the network. From the perspective of JSON
> support in XHR,
> http://lists.w3.org/Archives/Public/www-tag/2013Nov/0149.html (which
> didn't reach the es-discuss part of this thread previously) applies.

Right, JSON use via XHR is a different usage scenario, and one that probably involves encoding and decoding steps. It has very little to do with the JSON syntax, as defined in ECMA-404. It's all about how the bits that represent a string are interchanged, not the eventual semantic processing of the string (i.e., processing by JSON.parse or some other JSON parser).
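
To illustrate that separation, here is a rough sketch in which a decoding step handles the BOM before the parser ever sees a string. It uses the Encoding API's TextDecoder as a stand-in for whatever decoding a transport layer performs; that choice is an assumption made for illustration, not a description of how any particular XHR implementation works:

```javascript
// UTF-8 bytes: BOM (EF BB BF) followed by ["XYZ"]
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x5B, 0x22, 0x58, 0x59, 0x5A, 0x22, 0x5D]);

// By default (ignoreBOM: false) the decoder consumes the BOM,
// so the resulting string starts at '['.
const text = new TextDecoder('utf-8').decode(bytes);
console.log(text.length);                     // 7
const value = JSON.parse(text);
console.log(value[0]);                        // "XYZ"

// With ignoreBOM: true the BOM survives decoding as U+FEFF;
// passing this string to JSON.parse would hit the grammar error shown earlier.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);
console.log(kept.length);                     // 8
console.log(kept.charCodeAt(0) === 0xFEFF);   // true
```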

Allen

Received on Thursday, 21 November 2013 17:01:46 UTC