- From: Jarred Nicholls <jarred@webkit.org>
- Date: Fri, 6 Jan 2012 10:00:40 -0500
- To: "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>
- Message-ID: <CANufG2Njc7YNM_1yJcdg2O8Hn_E4wGe-Tp0xEtJ7d+OvWi3gbg@mail.gmail.com>
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams <glenn@skynav.com> wrote:

> > But, if the browser does not support UTF-32, then the table in step (4)
> > of [1] is supposed to apply, which would interpret the initial two bytes
> > FF FE as UTF-16LE according to the current language of [1], and further,
> > return a confidence level of "certain".
> >
> > I see the problem now. It seems that the table in step (4) should be
> > changed to interpret an initial FF FE as UTF-16BE only if the following
> > two bytes are not 00.
>
> That wouldn't actually bring browsers and the spec closer together; it
> would actually bring them further apart.
>
> At first glance, it looks like it makes the spec allow WebKit and IE's
> behavior, which (unfortunately) includes UTF-32 detection, by allowing
> them to fall through to step 7, where they're allowed to detect things
> however they want.
>
> However, that's ignoring step 5. If step 4 passes through, then step 5
> would happen next. That means this carefully-constructed file would be
> detected as UTF-8 by step 5:
> http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding
>
> That's not what happens in any browser; FF detects it as UTF-16, and
> WebKit and IE detect it as UTF-32. This change would require it to be
> detected as UTF-8, which would have security implications if implemented,
> e.g. a page outputting escaped user-inputted text in UTF-32 might contain
> a string like this, followed by a hostile <script>, when interpreted as
> UTF-8.
>
> This really isn't worth spending time on; you're free to press this if
> you like, but I'm moving on.
>
> --
> Glenn Maynard

I'm getting responseType "json" landed in WebKit, and I'm going to do so
without the restriction of the JSON source being UTF-8. We default our
decoding to UTF-8 if none is dictated by the server or overrideMIMEType(),
but we also do BOM detection and will gracefully switch to UTF-16(BE/LE)
or UTF-32(BE/LE) if the content is encoded as such, and accept the source
as-is. It's a matter of having that perfect recipe of "easiest
implementation + most interoperability". It actually adds complication to
our decoder if we do something special just for (perfectly legit) JSON
payloads.

I think keeping that UTF-8 bit in the spec is fine, but I don't think
WebKit will be reducing our interoperability and complicating our code
base. If we don't want JSON to be UTF-16 or UTF-32, let's change the JSON
spec and the JSON grammar, and JSON.parse will do the leg work. As someone
else stated, this is a good fight but probably not the right battlefield.
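[Editor's illustration] A minimal sketch of the BOM-sniffing order discussed above, written in TypeScript rather than WebKit's actual C++ decoder; the function name sniffBOM and its return shape are hypothetical, not anything from WebKit or the spec. The point it demonstrates is the one raised in the thread: FF FE only means UTF-16LE if the next two bytes are not 00 00, so the UTF-32 BOMs have to be checked before the UTF-16 ones.

```typescript
// Hypothetical helper: decide an encoding from a byte-order mark.
// This mirrors the detection order described in the thread, not any
// particular browser's implementation.
function sniffBOM(bytes: Uint8Array): { encoding: string; bomLength: number } {
  const b = (i: number): number => (i < bytes.length ? bytes[i] : -1);

  // UTF-32 BOMs are 4 bytes and must be tested first, because
  // FF FE 00 00 (UTF-32LE) starts with the UTF-16LE BOM FF FE.
  if (b(0) === 0xff && b(1) === 0xfe && b(2) === 0x00 && b(3) === 0x00) {
    return { encoding: "utf-32le", bomLength: 4 };
  }
  if (b(0) === 0x00 && b(1) === 0x00 && b(2) === 0xfe && b(3) === 0xff) {
    return { encoding: "utf-32be", bomLength: 4 };
  }
  // UTF-16 BOMs (2 bytes).
  if (b(0) === 0xff && b(1) === 0xfe) return { encoding: "utf-16le", bomLength: 2 };
  if (b(0) === 0xfe && b(1) === 0xff) return { encoding: "utf-16be", bomLength: 2 };
  // UTF-8 BOM (3 bytes).
  if (b(0) === 0xef && b(1) === 0xbb && b(2) === 0xbf) {
    return { encoding: "utf-8", bomLength: 3 };
  }
  // No BOM: fall back to the default (UTF-8, unless the server or
  // overrideMIMEType() dictated something else).
  return { encoding: "utf-8", bomLength: 0 };
}
```

Collapsing the UTF-32 checks into the UTF-16 ones is exactly the ambiguity the quoted discussion is about: a sniffer that stops at FF FE would hand a UTF-32LE payload to a UTF-16LE decoder.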
Received on Friday, 6 January 2012 15:01:38 UTC