Re: [XHR] responseType "json" from Jarred Nicholls on 2012-01-06 (public-webapps@w3.org from January to March 2012)

From: Jarred Nicholls <jarred@webkit.org>
Date: Fri, 6 Jan 2012 12:13:16 -0500
To: Glenn Maynard <glenn@zewt.org>
Cc: "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>, Anne van Kesteren <annevk@opera.com>
Message-ID: <CANufG2PFSU3Pof1Wvfw_LbFHAFYZoNkbMgW3hC8Nj1DAzn+dkA@mail.gmail.com>
On Fri, Jan 6, 2012 at 11:20 AM, Glenn Maynard <glenn@zewt.org> wrote:

> Please be careful with quote markers; you quoted text written by me as
> written by Glenn Adams.
>

Sorry, copying from the archives into Gmail is a pain.


>
> On Fri, Jan 6, 2012 at 10:00 AM, Jarred Nicholls <jarred@webkit.org>
> wrote:
> > I'm getting responseType "json" landed in WebKit, and going to do so
> > without the restriction of the JSON source being UTF-8.  We default our
> > decoding to UTF-8 if none is dictated by the server or overrideMimeType(),
> > but we also do BOM detection and will gracefully switch to UTF-16(BE/LE)
> > or UTF-32(BE/LE) if the content is encoded as such, and accept the source
> > as-is.
> >
> > It's a matter of having that perfect recipe of "easiest implementation +
> > most interoperability".  It actually adds complication to our decoder if
> > we
>
> Accepting content that other browsers don't will result in pages being
> created that work only in WebKit.


WebKit is used in many walled garden environments, so we consider these
scenarios, but as a secondary goal to our primary goal of being a
standards-compliant browser engine.  The point being, there will always be
content that's created solely for WebKit, so that's not a good argument to
make.  So generally speaking, if someone is aiming to create content that's
cross-browser compatible, they'll do just that and use the least common
denominators.


>  That gives the least
> interoperability, not the most.


> If this behavior gets propagated into other browsers, that's even
> worse.  Gecko doesn't support UTF-32, and adding it would be a huge
> step backwards.
>

We're not adding anything here; it's a matter of complicating and "taking
away" from our decoder for one particular case.  You're acting like we're
adding UTF-32 support for the first time.
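
To make the decoding order under discussion concrete, here is a rough
sketch in Python (chosen only for brevity; this is not WebKit code).  The
names sniff_bom and decode_json_body are hypothetical, and the precedence
is an assumption, since the thread does not settle it: a charset declared
by the server or by overrideMimeType() is used first, BOM sniffing applies
only when none is declared, and UTF-8 is the final default.

```python
import codecs
import json

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones, because the
# UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(body: bytes):
    """Return (encoding, bom_length) if body starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, len(bom)
    return None


def decode_json_body(body: bytes, declared_charset=None):
    """Hypothetical helper: decode and parse a JSON body.

    Assumed precedence: declared charset, then BOM, then UTF-8.
    """
    if declared_charset:
        text = body.decode(declared_charset)
    else:
        sniffed = sniff_bom(body)
        if sniffed is not None:
            encoding, bom_len = sniffed
            text = body[bom_len:].decode(encoding)
        else:
            text = body.decode("utf-8")
    # Drop a BOM that survived decoding (e.g. declared utf-8 with a BOM).
    if text.startswith("\ufeff"):
        text = text[1:]
    return json.loads(text)


payload = codecs.BOM_UTF16_LE + '{"ok": true}'.encode("utf-16-le")
print(decode_json_body(payload))  # {'ok': True}
```

A UTF-8-only rule would replace the sniffing branch with an unconditional
UTF-8 decode, which is the special case being argued over here.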


>
> > do something special just for (perfectly legit) JSON payloads.  I think
> > keeping that UTF-8 bit in the spec is fine, but I don't think WebKit will
> > be reducing our interoperability and complicating our code base.  If we
> > don't want JSON to be UTF-16 or UTF-32, let's change the JSON spec and the
> > JSON grammar and JSON.parse will do the leg work.
>
> Big -1 to perpetuating UTF-16 and UTF-32 due to braindamage in an IETF
> spec.
>

So let's change the IETF spec as well - are we even fighting that battle
yet?


>
> Also, I'm a bit confused.  You talk about the rudimentary encoding
> detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
> mechanisms (HTTP headers and overrideMimeType).  These are separate
> and unrelated.  If you're using HTTP mechanisms, then the JSON spec
> doesn't enter into it.  If you're using both HTTP headers (HTTP) and
> UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
> two.  I can't tell what mechanism you're actually using.






> > As someone else stated, this is a good fight but probably not the right
> > battlefield.
>
> Strongly disagree.  Preventing legacy messes from being perpetuated
> into new APIs is one of the *only* battlefields available, where we
> can get people to stop using legacy encodings without breaking
> existing content.
>

"without breaking existing content" and yet killing UTF-16 and UTF-32
support just for responseType "json" would break existing UTF-16 and UTF-32
JSON.  Well, which is it?

Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
for the web platform.  But it's also plausible to push these restrictions
not just in one spot in XHR, but across the web platform and also where the
web platform defers to external specs (e.g. JSON).  In this particular
case, an author will be more likely to just use responseText + JSON.parse
for content he/she cannot control - the content won't end up changing and
our initiative is circumvented.
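
The circumvention can be modelled in a few lines.  This is a sketch in
Python rather than browser code, and both function names are hypothetical:
response_json_utf8_only stands in for a UTF-8-only responseType "json"
(assumed to yield null on failure), and response_text stands in for
responseText decoded with whatever charset the response really uses.

```python
import json


def response_json_utf8_only(body: bytes):
    """Model of a UTF-8-only "json" response: None when parsing fails."""
    try:
        # Invalid UTF-8 bytes become U+FFFD, which then fails JSON parsing.
        return json.loads(body.decode("utf-8", errors="replace"))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return None


def response_text(body: bytes, charset: str) -> str:
    """Model of responseText: decode with the response's actual charset."""
    return body.decode(charset)


utf16_body = '{"a": 1}'.encode("utf-16")  # Python prepends a BOM here

print(response_json_utf8_only(utf16_body))               # None
print(json.loads(response_text(utf16_body, "utf-16")))   # {'a': 1}
```

The second call is the equivalent of responseText + JSON.parse: the UTF-16
payload still parses, so the content never has to change.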

I suggest taking this initiative elsewhere (at least in parallel), i.e.,
getting RFC4627 to only support UTF-8 encoding if that's the larger
picture.  To say that a legit JSON source can be stored as any Unicode
encoding but can only be transported as UTF-8 in this one particular XHR
case is inconsistent and only leads to worse interoperability and confusion
to those looking up these specs - if I go to the JSON spec first, I'll see all
those encodings are supported and wonder why it doesn't work in this one
instance.  Are we out to totally confuse the hell out of authors?


>
> Anne: There's one related change I'd suggest.  Currently, if a JSON
> response says "Content-Type: application/json; charset=Shift_JIS",
> the explicit charset will be silently ignored and UTF-8 will be used.
> I think this should be explicitly rejected, returning null as the JSON
> response entity body.  Don't decode as UTF-8 despite an explicitly
> conflicting header, or people will start sending bogus charset values
> without realizing it.
>

+1
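
For illustration, the rule Glenn proposes could look roughly like this,
sketched in Python rather than in any browser's code.  The names
declared_charset and json_response are hypothetical, and the set of labels
treated as UTF-8 is an assumption.  The charset is read from the response's
media type (Content-Type) header; an explicit non-UTF-8 value yields None
(standing in for null) instead of being silently ignored.

```python
import json
from email.message import Message

# Assumption: only these labels count as UTF-8 for this sketch.
_UTF8_LABELS = {"utf-8", "utf8"}


def declared_charset(content_type: str):
    """Return the lowercased charset parameter of a media type, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def json_response(body: bytes, content_type: str):
    """Hypothetical helper: parse body as UTF-8 JSON, or return None."""
    charset = declared_charset(content_type)
    if charset is not None and charset not in _UTF8_LABELS:
        # Explicitly conflicting charset: reject rather than guess.
        return None
    try:
        return json.loads(body.decode("utf-8"))
    except ValueError:  # covers both decode errors and JSON syntax errors
        return None


print(json_response(b'{"a": 1}', "application/json; charset=Shift_JIS"))  # None
print(json_response(b'{"a": 1}', "application/json"))                     # {'a': 1}
```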


> --
> Glenn Maynard
Received on Friday, 6 January 2012 19:49:07 UTC