Re: [XHR] responseType "json"

On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls <jarred@webkit.org> wrote:

> WebKit is used in many walled garden environments, so we consider these
> scenarios, but as a secondary goal to our primary goal of being a standards
> compliant browser engine.  The point being, there will always be content
> that's created solely for WebKit, so that's not a good argument to make.
>  So generally speaking, if someone is aiming to create content that's
> x-browser compatible, they'll do just that and use the least common
> denominators.
>

If you support UTF-16 here, then people will use it.  That's always the
pattern on the web: one browser implements something extra, everyone
else ends up having to implement it, whether or not it was a good idea,
because people accidentally started depending on it.  I don't know why
we have to keep repeating this mistake.

> We're not adding anything here, it's a matter of complicating and "taking
> away" from our decoder for one particular case.  You're acting like we're
> adding UTF-32 support for the first time.
>

Of course you are; you're adding UTF-16 and UTF-32 support to the
responseType == "json" API.

Also, since JSON uses zero-byte detection, which isn't used by HTML at all,
you'd still need code in your decoder to support that--which means you're
forcing everyone else to complicate *their* decoders with this special case.

XHR's behavior, if the change I suggested is accepted, shouldn't require
special cases in the decoding layer.  I'd have the decoder expose the
final encoding in use (which I'd expect to be available already), and
when .response is queried, return null if the final encoding used by the
decoder wasn't UTF-8.  This means the decoding would still take place
for other encodings, but the end result would be discarded by XHR.  That
puts the handling for this restriction in the XHR layer, rather than in
the decoder layer.
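
To make this concrete, here's a rough sketch of the layering I mean, in
TypeScript-flavored pseudocode (every name here is hypothetical; this
isn't any engine's actual API):

    // Hypothetical internal interface, just so the sketch typechecks.
    interface InternalDecoder {
      finalEncoding(): string;  // the encoding the decoder actually used
      decodedText(): string;    // the decoded response body
    }

    // The decoder runs exactly as it does today; the UTF-8-only
    // restriction lives entirely in XHR, not in the decoder.
    function getJsonResponse(decoder: InternalDecoder): unknown {
      if (decoder.finalEncoding() !== "utf-8") {
        return null;            // e.g. a BOM or header selected UTF-16
      }
      try {
        return JSON.parse(decoder.decodedText());
      } catch {
        return null;            // unparsable JSON also yields null
      }
    }

The decoder doesn't need to know that JSON exists at all; it just has to
report which encoding it ended up using.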

I said:

>> Also, I'm a bit confused.  You talk about the rudimentary encoding
>> detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
>> mechanisms (HTTP headers and overrideMimeType).  These are separate
>> and unrelated.  If you're using HTTP mechanisms, then the JSON spec
>> doesn't enter into it.  If you're using both HTTP headers (HTTP) and
>> UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
>> two.  I can't tell what mechanism you're actually using.
>
>
Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte
detection.  My question remains, though: what exactly are you doing?  Do
you do zero-byte detection?  Do you do BOM detection?  What's the order of
precedence between zero-byte and/or BOM detection, HTTP Content-Type
headers, and overrideMimeType if they disagree?  All of this would need to
be specified; currently none of it is.
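
For reference, here's what the rfc4627 sec. 3 zero-byte heuristic
amounts to, as a sketch in TypeScript (the RFC gives only the octet
patterns, not code, and says nothing about BOMs, Content-Type, or
precedence):

    // rfc4627 sec. 3: the first two characters of a JSON text are
    // always ASCII, so the pattern of zero octets in the first four
    // bytes identifies the encoding.
    function detectJsonEncoding(b: Uint8Array): string {
      if (b.length < 4) return "utf-8";        // too short for the heuristic
      const z = (i: number) => b[i] === 0x00;  // is octet i zero?
      if ( z(0) &&  z(1) &&  z(2) && !z(3)) return "utf-32be"; // 00 00 00 xx
      if ( z(0) && !z(1) &&  z(2) && !z(3)) return "utf-16be"; // 00 xx 00 xx
      if (!z(0) &&  z(1) &&  z(2) &&  z(3)) return "utf-32le"; // xx 00 00 00
      if (!z(0) &&  z(1) && !z(2) &&  z(3)) return "utf-16le"; // xx 00 xx 00
      return "utf-8";                                          // xx xx xx xx
    }

Anything beyond that (BOM handling, header precedence) is outside the
RFC, which is exactly why it needs to be written down somewhere.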



> "without breaking existing content" and yet killing UTF-16 and UTF-32
> support just for responseType "json" would break existing UTF-16 and UTF-32
> JSON.  Well, which is it?
>

This is a new feature; there isn't yet existing content using a
responseType of "json" to be broken.

> Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding
> for the web platform.  But it's also plausible to push these restrictions
> not just in one spot in XHR, but across the web platform
>

I've yet to see a workable proposal for doing this across the web
platform, because of backwards compatibility.  That's why it's being
done more narrowly, where it can be done without breaking existing
pages.  If you have any novel ideas for doing this across the platform,
I guarantee everyone on the list would like to hear them.  Failing that,
we should do what we can where we can.

> and also where the web platform defers to external specs (e.g. JSON).  In
> this particular case, an author will be more likely to just use
> responseText + JSON.parse for content he/she cannot control - the content
> won't end up changing and our initiative is circumvented.
>

Of course not.  It tells the developer that something's wrong, and he has
the choice of working around it or fixing his service.  If just 25% of
those people make the right choice, this is a win.  It also helps
discourage new services from being written using legacy encodings.  We
can't stop people from doing the wrong thing, but that doesn't mean we
shouldn't point people in the right direction.
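
For comparison, the workaround in question is simply to leave
responseType unset and parse by hand, along these lines (an illustrative
snippet; the URL is made up):

    // Fallback for a service stuck on a legacy encoding: responseText
    // goes through the ordinary XHR text-decoding rules, so a UTF-16
    // resource still decodes; only the "json" responseType is restricted.
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/legacy-service.json");  // hypothetical URL
    xhr.onload = function () {
      var data = JSON.parse(xhr.responseText);
      // ... use data ...
    };
    xhr.send();

That path keeps working either way; the point of the restriction is that
the convenient path nudges people toward UTF-8.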

> This is an editor's draft of a spec, it's not a recommendation, so it's
> hardly a violation of anything.
>

This is the worst thing I've seen anyone say in here in a long time.

On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke <julian.reschke@gmx.de> wrote:

> One could argue that it isn't a race "to the bottom" when the component
> accepts what is defined as valid (by the media type); and that the real
> problem is that another spec tries to profile that.
>

First off, it's common and entirely normal for an API exposing features
from another spec to explicitly limit the allowed profile of that spec.
Saying "JSON through this API must be UTF-8" is perfectly OK.

Second, this isn't an issue with the JSON spec at all.  As described so
far (somewhat vaguely), his charset detection *isn't* what's described
by rfc4627, which only describes UTF-16 and UTF-32 zero-byte detection
(and that only vaguely; it isn't even normative).  Rather, it mixes in
bits from HTTP (the Content-Type header, which I assume is what was
meant by "dictated by the server" in the original message) and XHR (the
overrideMimeType method).  None of that is defined by rfc4627, which
makes WebKit's behavior ad hoc, and none of it will be fixed by changes
to rfc4627 (which obviously shouldn't talk about HTTP headers).

-- 
Glenn Maynard

Received on Saturday, 7 January 2012 00:11:45 UTC