Re: [XHR] responseType "json"

From: Jarred Nicholls <jarred@webkit.org>
Date: Fri, 6 Jan 2012 19:36:34 -0500
Message-Id: <AFFD859F-5AFE-42C1-9E43-FB5BF72B4C78@webkit.org>
Cc: Jarred Nicholls <jarred@webkit.org>, "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>, Anne van Kesteren <annevk@opera.com>
To: Glenn Maynard <glenn@zewt.org>


Sent from my iPhone

On Jan 6, 2012, at 7:11 PM, Glenn Maynard <glenn@zewt.org> wrote:

> On Fri, Jan 6, 2012 at 12:13 PM, Jarred Nicholls <jarred@webkit.org> wrote:
> WebKit is used in many walled garden environments, so we consider these scenarios, but as a secondary goal to our primary goal of being a standards compliant browser engine.  The point being, there will always be content that's created solely for WebKit, so that's not a good argument to make.  So generally speaking, if someone is aiming to create content that's x-browser compatible, they'll do just that and use the least common denominators.
> 
> If you support UTF-16 here, then people will use it.  That's always the pattern on the web--one browser implements something extra, and everyone else ends up having to implement it--whether or not it was a good idea--because people accidentally started depending on it.  I don't know why we have to keep repeating this mistake.
> 
> We're not adding anything here, it's a matter of complicating and "taking away" from our decoder for one particular case.  You're acting like we're adding UTF-32 support for the first time.
> 
> Of course you are; you're adding UTF-16 and UTF-32 support to the responseType == "json" API.
> 
> Also, since JSON uses zero-byte detection, which isn't used by HTML at all, you'd still need code in your decoder to support that--which means you're forcing everyone else to complicate *their* decoders with this special case.
> 
> XHR's behavior, if the change I suggested is accepted, shouldn't require special cases in a decoding layer.  I'd have the decoder expose the final encoding in use (which I'd expect to be available already), and when .response is queried, return null if the final encoding used by the decoder wasn't UTF-8.  This means the decoding would still take place for other encodings, but the end result would be discarded by XHR.  This puts the handling for this restriction within the XHR layer, rather than at the decoder layer.

That's why I'd like to see the spec changed to clarify that the result is discarded if the encoding was supplied and isn't UTF-8.
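A minimal sketch of the behavior being discussed, assuming a hypothetical helper (this is not WebKit code): the decoder reports the encoding it actually used, and the XHR layer discards the result when that encoding is not UTF-8.

```javascript
// Hypothetical illustration of "discard at the XHR layer, not the decoder layer".
// "finalEncoding" stands in for whatever the text decoder settled on after
// honoring Content-Type, overrideMimeType, and any BOM it consumed.
function jsonResponse(decodedText, finalEncoding) {
  if (finalEncoding.toLowerCase() !== "utf-8") {
    // Decoding still happened, but XHR throws the end result away.
    return null;
  }
  try {
    return JSON.parse(decodedText);
  } catch (e) {
    // Invalid JSON also yields null rather than an exception.
    return null;
  }
}
```

The point of this shape is that the shared decoder needs no special case; the UTF-8 restriction lives entirely in XHR.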

> 
> I said:
> Also, I'm a bit confused.  You talk about the rudimentary encoding
> detection in the JSON spec (rfc4627 sec3), but you also mention HTTP
> mechanisms (HTTP headers and overrideMimeType).  These are separate
> and unrelated.  If you're using HTTP mechanisms, then the JSON spec
> doesn't enter into it.  If you're using both HTTP headers (HTTP) and
> UTF-32 BOM detection (rfc4627), then you're using a strange mix of the
> two.  I can't tell what mechanism you're actually using.
> 
> Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte detection.  My question remains, though: what exactly are you doing?  Do you do zero-byte detection?  Do you do BOM detection?  What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree?  All of this would need to be specified; currently none of it is.

None of that matters if a single codec is the be-all and end-all.  If that's the consensus, then that's it, period.

WebKit shares a single text decoder globally across HTML, XML, plain text, etc., and the XHR payload runs through it before it would be passed to JSON.parse.  Read the code if you're interested.  I would need to change the text decoder to skip BOM detection for this one case, unless the spec adds the wording about discarding when the encoding != UTF-8; then the restriction can be enforced entirely in XHR with no decoder changes.  I don't want to get hung up on explaining WebKit's specific implementation details.
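For reference, the zero-byte detection rfc4627 section 3 describes is simple to sketch: since the first two characters of a JSON text are always ASCII, the pattern of NUL bytes among the first four octets reveals the Unicode encoding family.  This is an illustrative sketch of the RFC's table, not any browser's actual decoder:

```javascript
// rfc4627 section 3 zero-byte detection, applied to the first four octets:
//   00 00 00 xx  -> UTF-32BE     00 xx 00 xx  -> UTF-16BE
//   xx 00 00 00  -> UTF-32LE     xx 00 xx 00  -> UTF-16LE
//   xx xx xx xx  -> UTF-8 (no NUL bytes)
function detectJsonEncoding(bytes) {
  const [a, b, c, d] = bytes;
  if (a === 0 && b === 0 && c === 0) return "UTF-32BE";
  if (a === 0 && c === 0) return "UTF-16BE";
  if (b === 0 && c === 0 && d === 0) return "UTF-32LE"; // check before UTF-16LE
  if (b === 0 && d === 0) return "UTF-16LE";
  return "UTF-8";
}
```

Note the UTF-32LE check must precede the UTF-16LE check, since `xx 00 00 00` also matches the UTF-16LE pattern.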

> 
>  
> "without breaking existing content" and yet killing UTF-16 and UTF-32 support just for responseType "json" would break existing UTF-16 and UTF-32 JSON.  Well, which is it?
> 
> This is a new feature; there isn't yet existing content using a responseType of "json" to be broken.
> 
> Don't get me wrong, I agree with pushing UTF-8 as the sole text encoding for the web platform.  But it's also plausible to push these restrictions not just in one spot in XHR, but across the web platform
> 
> I've yet to see a workable proposal to do this across the web platform, due to backwards-compatibility.  That's why it's being done more narrowly, where it can be done without breaking existing pages.  If you have any novel ideas to do this across the platform, I guarantee everyone on the list would like to hear them.  Failing that, we should do what we can where we can.
> 
> and also where the web platform defers to external specs (e.g. JSON).  In this particular case, an author will be more likely to just use responseText + JSON.parse for content he/she cannot control - the content won't end up changing and our initiative is circumvented.
> 
> Of course not.  It tells the developer that something's wrong, and he has the choice of working around it or fixing his service.  If just 25% of those people make the right choice, this is a win.  It also helps discourage new services from being written using legacy encodings.  We can't stop people from doing the wrong thing, but that doesn't mean we shouldn't point people in the right direction.
> 
> This is an editor's draft of a spec, it's not a recommendation, so it's hardly a violation of anything.
> 
> This is the worst thing I've seen anyone say in here in a long time.

Why is everyone taking this point and driving it so far out of context?  I was trying to make the point that these things can change overnight.  I've already explained it and I won't do it again.  Relax already, it's Friday!

> 
> On Fri, Jan 6, 2012 at 12:25 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> One could argue that it isn't a race "to the bottom" when the component  accepts what is defined as valid (by the media type); and that the real problem is that another spec tries to profile that.
> 
> First off, it's common and perfectly normal for an API exposing features from another spec to explicitly limit the allowed profile of that spec.  Saying "JSON through this API must be UTF-8" is perfectly OK.
> 
> Second, this isn't an issue of the JSON spec at all.  As described so far (somewhat vaguely), his charset detection *isn't* what's described by rfc4627, which only describes UTF-16 and UTF-32 zero-byte detection (and that vaguely--it isn't even normative).  Rather, it's also mixing in bits from HTTP (the Content-Type header, which I assume is what was meant by "dictated by the server" in the original message) and XHR (the overrideMimeType method).  None of that is defined by rfc4627, which makes WebKit's behavior ad hoc, and none of this will be fixed by changes to rfc4627 (which obviously shouldn't talk about HTTP headers).
> 
> -- 
> Glenn Maynard
> 
Received on Saturday, 7 January 2012 00:37:11 GMT