Re: [XHR] responseType "json" from Jarred Nicholls on 2012-01-07 (public-webapps@w3.org from January to March 2012)

From: Jarred Nicholls <jarred@webkit.org>
Date: Fri, 6 Jan 2012 20:55:15 -0500
To: Glenn Maynard <glenn@zewt.org>
Cc: Jarred Nicholls <jarred@webkit.org>, "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>, Anne van Kesteren <annevk@opera.com>
Message-Id: <63270DA5-8EE6-42C6-9A8C-DA3AEC5F2DD6@webkit.org>

On Jan 6, 2012, at 8:10 PM, Glenn Maynard <glenn@zewt.org> wrote:

> On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls <jarred@webkit.org> wrote:
>> Correction: RFC 4627 doesn't describe BOM detection; it describes zero-byte detection.  My question remains, though: what exactly are you doing?  Do you do zero-byte detection?  Do you do BOM detection?  What's the order of precedence between zero-byte and/or BOM detection, HTTP Content-Type headers, and overrideMimeType if they disagree?  All of this would need to be specified; currently none of it is.
> 
> None of that matters if a specific codec is the be-all and end-all.  If that's the consensus then that's it, period.
> 
> WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; the XHR payload runs through it before it is passed to JSON.parse.  Read the code if you're interested.  I would need to change the text decoder to skip BOM detection for this one case, unless the spec added wording to discard the response when the encoding is not UTF-8; then that could be enforced entirely in XHR with no decoder changes.  I don't want to get hung up on explaining WebKit's specific implementation details.
> 
> All of the details I asked about are user-visible, not WebKit implementation details, and would need to be specified if encodings other than UTF-8 were allowed.  I do think this should remain UTF-8 only, but if you want to discuss allowing other encodings, these are things that would need to be defined (which requires a clear proposal, not "read the code").
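
For reference, the zero-byte detection mentioned above (RFC 4627, section 3) relies on the first two characters of a JSON text being ASCII, so the pattern of zero bytes in the first four octets identifies the Unicode encoding. A minimal sketch in Python follows; the function name is hypothetical, and the fallback to UTF-8 for inputs shorter than four bytes is an assumption of this sketch, not something the RFC states.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first four bytes.

    Follows the zero-byte patterns listed in RFC 4627, section 3.
    """
    if len(data) < 4:
        # Assumption: too short to match a pattern, so default to UTF-8.
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
        return "utf-32-be"  # 00 00 00 xx
    if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
        return "utf-16-be"  # 00 xx 00 xx
    if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"  # xx 00 00 00
    if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx
```

This covers only the detection step; how it would rank against a BOM, the Content-Type header, or overrideMimeType is exactly the open question raised above.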

Of course. I apologize; I didn't mean it as a dismissal. I just figured that if we are settled on one codec, I'd spare us the time.  I'm also mobile :)  I could provide those details, as they would stand if no decoding changes (enforcement) were made in WebKit, if you'd like.  But since this is a new API, we might as well just stick to UTF-8.

> 
> I assume it's not using the exact same decoder logic as HTML.  After all, that would allow non-Unicode encodings.

Not exact, but close.  For discussion's sake and in this context, you could call it the "Unicode" text decoder that does BOM detection and switches Unicode codecs automatically.  For enforced UTF-8 I'd (have to) disable the BOM detection, but additionally could avoid decoding altogether if the specified encoding is not explicitly UTF-8 (and that was a part of the spec).  We'll make it work either way :)
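
To make the two behaviours concrete, here is a rough sketch in Python (not WebKit code; all names are hypothetical). The first function mimics a decoder that sniffs a BOM and switches Unicode codecs; the second enforces UTF-8 and yields no result for anything else. Returning None on failure and leaving a leading UTF-8 BOM unhandled are assumptions of this sketch, not spec wording.

```python
import codecs
import json

# BOM signatures, longest first so UTF-32LE is not mistaken for UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom_sniffing(data: bytes) -> str:
    """Decode using the codec named by a leading BOM, else UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode("utf-8")


def parse_json_utf8_only(data: bytes):
    """Parse JSON from bytes that must be strict UTF-8; None on any failure."""
    try:
        text = data.decode("utf-8")  # no BOM sniffing, no codec switching
    except UnicodeDecodeError:
        return None
    try:
        return json.loads(text)
    except ValueError:
        return None
```

With a UTF-16LE payload carrying a BOM, the sniffing decoder produces parseable text, while the UTF-8-only path rejects it because 0xFF is never valid in UTF-8.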

> 
> -- 
> Glenn Maynard
>

Received on Saturday, 7 January 2012 03:03:36 UTC