Re: [XHR] UTF-16 - do content sniffing or not?

On Mon, 23 Mar 2015 14:32:27 +0100, Hallvord Reiar Michaelsen Steen  
<hsteen@mozilla.com> wrote:

> On Mon, Mar 23, 2015 at 1:45 PM, Simon Pieters <simonp@opera.com> wrote:
>
>> On Sun, 22 Mar 2015 23:13:20 +0100, Hallvord Reiar Michaelsen Steen <
>> hsteen@mozilla.com> wrote:
>>
>>
>>> Given a server which sends UTF-16 data with a UTF-16 BOM but does *not*
>>> send "charset=UTF-16" in the Content-Type header - should the browser
>>> detect the encoding, or just assume UTF-8 and return mojibake-ish data?
>>>
>>
>
>> What is your test doing? From what I understand of the spec, the result
>> is different between e.g. responseText (honors utf-16 BOM) and JSON
>> response (always decodes as utf-8).
>>
>>
> It tests responseText.

OK.

>>> I think the spec currently says one should assume UTF-8 encoding in  
>>> this scenario.

My understanding of the spec is different from yours. Let's step through  
the spec.

https://xhr.spec.whatwg.org/#text-response

[[
Let bytes be response's body.

If bytes is null, return the empty string.

Let charset be the final charset.
]]

final charset is null in this scenario (the Content-Type header has no  
charset parameter).

[[
If responseType is the empty string, charset is null, and final MIME type  
is either null, text/xml, application/xml or ends in +xml, use the rules  
set forth in the XML specifications to determine the encoding. Let charset  
be the determined encoding. [XML] [XMLNS]
]]

Which MIME type did you use in the response? BOM sniffing in XML is  
non-normative IIRC. For other types, see below.

[[
If charset is null, set charset to utf-8.

Return the result of running decode on byte stream bytes using fallback  
encoding charset.
]]

->
https://encoding.spec.whatwg.org/#decode

[[
For each of the rows in the table below, starting with the first one and  
going down, if the first bytes of buffer match all the bytes given in the  
first column, then set encoding to the encoding given in the cell in the  
second column of that row and set BOM seen flag.
]]

This step honors the BOM: when the body starts with a UTF-16 BOM, the  
encoding is taken from the BOM and the utf-8 fallback encoding is ignored.
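
To make the steps quoted above concrete, here is a minimal Python sketch of
how the text response and decode steps fit together. It is an illustration
under assumptions, not the spec's algorithm verbatim: the names BOM_TABLE,
decode and text_response are made up for this example, the XML-specific
step is left out, and Python's errors="replace" only approximates the
Encoding Standard's error handling.

```python
from typing import Optional

# BOM rows as in the Encoding Standard's decode step, checked top to bottom.
# (Illustrative table; the encoding names are Python codec names.)
BOM_TABLE = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def decode(data: bytes, fallback_encoding: str) -> str:
    """Decode bytes, letting a leading BOM override the fallback encoding."""
    encoding = fallback_encoding
    for bom, bom_encoding in BOM_TABLE:
        if data.startswith(bom):
            # BOM seen: use its encoding and drop the BOM bytes themselves.
            encoding = bom_encoding
            data = data[len(bom):]
            break
    return data.decode(encoding, errors="replace")


def text_response(body: Optional[bytes], final_charset: Optional[str] = None) -> str:
    """Rough analogue of XHR's text response steps (XML step omitted)."""
    if body is None:
        return ""
    charset = final_charset
    if charset is None:
        charset = "utf-8"
    return decode(body, charset)


# A UTF-16LE body with a BOM and no charset in Content-Type.
body = b"\xff\xfe" + "hi".encode("utf-16-le")
print(text_response(body))  # prints "hi"
```

With no charset given, the fallback is utf-8, but the FF FE prefix selects
UTF-16LE before the fallback is ever used, so the text comes out intact
rather than as mojibake.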

-- 
Simon Pieters
Opera Software

Received on Monday, 23 March 2015 14:15:12 UTC