[XHR] UTF-16 - do content sniffing or not? from Hallvord Reiar Michaelsen Steen on 2015-03-22 (public-webapps@w3.org from January to March 2015)

From: Hallvord Reiar Michaelsen Steen <hsteen@mozilla.com>
Date: Sun, 22 Mar 2015 23:13:20 +0100
To: WebApps WG <public-webapps@w3.org>
Message-ID: <CAE3JC2w-crajn3k=HWPv+0iG0Kvw9p+=zzk_X4VGJydA2L9pqQ@mail.gmail.com>

Hi,
I've just added a test loading UTF-16 data with XHR, and it exposes an
implementation difference that should probably be discussed:

Given a server which sends UTF-16 data with a UTF-16 BOM but does *not*
send "charset=UTF-16" in the Content-Type header - should the browser
detect the encoding, or just assume UTF-8 and return mojibake-ish data?
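To make the two candidate behaviors concrete, here is a minimal Python sketch. It is not taken from any browser, and the helper names are hypothetical. It decodes a response body that starts with a UTF-16 BOM but arrives without a charset parameter. The first function lets a leading BOM override the UTF-8 default (the "detect" option). The second ignores the BOM and decodes as UTF-8, which produces replacement characters (the "mojibake" option).

```python
import codecs

# Hypothetical BOM table for this sketch: (BOM bytes, encoding it indicates).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_with_bom_sniffing(body: bytes, fallback: str = "utf-8") -> str:
    """'Detect' option: a leading BOM overrides the fallback encoding."""
    for bom, encoding in BOMS:
        if body.startswith(bom):
            # Strip the BOM and decode with the encoding it indicates.
            return body[len(bom):].decode(encoding, errors="replace")
    # No BOM: use the fallback (UTF-8 when no charset was sent).
    return body.decode(fallback, errors="replace")


def decode_assuming_utf8(body: bytes) -> str:
    """'Assume UTF-8' option: ignore any BOM and decode as UTF-8."""
    return body.decode("utf-8", errors="replace")


# A UTF-16LE body with a BOM, as a server might send it without a charset.
body = codecs.BOM_UTF16_LE + "hé".encode("utf-16-le")
print(decode_with_bom_sniffing(body))  # -> hé
print(decode_assuming_utf8(body))      # contains U+FFFD replacement characters
```

The BOM check only looks at the first few bytes, so sniffing is cheap. The open question is whether it should apply when no charset was declared.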

Per my test, Chrome detects the UTF-16 encoding while Gecko doesn't. I
think the spec currently says one should assume UTF-8 encoding in this
scenario. Are WebKit/Blink developers OK with changing their
implementation?

(The test currently asserts that detecting UTF-16 is correct, pending
discussion and clarification.)

-Hallvord

Received on Sunday, 22 March 2015 22:13:54 UTC