- From: Glenn Maynard <glenn@zewt.org>
- Date: Mon, 5 Dec 2011 17:28:34 -0500
- To: Glenn Adams <glenn@skynav.com>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WebApps WG <public-webapps@w3.org>
- Message-ID: <CABirCh-Fegb7cLoV8zZkrifGHE8+o3L1cdmTFsz-X5-=1ZdZ6Q@mail.gmail.com>
On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams <glenn@skynav.com> wrote:

> But, if the browser does not support UTF-32, then the table in step (4) of
> [1] is supposed to apply, which would interpret the initial two bytes FF FE
> as UTF-16LE according to the current language of [1], and further, return a
> confidence level of "certain".
>
> I see the problem now. It seems that the table in step (4) should be
> changed to interpret an initial FF FE as UTF-16LE only if the following two
> bytes are not 00.

That wouldn't bring browsers and the spec closer together; it would actually push them further apart.

At first glance, it looks like it makes the spec allow WebKit and IE's behavior, which (unfortunately) includes UTF-32 detection, by allowing them to fall through to step 7, where they're allowed to detect things however they want.

However, that ignores step 5. If step 4 falls through, then step 5 happens next. That means this carefully constructed file would be detected as UTF-8 by step 5:

http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding

That's not what happens in any browser: FF detects it as UTF-16, and WebKit and IE detect it as UTF-32. This change would require it to be detected as UTF-8, which would have security implications if implemented; e.g. a page outputting escaped user-inputted text in UTF-32 might contain a string like this, followed by a hostile <script>, when interpreted as UTF-8.

This really isn't worth spending time on; you're free to press this if you like, but I'm moving on.

--
Glenn Maynard
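To make the step 4 / step 5 interaction concrete, here is a minimal sketch (not part of the original thread) of the BOM table in step 4 of the encoding sniffing algorithm and the tweak proposed in the quoted message. The function names and the `(encoding, confidence)` return shape are illustrative only; the spec itself is prose, not code.

```python
# Sketch of step 4's BOM table and the proposed change discussed above.
# Everything here is illustrative; only the byte patterns come from the spec.

def sniff_bom_current(head: bytes):
    """Step 4 as currently specified: a BOM match returns an encoding
    with confidence 'certain'."""
    if head.startswith(b"\xFE\xFF"):
        return "utf-16be", "certain"
    if head.startswith(b"\xFF\xFE"):
        # Matches even when the stream actually starts with a
        # UTF-32LE BOM (FF FE 00 00).
        return "utf-16le", "certain"
    if head.startswith(b"\xEF\xBB\xBF"):
        return "utf-8", "certain"
    return None  # no BOM: fall through to the later steps

def sniff_bom_proposed(head: bytes):
    """The proposed tweak: treat FF FE as UTF-16LE only when the next
    two bytes are not 00 00, so a UTF-32LE BOM falls through instead."""
    if head.startswith(b"\xFF\xFE") and head[2:4] == b"\x00\x00":
        # Falls through -- but to step 5 (the <meta> prescan), not
        # straight to step 7, which is the point made above.
        return None
    return sniff_bom_current(head)

utf32le_bom = b"\xFF\xFE\x00\x00"
print(sniff_bom_current(utf32le_bom))   # ('utf-16le', 'certain')
print(sniff_bom_proposed(utf32le_bom))  # None
```

For the test file linked above, which carries an ASCII `<meta>` declaration, a `None` result from step 4 means step 5's prescan runs next and, per the message, would report UTF-8 rather than the UTF-16 or UTF-32 that browsers actually pick.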
Received on Monday, 5 December 2011 22:29:12 UTC