
Re: [XHR] responseType "json"

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 5 Dec 2011 13:45:03 -0500
Message-ID: <CABirCh-BFeoNxTmjTRROLyJPeqpzM4nnfh7JLLLaB7gCKHiS_w@mail.gmail.com>
To: Glenn Adams <glenn@skynav.com>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WebApps WG <public-webapps@w3.org>

On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams <glenn@skynav.com> wrote:

>>> [2] http://www.w3.org/TR/charmod/#C030
>>
>> No, it wouldn't.  That doesn't say that UTF-32 must be recognized.
>
> You misread me. I am not saying or supporting that UTF-32 must be
> recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2].

It's impossible to violate that rule if the encoding isn't recognized.
"When an IANA-registered charset name *is recognized*"; UTF-32 isn't
recognized, so this is irrelevant.

> If a browser doesn't support UTF-32 as an incoming interchange format, then
> it should treat it as any other character encoding it does not recognize.
> It must not pretend it is another encoding.

When an encoding is not recognized by the browser, the browser has full
discretion in guessing the encoding.  (See step 7 of
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.)
It's perfectly reasonable for UTF-32 data to be detected as UTF-16.  For
example, UTF-32 data is likely to contain null bytes when scanned bytewise,
and UTF-16 is the only supported encoding where that's likely to happen.
Steps 7 and 8 give browsers unrestricted freedom in selecting the encoding
when the previous steps are unable to do so; if they choose to include "if
the charset is declared as UTF-32, return UTF-16" as one of their
autodetection rules, the spec allows it.
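
To make the null-byte heuristic concrete, here is a rough sketch (my own
illustration, not anything from the spec; the guess_encoding function, the
encoding names, and the threshold are invented): a detector whose repertoire
lacks UTF-32 sees a stream full of null bytes, and UTF-16 is the only
candidate it knows where nulls are routine, so that's what it picks.

    # Toy autodetection over a repertoire with no UTF-32 support.
    # Illustrative only; real browsers also check BOMs and follow the
    # HTML spec's full "determining the character encoding" algorithm.
    def guess_encoding(data: bytes) -> str:
        sample = data[:1024]
        if not sample:
            return "utf-8"
        null_ratio = sample.count(0) / len(sample)
        # ASCII-range text is 4 bytes per character with 3 nulls in
        # UTF-32LE (e.g. "A" = 41 00 00 00) and 2 bytes with 1 null in
        # UTF-16LE ("A" = 41 00); any null-heavy stream therefore maps
        # to UTF-16, the only supported encoding where nulls are common.
        if null_ratio > 0.25:
            return "utf-16"
        return "utf-8"

    print(guess_encoding("hello".encode("utf-32-le")))  # -> utf-16
    print(guess_encoding("hello".encode("utf-16-le")))  # -> utf-16
    print(guess_encoding("hello".encode("utf-8")))      # -> utf-8

UTF-32 input sails through such a detector as UTF-16, which is exactly the
kind of behavior steps 7 and 8 leave to the browser's discretion.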

-- 
Glenn Maynard
Received on Monday, 5 December 2011 18:45:41 GMT
