Re: [XHR] responseType "json" from Glenn Adams on 2011-12-05 (public-webapps@w3.org from October to December 2011)

From: Glenn Adams <glenn@skynav.com>
Date: Mon, 5 Dec 2011 16:07:13 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Glenn Maynard <glenn@zewt.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, WebApps WG <public-webapps@w3.org>
Message-ID: <CACQ=j+cxrgetcQJ8JoySSa7oHgEUP8_qL0vd5EzCB+LP0WT67w@mail.gmail.com>

The problem as I see it is that the current spec text for charset detection
effectively *requires* a browser that does not "support" UTF-32 to
explicitly ignore content metadata that may be correct (for example, a
charset parameter that specifies UTF-32), and further, to explicitly
mis-label such content as UTF-16LE when the first four bytes are
FF FE 00 00. Indeed, the current algorithm requires mis-labelling such
content as UTF-16LE with a confidence of "certain".
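
To make the FF FE 00 00 case concrete, here is a minimal Python sketch of the
two behaviours under discussion. It is an illustration only, not the spec's
algorithm: the names BOM_TABLE, UTF32_AWARE_TABLE and sniff are hypothetical,
the first table reflects my reading of the BOM table in step (4), and the
UTF-32 rows in the second table are an assumed alternative, not something the
spec defines.

```python
# Hypothetical sketch: all names here are illustrative assumptions.

# BOM table as described in this thread: FF FE is matched on two bytes only.
BOM_TABLE = [
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]

# Assumed alternative: test the four-byte UTF-32 BOMs before the shorter ones.
UTF32_AWARE_TABLE = [
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
] + BOM_TABLE


def sniff(data, table):
    """Return (encoding, confidence) for the first matching BOM, else None."""
    for bom, encoding in table:
        if data.startswith(bom):
            return encoding, "certain"
    return None


# The letter "A" as UTF-32LE, preceded by the UTF-32LE BOM FF FE 00 00.
sample = b"\xff\xfe\x00\x00" + "A".encode("utf-32-le")

print(sniff(sample, BOM_TABLE))          # ('UTF-16LE', 'certain')
print(sniff(sample, UTF32_AWARE_TABLE))  # ('UTF-32LE', 'certain')

# Decoding under each label (skipping the two-byte BOM for UTF-16LE):
print(repr(sample[2:].decode("utf-16-le")))  # '\x00A\x00'
print(repr(sample.decode("utf-32")))         # 'A'
```

With the two-byte match, the content is labelled UTF-16LE and decodes with
stray U+0000 characters around the "A"; with the four-byte match checked
first, the same bytes decode cleanly.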

The current text is also ambiguous with respect to what "support" means in
step (2) of Section 8.2.2.1 of [1]. Which of the following are meant by
"support"?

   - recognize with sniffer
   - be capable of using directly as internal coding
   - be capable of transcoding to internal coding

[1]
http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding

On Mon, Dec 5, 2011 at 3:10 PM, Ian Hickson <ian@hixie.ch> wrote:

> On Mon, 5 Dec 2011, Glenn Adams wrote:
> >
> > I see the problem now. It seems that the table in step (4) should be
> > changed to interpret an initial FF FE as UTF-16LE only if the following
> > two bytes are not 00 00.
>
> The current text is intentional. UTF-32 is explicitly not supported by the
> HTML standard.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>

Received on Monday, 5 December 2011 23:08:11 UTC