- From: Glenn Maynard <glenn@zewt.org>
- Date: Mon, 5 Dec 2011 13:45:03 -0500
- To: Glenn Adams <glenn@skynav.com>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, WebApps WG <public-webapps@w3.org>
- Message-ID: <CABirCh-BFeoNxTmjTRROLyJPeqpzM4nnfh7JLLLaB7gCKHiS_w@mail.gmail.com>
On Mon, Dec 5, 2011 at 1:00 PM, Glenn Adams <glenn@skynav.com> wrote:

> [2] http://www.w3.org/TR/charmod/#C030
>
> > No, it wouldn't. That doesn't say that UTF-32 must be recognized.
>
> You misread me. I am not saying or supporting that UTF-32 must be
> recognized. I am saying that MIS-recognizing UTF-32 as UTF-16 violates [2].

It's impossible to violate that rule if the encoding isn't recognized. "When an IANA-registered charset name *is recognized*"; UTF-32 isn't recognized, so this is irrelevant.

> If a browser doesn't support UTF-32 as an incoming interchange format, then
> it should treat it as any other character encoding it does not recognize.
> It must not pretend it is another encoding.

When an encoding is not recognized by the browser, the browser has full discretion in guessing the encoding. (See step 7 of http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding.) It's perfectly reasonable for UTF-32 data to be detected as UTF-16. For example, UTF-32 data is likely to contain null bytes when scanned bytewise, and UTF-16 is the only supported encoding where that's likely to happen.

Steps 7 and 8 give browsers unrestricted freedom in selecting the encoding when the previous steps are unable to do so; if they choose to include "if the charset is declared as UTF-32, return UTF-16" as one of their autodetection rules, the spec allows it.

--
Glenn Maynard
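
To make the null-byte point concrete, here is a minimal sketch of the kind of byte-level heuristic that step 7 permits. It is not the spec's algorithm and not any browser's implementation; the `sniff_encoding` name, the 10% threshold, and the windows-1252 fallback are assumptions made up for illustration.

```python
def sniff_encoding(data: bytes) -> str:
    """Hypothetical sniffer; name, threshold, and fallback are assumptions."""
    if not data:
        return "windows-1252"
    # UTF-32 text is full of NUL bytes when scanned bytewise, and UTF-16 is
    # the only supported encoding where that is likely, so a heuristic like
    # this reports UTF-16 even when the bytes are actually UTF-32.
    nul_ratio = data.count(0) / len(data)
    if nul_ratio > 0.1:
        return "utf-16be" if data[0] == 0 else "utf-16le"
    return "windows-1252"

print(sniff_encoding("Hello".encode("utf-32-le")))  # utf-16le
print(sniff_encoding("Hello".encode("utf-32-be")))  # utf-16be
print(sniff_encoding(b"Hello"))                     # windows-1252
```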
Received on Monday, 5 December 2011 18:45:41 UTC