- From: Joshua Bell <jsbell@chromium.org>
- Date: Mon, 17 Sep 2012 14:13:08 -0700
- To: WHAT Working Group <whatwg@lists.whatwg.org>
On Fri, Aug 17, 2012 at 5:19 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Fri, Aug 17, 2012 at 7:15 AM, Glenn Maynard <glenn@zewt.org> wrote:
> > On Fri, Aug 17, 2012 at 2:23 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> > - If encoding is "utf-16" and the first bytes match 0xFF 0xFE or
> >> >   0xFE 0xFF then set current encoding to "utf-16" or "utf-16be"
> >> >   respectively and advance the stream past the BOM. The current
> >> >   encoding is used until the stream is reset.
> >> > - Otherwise, if the first bytes match 0xFF 0xFE, 0xFE 0xFF, or
> >> >   0xEF 0xBB 0xBF then set current encoding to "utf-16", "utf-16be"
> >> >   or "utf-8" respectively and advance the stream past the BOM. The
> >> >   current encoding is used until the stream is reset.
> >>
> >> This doesn't sound right. The effect of the rules so far would be that
> >> if you create a decoder and specify "utf-16" as encoding, and the
> >> first bytes in the stream are 0xEF 0xBB 0xBF you'd silently switch to
> >> "utf-8" decoding.
> >
> > I think the scope of the "otherwise" is unclear, and this is meant to be
> > "otherwise (if encoding is not "utf-16")".
>
> Ah, that would make sense. It effectively means "if encoding is not set".
>
> / Jonas
>

I've attempted to distill the above into the spec in an algorithmic way:

http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder

English version: If you specify "utf-16" you get endian-agnostic UTF-16 encoding support. Failing that, if your encoding matches your BOM it is consumed. Failing *that*, you get whatever behavior falls out of the decode algorithm (garbage, error, etc).

The JS shim has *not* been updated yet.

Only part of this edit has been live for the last few weeks - apologies to the Moz folks who were trying to understand what the half-specified internal useBOM flag was for.

Any implementer feedback so far?
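
For anyone who prefers code to prose, the three-tier behavior in the English version above can be sketched roughly as follows. This is only an illustration in Python, not the spec text and not the JS shim. The names `BOMS`, `resolve_encoding` and `decode` are made up for the example. It also assumes that "utf-16" with no BOM falls back to little-endian, and that `errors="replace"` stands in for "whatever falls out of the decode algorithm".

```python
# Illustrative sketch only: names and fallbacks here are assumptions,
# not taken from the StringEncoding wiki page.

# BOM byte sequences, keyed by the encoding each one announces.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16le": b"\xff\xfe",
    "utf-16be": b"\xfe\xff",
}


def resolve_encoding(label, data):
    """Return (encoding, offset): the encoding to decode with and the
    number of leading BOM bytes to skip."""
    if label == "utf-16":
        # Tier 1: "utf-16" is endian-agnostic; a UTF-16 BOM picks the
        # byte order. A UTF-8 BOM is deliberately NOT honoured here.
        for enc in ("utf-16le", "utf-16be"):
            if data.startswith(BOMS[enc]):
                return enc, len(BOMS[enc])
        # Assumption: no BOM means little-endian.
        return "utf-16le", 0

    # Tier 2: a BOM that matches the requested encoding is consumed.
    bom = BOMS.get(label)
    if bom is not None and data.startswith(bom):
        return label, len(bom)

    # Tier 3: no match; decode from the first byte and take what comes.
    return label, 0


def decode(label, data):
    encoding, offset = resolve_encoding(label, data)
    # errors="replace" is a stand-in for the "garbage, error, etc" case.
    return data[offset:].decode(encoding, errors="replace")
```

With this sketch, `decode("utf-16", b"\xfe\xff\x00h\x00i")` returns `"hi"`, while `resolve_encoding("utf-16", b"\xef\xbb\xbfhi")` stays on UTF-16 with offset 0 rather than silently switching to UTF-8, which is the case Jonas objected to.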
Received on Monday, 17 September 2012 21:13:35 UTC