- From: Joshua Bell <jsbell@chromium.org>
- Date: Mon, 17 Sep 2012 14:50:46 -0700
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: WHAT Working Group <whatwg@lists.whatwg.org>
On Mon, Sep 17, 2012 at 2:17 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Mon, Sep 17, 2012 at 11:13 PM, Joshua Bell <jsbell@chromium.org> wrote:
> > I've attempted to distill the above into the spec in an algorithmic way:
> > http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder
> >
> > English version: If you specify "utf-16" you get endian-agnostic UTF-16
> > decoding support. Failing that, if your encoding matches your BOM it is
> > consumed. Failing *that*, you get whatever behavior falls out of the
> > decode algorithm (garbage, error, etc.).
>
> Why would we want the API to work differently from how it works in
> markup (with <meta charset> etc.)? Granted it's not super logical, but
> I don't really see why we should make it inconsistent and more
> complicated.

That's how the spec started out, so a recap of this thread would give you the back-and-forth that led here. To summarize:

Having a BOM in the content take priority over the encoding selected by the developer was not seen as desirable (see earlier in the thread), and was a potential source of errors. Selecting the encoding via the BOM (in general, or to emulate <meta charset>, etc.) was seen as something that could be done in user code if desired, but unexpected otherwise.

Two desired behaviors remained: (1) a developer need for BOM-driven, endian-agnostic UTF-16 decoding, similar to ICU's handling, which distinguishes "utf-16" from "utf-16le", and (2) that a matching BOM should be consumed and not appear in the decoded data.
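For illustration, here is a rough sketch of the "do it in user code" approach. This is not spec text: it assumes only a TextDecoder constructor that takes an encoding label and a decode() method that takes bytes, and decodeWithBomSniffing is a hypothetical helper name.

    // Sketch only: pick the decoder label by sniffing the BOM in user code,
    // so the developer (not the API) decides whether the BOM wins.
    function decodeWithBomSniffing(bytes, fallbackLabel) {
      var label = fallbackLabel || "utf-8";
      if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
        label = "utf-8";        // UTF-8 BOM: EF BB BF
      } else if (bytes[0] === 0xFF && bytes[1] === 0xFE) {
        label = "utf-16le";     // little-endian BOM: FF FE
      } else if (bytes[0] === 0xFE && bytes[1] === 0xFF) {
        label = "utf-16be";     // big-endian BOM: FE FF
      }
      // Per the draft, a BOM that matches the selected encoding is consumed
      // by the decoder and does not appear in the returned string.
      return new TextDecoder(label).decode(bytes);
    }

By contrast, as drafted, new TextDecoder("utf-16") alone gives the endian-agnostic behavior described above: the decoder uses a leading BOM to pick the byte order, whereas "utf-16le" fixes the byte order regardless of any BOM.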
Received on Monday, 17 September 2012 21:51:15 UTC