- From: Glenn Maynard <glenn@zewt.org>
- Date: Wed, 15 Aug 2012 19:30:09 -0500
- To: Joshua Bell <jsbell@chromium.org>
- Cc: WHAT Working Group <whatwg@lists.whatwg.org>
On Tue, Aug 14, 2012 at 12:34 PM, Joshua Bell <jsbell@chromium.org> wrote:

> - Create an encoder with TextDecoder() and if present a BOM will be
> respected (and consumed) otherwise default to UTF-8

Let's not default to "autodetect Unicode formats". It encourages people to support UTF-16 when they may not mean to. If BOM detection for both UTF-8 and UTF-16 is wanted, I'd suggest something explicit, like "utf-*". If the argument to the ctor is optional, I think the default should be purely UTF-8.

> This gets easier if we restrict to encoding UTF-8 which typically doesn't
> include BOMs. But it's looking like there's enough desire to keep UTF-16
> encoding at the moment. Agree with just stripping it for now.

UTF-8 sometimes does have a BOM, especially in Windows where applications sometimes use it to distinguish UTF-8 from ACP text files (which are just as common as ever--Windows has made no motion away from legacy encodings whatsoever). Stripping the BOM can cause those applications to misinterpret the files as ACP.

Anyway, even if the encoding API gives a "helper" for this, figuring out how that works would probably be more effort for developers than just peeking at the ArrayBuffer for the BOM and adding it back in manually. (I'm pretty sure anybody who knows enough to pay attention to this in the first place will have no trouble doing that.) So, yeah, let's not worry about this.

--
Glenn Maynard
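
For illustration, the "peek at the buffer for the BOM and add it back manually" approach mentioned above might look roughly like the sketch below. The helper names `sniffBom` and `withUtf8Bom` are hypothetical and not part of any proposed encoding API; the sketch assumes the caller has a `Uint8Array` view over the `ArrayBuffer` in question and only considers the UTF-8 and UTF-16 byte order marks.

```typescript
// Hypothetical helpers; not part of the encoding API under discussion.
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be";

// Byte sequence of the UTF-8 BOM (U+FEFF encoded as UTF-8).
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Report which Unicode BOM, if any, starts the given bytes.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  return null;
}

// Return a copy of the bytes with a UTF-8 BOM prepended,
// unless a UTF-8 BOM is already present.
function withUtf8Bom(bytes: Uint8Array): Uint8Array {
  if (sniffBom(bytes) === "utf-8") {
    return bytes.slice();
  }
  const out = new Uint8Array(UTF8_BOM.length + bytes.length);
  out.set(UTF8_BOM, 0);
  out.set(bytes, UTF8_BOM.length);
  return out;
}
```

An application that needs to keep the marker (for example, so that Windows tools do not fall back to ACP) could call `sniffBom` on the input before decoding and, if it reported "utf-8", pass the re-encoded bytes through `withUtf8Bom` before writing them out.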
Received on Thursday, 16 August 2012 00:30:38 UTC