- From: Joshua Bell <jsbell@chromium.org>
- Date: Thu, 16 Aug 2012 16:54:43 -0700
- To: Glenn Maynard <glenn@zewt.org>
- Cc: WHAT Working Group <whatwg@lists.whatwg.org>
On Wed, Aug 15, 2012 at 5:30 PM, Glenn Maynard <glenn@zewt.org> wrote:

> On Tue, Aug 14, 2012 at 12:34 PM, Joshua Bell <jsbell@chromium.org> wrote:
>
>> - Create an encoder with TextDecoder() and if present a BOM will be
>> respected (and consumed) otherwise default to UTF-8
>
> Let's not default to "autodetect Unicode formats". It encourages people
> to support UTF-16 when they may not mean to. If BOM detection for both
> UTF-8 and UTF-16 is wanted, I'd suggest something explicit, like "utf-*".
>
> If the argument to the ctor is optional, I think the default should be
> purely UTF-8.

Works for me. In the algorithm specified in the email, this simply removes
the clause "If encoding is not specified, set an internal useBOM flag" -
namely, only "utf-16" gets the useBOM flag. I'll attempt to wedge this into
the spec soon.

>> This gets easier if we restrict to encoding UTF-8 which typically doesn't
>> include BOMs. But it's looking like there's enough desire to keep UTF-16
>> encoding at the moment. Agree with just stripping it for now.
>
> UTF-8 sometimes does have a BOM, especially in Windows where applications
> sometimes use it to distinguish UTF-8 from ACP text files (which are just
> as common as ever--Windows has made no motion away from legacy encodings
> whatsoever).

Good point. Ah, Notepad, my old friend...

> Stripping the BOM can cause those applications to misinterpret the files
> as ACP.
>
> Anyway, even if the encoding API gives a "helper" for this, figuring out
> how that works would probably be more effort for developers than just
> peeking at the ArrayBuffer for the BOM and adding it back in manually.
> (I'm pretty sure anybody who knows enough to pay attention to this in the
> first place will have no trouble doing that.) So, yeah, let's not worry
> about this.
>
> --
> Glenn Maynard
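The BOM handling agreed in this exchange can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the spec algorithm or the actual TextDecoder API. `decode` and `round_trip_utf8` are hypothetical helpers. Only the "utf-16" label sets the useBOM flag, as agreed above. A BOM-less "utf-16" input is assumed to be little-endian, and a leading UTF-8 BOM is assumed to be stripped when decoding. The second helper follows Glenn's suggestion: peek at the raw bytes for a UTF-8 BOM and add it back manually when writing the data out again.

```python
from typing import Callable

UTF8_BOM = b"\xef\xbb\xbf"
UTF16LE_BOM = b"\xff\xfe"
UTF16BE_BOM = b"\xfe\xff"


def decode(data: bytes, label: str = "utf-8") -> str:
    """Hypothetical decoder illustrating the useBOM rule discussed in the thread."""
    label = label.lower()
    if label == "utf-16":
        # Only the "utf-16" label sets the useBOM flag: sniff and consume a BOM.
        if data.startswith(UTF16BE_BOM):
            return data[2:].decode("utf-16-be")
        if data.startswith(UTF16LE_BOM):
            return data[2:].decode("utf-16-le")
        # Assumption: without a BOM, "utf-16" is treated as little-endian.
        return data.decode("utf-16-le")
    if label == "utf-8":
        # The default is purely UTF-8: no autodetection of UTF-16 BOMs.
        # Assumption: a leading UTF-8 BOM is stripped ("just stripping it").
        if data.startswith(UTF8_BOM):
            data = data[len(UTF8_BOM):]
        return data.decode("utf-8")
    raise ValueError(f"unsupported label: {label!r}")


def round_trip_utf8(data: bytes, edit: Callable[[str], str]) -> bytes:
    """Hypothetical helper: decode, transform, re-encode, preserving a UTF-8 BOM."""
    # Peek at the raw bytes first, because decode() drops the BOM.
    had_bom = data.startswith(UTF8_BOM)
    encoded = edit(decode(data)).encode("utf-8")
    # Add the BOM back manually so tools that sniff it (e.g. Notepad) keep
    # treating the file as UTF-8 rather than as the ANSI code page (ACP).
    return UTF8_BOM + encoded if had_bom else encoded
```

With the default label, input starting with FF FE is not sniffed as UTF-16. In this sketch, Python's strict UTF-8 decoding raises UnicodeDecodeError instead. For example, `round_trip_utf8(b"\xef\xbb\xbfabc", str.upper)` yields `b"\xef\xbb\xbfABC"`, so the BOM survives the edit.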
Received on Thursday, 16 August 2012 23:55:14 UTC