- From: Joshua Bell <jsbell@chromium.org>
- Date: Fri, 19 Oct 2012 09:26:23 -0700
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: WHATWG <whatwg@whatwg.org>, Glenn Maynard <glenn@zewt.org>
On Thu, Oct 18, 2012 at 1:49 AM, Anne van Kesteren <annevk@annevk.nl> wrote:

> I added the API to the Encoding Standard:
>
> http://encoding.spec.whatwg.org/#api
>
> Feedback welcome. I suppose we might want to write an introduction for it
> too.

Thanks, Anne! Excellent cleanup, too.

On Thu, Oct 11, 2012 at 6:37 PM, Joshua Bell <jsbell@chromium.org> wrote:

> > It sounds like there are several desirable behaviors:
> >
> > 1. ignore BOM handling entirely (BOM would be present in output, or fatal)
> > 2. if matching BOM, consume; otherwise, ignore (mismatching BOM would be
> > present in output, or fatal)
> > 3. switch encoding based on BOM (any of UTF-8, UTF-16LE, UTF-16BE)
> > 4. switch encoding based on BOM if-and-only-if "UTF-16" explicitly
> > specified, and only to one of the UTF-16 variants
>
> I went with supporting just 2 for now. 4 seems weird.

As per IRC discussion, if someone wants to implement this functionality it is fairly simple from script.

On Thu, Oct 18, 2012 at 11:24 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Thu, Oct 18, 2012 at 4:16 PM, Glenn Maynard <glenn@zewt.org> wrote:
> > On Thu, Oct 18, 2012 at 3:54 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
> >> * TextDecoder.decode()'s view argument is no longer optional. Why should
> >> it be?
> >
> > It buffers the "EOF byte" when in streaming mode, eg. when the last byte
> > of the stream is a UTF-8 continuation byte, so any encode errors are
> > triggered.
> >
> >> * TextEncoder.encode()'s input argument is no longer nullable. Again,
> >> why should it be?
> >
> > Likewise for encoding, to flush errors for trailing high surrogates.
>
> I made these arguments optional now (and named them both input). Note
> however that the way you get the EOF byte/EOF code point is by
> omitting the dictionary (whose stream member defaults to false), but I
> can see how not passing any arguments as a final call is convenient.
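To illustrate the point that BOM-based switching (behavior 3) is simple to do from script, here is a minimal sketch. It assumes the TextDecoder API as it later shipped in browsers and Node.js (an encoding label plus an `ignoreBOM` option that defaults to false), which may differ from the 2012 draft being discussed; `sniffBOM` and `decodeWithBOMSniffing` are hypothetical helper names. The script picks the encoding from the BOM, then relies on the decoder's own behavior 2 to consume the matching BOM.

```javascript
// Minimal sketch of behavior 3 (choose the encoding from a byte order mark),
// written against the TextDecoder API as later shipped, not the 2012 draft.
// sniffBOM and decodeWithBOMSniffing are hypothetical helper names.

/**
 * Returns the encoding label implied by a leading BOM, or null if none.
 * @param {Uint8Array} bytes
 * @returns {string|null}
 */
function sniffBOM(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8"; // EF BB BF
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le"; // FF FE
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be"; // FE FF
  }
  return null;
}

/**
 * Decodes bytes using the BOM's encoding if present, else fallbackLabel.
 * @param {Uint8Array} bytes
 * @param {string} fallbackLabel
 * @returns {string}
 */
function decodeWithBOMSniffing(bytes, fallbackLabel = "utf-8") {
  const label = sniffBOM(bytes) || fallbackLabel;
  // With the default ignoreBOM: false, TextDecoder itself consumes a BOM that
  // matches `label` (behavior 2), so the BOM does not appear in the output.
  return new TextDecoder(label).decode(bytes);
}
```

For example, `decodeWithBOMSniffing(new Uint8Array([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]))` returns "hi": the FF FE BOM selects utf-16le, and the decoder strips it. Input without a BOM falls back to the caller's label, so behavior 2 remains the default.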
>
> https://github.com/whatwg/encoding/commit/39a201a5cdf43be3d49c6bac7952a0ecb225886b

Yes, purely convenience. Otherwise you'd need to call:

  decoder.decode(buffer1, {stream: true});
  decoder.decode(buffer2, {stream: true});
  decoder.decode(new Uint8Array());

> >> I also raised the issue of whether TextEncoder should really support
> >> utf-16/utf-16be as the encoding standard tries to deprecate non-utf-8
> >> encodings.
> >
> > The whole point of this API is to support legacy file formats that use
> > other encodings. (It's probably questionable to not support other
> > encodings, too, eg. filenames in ZIP file headers, but starting out with
> > Unicode is fine.)
>
> I thought it was mostly about reading legacy formats, but fair enough.

Jonas did a straw poll via Twitter about whether encoding to UTF-16 was needed, and received positive feedback.
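The streaming pattern above, ending with a final `decode()` call that flushes the decoder, can be sketched as a small helper. This assumes the TextDecoder API as it later shipped (where the final call may omit all arguments), not necessarily the exact 2012 draft; `decodeChunks` is a hypothetical name. It also shows Glenn's point: a stream that ends partway through a multi-byte sequence only reports the error once the flush happens.

```javascript
// Minimal sketch of streaming decode with a final flush, written against the
// TextDecoder API as later shipped. decodeChunks is a hypothetical helper.

/**
 * Decodes a sequence of UTF-8 chunks as a single stream.
 * @param {Uint8Array[]} chunks
 * @param {boolean} fatal  throw on malformed input instead of emitting U+FFFD
 * @returns {string}
 */
function decodeChunks(chunks, fatal = false) {
  const decoder = new TextDecoder("utf-8", { fatal });
  let out = "";
  for (const chunk of chunks) {
    // stream: true holds back an incomplete multi-byte sequence at the end
    // of this chunk so that the next chunk can complete it.
    out += decoder.decode(chunk, { stream: true });
  }
  // Final call with no arguments marks end of stream: any bytes still held
  // back are now malformed, yielding U+FFFD (or a TypeError when fatal).
  out += decoder.decode();
  return out;
}
```

For example, "€" is E2 82 AC in UTF-8, so `decodeChunks([new Uint8Array([0xe2, 0x82]), new Uint8Array([0xac])])` still yields "€" even though the sequence spans two chunks. By contrast, `decodeChunks([new Uint8Array([0x61, 0xe2])])` yields "a" followed by U+FFFD, because the dangling lead byte is only flagged at the final flush.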
Received on Friday, 19 October 2012 16:30:39 UTC