Re: [whatwg] Encoding: API from Joshua Bell on 2012-10-19 (public-whatwg-archive@w3.org from October 2012)

From: Joshua Bell <jsbell@chromium.org>
Date: Fri, 19 Oct 2012 09:26:23 -0700
To: Anne van Kesteren <annevk@annevk.nl>
Cc: WHATWG <whatwg@whatwg.org>, Glenn Maynard <glenn@zewt.org>
Message-ID: <CAD649j7CR-dndHzPt2R+j3fVX1rQ-idLaUvETbRjkSP2NDcvzQ@mail.gmail.com>

On Thu, Oct 18, 2012 at 1:49 AM, Anne van Kesteren <annevk@annevk.nl> wrote:

> I added the API to the Encoding Standard:
>
>   http://encoding.spec.whatwg.org/#api
>
> Feedback welcome. I suppose we might want to write an introduction for it
> too.
>
>
Thanks, Anne! Excellent cleanup, too.


On Thu, Oct 11, 2012 at 6:37 PM, Joshua Bell <jsbell@chromium.org> wrote:
> > It sounds like there are several desirable behaviors:
> >
> > 1. ignore BOM handling entirely (BOM would be present in output, or
> > fatal)
> > 2. if matching BOM, consume; otherwise, ignore (mismatching BOM would be
> > present in output, or fatal)
> > 3. switch encoding based on BOM (any of UTF-8, UTF-16LE, UTF-16BE)
> > 4. switch encoding based on BOM if-and-only-if "UTF-16" explicitly
> > specified, and only to one of the UTF-16 variants
>
> I went with supporting just 2 for now. 4 seems weird.
>

As per the IRC discussion, anyone who wants the BOM-switching behavior can
implement it fairly simply from script.
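
For illustration, here is a rough sketch of how behavior 3 (switch encoding
based on BOM) might look from script. The helper names sniffBOM and
decodeWithBOMSwitch are made up for this example and are not part of the
API; it also assumes the decoder consumes a matching BOM (behavior 2), so
the BOM does not show up in the output.

```javascript
// Hypothetical helper: return an encoding label based on a leading BOM,
// or the caller-supplied fallback label when no BOM is present.
function sniffBOM(bytes, fallback) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "utf-16be";
  }
  return fallback;
}

// Hypothetical helper: decode a Uint8Array, letting a BOM override the
// requested encoding. Assumes a matching BOM is consumed by the decoder.
function decodeWithBOMSwitch(bytes, fallback = "utf-8") {
  const label = sniffBOM(bytes, fallback);
  return new TextDecoder(label).decode(bytes);
}

// UTF-16LE BOM followed by "hi"; the requested utf-8 label is overridden.
const text = decodeWithBOMSwitch(
    new Uint8Array([0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]), "utf-8");
```

Behavior 4 would be the same idea, with the sniffing step only applied when
the caller asked for "UTF-16" and only the two UTF-16 BOMs checked.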


On Thu, Oct 18, 2012 at 11:24 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Thu, Oct 18, 2012 at 4:16 PM, Glenn Maynard <glenn@zewt.org> wrote:
> > On Thu, Oct 18, 2012 at 3:54 AM, Anne van Kesteren <annevk@annevk.nl>
> > wrote:
> >> * TextDecoder.decode()'s view argument is no longer optional. Why should
> >> it be?
> >
> > It buffers the "EOF byte" when in streaming mode, eg. when the last byte
> > of the stream is a UTF-8 continuation byte, so any encode errors are
> > triggered.
> >
> >> * TextEncoder.encode()'s input argument is no longer nullable. Again,
> >> why should it be?
> >
> > Likewise for encoding, to flush errors for trailing high surrogates.
>
> I made these arguments optional now (and named them both input). Note
> however that the way you get the EOF byte/EOF code point is by
> omitting the dictionary (whose stream member defaults to false), but I
> can see how not passing any arguments as a final call is convenient.
>
>
> https://github.com/whatwg/encoding/commit/39a201a5cdf43be3d49c6bac7952a0ecb225886b
>
Yes, purely convenience. Otherwise you'd need to call:

decoder.decode(buffer1, {stream: true});
decoder.decode(buffer2, {stream: true});
decoder.decode(new Uint8Array());
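
With the arguments optional, the final call can simply omit everything. A
minimal sketch of what that could look like, assuming the decode(input,
{stream: true}) shape discussed above (decodeChunks is a made-up helper
name, not part of the API):

```javascript
// Hypothetical helper: decode a sequence of Uint8Array chunks as one stream.
function decodeChunks(chunks, label = "utf-8") {
  const decoder = new TextDecoder(label);
  let text = "";
  for (const chunk of chunks) {
    // stream: true keeps incomplete trailing bytes buffered for the next call.
    text += decoder.decode(chunk, { stream: true });
  }
  // No arguments: the stream flag defaults to false, so this is the final
  // call and anything still buffered is flushed.
  text += decoder.decode();
  return text;
}

// U+00E9 is 0xC3 0xA9 in UTF-8; here it is split across two chunks.
const chunks = [new Uint8Array([0x68, 0xC3]), new Uint8Array([0xA9, 0x21])];
const result = decodeChunks(chunks); // "h\u00E9!"
```

If the input ends in the middle of a sequence, it is that final
no-argument call that surfaces the error (U+FFFD when not in fatal mode).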


>
> >> I also raised the issue of whether TextEncoder should really support
> >> utf-16/utf-16be as the encoding standard tries to deprecate non-utf-8
> >> encodings.
> >
> > The whole point of this API is to support legacy file formats that use
> > other encodings.  (It's probably questionable to not support other
> > encodings, too, eg. filenames in ZIP file headers, but starting out with
> > Unicode is fine.)
>
> I thought it was mostly about reading legacy formats, but fair enough.
>

Jonas did a straw poll via Twitter about whether encoding to UTF-16 was
needed, and received positive feedback.

Received on Friday, 19 October 2012 16:30:39 UTC