[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-20 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Tue, 20 Mar 2012 09:26:44 -0500
Message-ID: <CABirCh_vn08O=_xcxzrNT_8hSUUdwPn03_Ncg==pB11gvHaP0w@mail.gmail.com>

On Mon, Mar 19, 2012 at 11:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> Why are encodings different than other parts of the API where you
> indeed have to know what works and what doesn't.

Do you memorize lists of encodings?  I certainly don't.  I look them up as
needed.

> UTF8 is stateful, so I disagree.

No, UTF-8 doesn't require a stateful decoder to support streaming.  You
decode up to the last codepoint that you can decode completely.  The return
values are the output data, the number of bytes output, and the number of
bytes consumed; that's all you need to restart decoding later.  That's the
iconv(3) approach that we're probably all familiar with, which works with
almost all encodings.
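
To make the stateless approach above concrete, here is a minimal Python
sketch.  The names decode_utf8_prefix and decode_stream are hypothetical,
not part of any proposed API.  It assumes well-formed UTF-8 apart from a
possibly truncated final sequence; the "number of bytes output" is simply
the length of the returned string.

```python
from typing import Iterable, Tuple


def decode_utf8_prefix(data: bytes) -> Tuple[str, int]:
    """Decode the longest prefix of `data` that ends on a complete code point.

    Returns (text, bytes_consumed). Nothing else needs to be remembered:
    the caller keeps data[bytes_consumed:] and prepends it to the next chunk.
    """
    end = len(data)
    # Step back over at most 3 trailing continuation bytes (10xxxxxx).
    i = end
    while i > 0 and end - i < 3 and (data[i - 1] & 0xC0) == 0x80:
        i -= 1
    if i > 0:
        lead = data[i - 1]
        if lead >= 0xF0:
            need = 4
        elif lead >= 0xE0:
            need = 3
        elif lead >= 0xC0:
            need = 2
        else:
            need = 1
        have = end - (i - 1)
        if have < need:
            # The final sequence is truncated; leave it for the next call.
            end = i - 1
    # Genuinely invalid input still raises UnicodeDecodeError here.
    return data[:end].decode("utf-8"), end


def decode_stream(chunks: Iterable[bytes]) -> Tuple[str, bytes]:
    """Hypothetical caller: the only thing carried over is undecoded bytes."""
    pending = b""
    out = []
    for chunk in chunks:
        buf = pending + chunk
        text, consumed = decode_utf8_prefix(buf)
        out.append(text)
        pending = buf[consumed:]
    return "".join(out), pending
```

The caller carries over only the unconsumed bytes between calls; there is
no decoder object and no hidden mode to track.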

ISO-2022 encodings are stateful: you have to persistently remember the
character subsets activated by earlier escape sequences.  An iconv-like
streaming API is impossible for these encodings; to support streamed
decoding, you'd need a decoder object that the user keeps around in order
to store that state.  See http://en.wikipedia.org/wiki/ISO/IEC_2022#Code_structure
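
As a rough illustration of why a persistent decoder object is needed, the
Python sketch below uses the standard library's incremental decoder for
ISO-2022-JP.  The helper name decode_stateful and the split point are
arbitrary choices for the example.

```python
import codecs


def decode_stateful(chunks, encoding="iso2022_jp"):
    """Decode chunks with one decoder object that persists across calls."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and reports any truncated input.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


encoded = "日本語".encode("iso2022_jp")

# Split after the escape sequence and the first two-byte character.
head, tail = encoded[:5], encoded[5:]

# The persistent decoder remembers the character set selected in `head`.
streamed = decode_stateful([head, tail])

# Decoding `tail` on its own starts in the default ASCII state, so the
# same bytes are interpreted differently.
standalone_tail = tail.decode("iso2022_jp")
```

Here the bytes in tail are only meaningful given the escape sequence seen
in head, so handing back "bytes consumed" is not enough to resume decoding.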

-- 
Glenn Maynard

Received on Tuesday, 20 March 2012 07:26:44 UTC