- From: Glenn Maynard <glenn@zewt.org>
- Date: Tue, 20 Mar 2012 09:26:44 -0500
On Mon, Mar 19, 2012 at 11:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> Why are encodings different than other parts of the API where you
> indeed have to know what works and what doesn't.

Do you memorize lists of encodings? I certainly don't. I look them up as needed.

> UTF8 is stateful, so I disagree.

No, UTF-8 doesn't require a stateful decoder to support streaming. You decode up to the last codepoint that you can decode completely. The return values are the output data, the number of bytes output, and the number of bytes consumed; that's all you need to restart decoding later. That's the iconv(3) approach that we're probably all familiar with, which works with almost all encodings.

ISO-2022 encodings are stateful: you have to persistently remember the character subsets activated by earlier escape sequences. An iconv-like streaming API is impossible; to support streamed decoding, you'd need to have a decoder object that the user keeps around in order to store that state.

http://en.wikipedia.org/wiki/ISO/IEC_2022#Code_structure

-- 
Glenn Maynard
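
For illustration, the restart-from-bytes-consumed approach for UTF-8 described above might look roughly like the following. This is only a sketch under stated assumptions: the function names (`complete_prefix_len`, `decode_chunk`, `stream_decode`) are hypothetical and are not part of iconv or of any proposed API. The point it tries to show is that the caller carries over nothing but the undecoded tail bytes; no decoder object holds state between calls.

```python
from typing import Iterable, Tuple


def complete_prefix_len(buf: bytes) -> int:
    """Length of the longest prefix of buf that does not end mid-sequence.

    Hypothetical helper: inspects at most the last 3 bytes for a lead byte
    whose multi-byte sequence has been cut off by the chunk boundary.
    """
    n = len(buf)
    for back in range(1, min(3, n) + 1):
        b = buf[n - back]
        if b & 0xC0 == 0x80:
            continue  # continuation byte (10xxxxxx): keep looking for the lead
        if b >= 0xF0:
            need = 4
        elif b >= 0xE0:
            need = 3
        elif b >= 0xC0:
            need = 2
        else:
            need = 1
        # If the lead byte announces more bytes than are present, stop before it.
        return n - back if need > back else n
    return n


def decode_chunk(buf: bytes) -> Tuple[str, int]:
    """Decode what can be decoded completely; return (text, bytes_consumed)."""
    consumed = complete_prefix_len(buf)
    # Malformed input still raises UnicodeDecodeError here (strict decoding).
    return buf[:consumed].decode("utf-8"), consumed


def stream_decode(chunks: Iterable[bytes]) -> str:
    """Decode a sequence of arbitrary byte chunks, carrying over only bytes."""
    pending = b""
    out = []
    for chunk in chunks:
        pending += chunk
        text, consumed = decode_chunk(pending)
        out.append(text)
        pending = pending[consumed:]  # the only thing kept between calls
    if pending:
        raise ValueError("input ended in the middle of a UTF-8 sequence")
    return "".join(out)
```

For example, `stream_decode([b"caf\xc3", b"\xa9"])` yields `"café"` even though the two bytes of "é" arrive in different chunks. The same trick does not carry over to ISO-2022: there the meaning of later bytes depends on escape sequences that were already consumed, so returning a byte count is not enough to resume.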
Received on Tuesday, 20 March 2012 07:26:44 UTC