- From: Joshua Bell <jsbell@chromium.org>
- Date: Tue, 20 Mar 2012 10:39:12 -0700
On Tue, Mar 20, 2012 at 7:26 AM, Glenn Maynard <glenn at zewt.org> wrote:

> On Mon, Mar 19, 2012 at 11:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>
>> Why are encodings different than other parts of the API where you
>> indeed have to know what works and what doesn't.
>
> Do you memorize lists of encodings? I certainly don't. I look them up as
> needed.
>
>> UTF8 is stateful, so I disagree.
>
> No, UTF-8 doesn't require a stateful decoder to support streaming. You
> decode up to the last codepoint that you can decode completely. The return
> values are the output data, the number of bytes output, and the number of
> bytes consumed; that's all you need to restart decoding later. That's the
> iconv(3) approach that we're probably all familiar with, which works with
> almost all encodings.
>
> ISO-2022 encodings are stateful: you have to persistently remember the
> character subsets activated by earlier escape sequences. An iconv-like
> streaming API is impossible; to support streamed decoding, you'd need to
> have a decoder object that the user keeps around in order to store that
> state. http://en.wikipedia.org/wiki/ISO/IEC_2022#Code_structure

Which seems like it leaves us with these options:

1. Only support encodings with stateless coding (possibly down to a minimum
   of UTF-8)
2. Only provide an API supporting non-streaming coding (i.e. whole
   strings/whole buffers)
3. Expand the API to return encoder/decoder objects that capture state

Any others?

Trying to simplify the problem but take on both (1) and (2) without (3)
would lead to an API that could not encompass (3) in the future, which would
be a mistake.

I'll throw out that the in-progress design of a Globalization API for
ECMAScript - http://norbertlindenberg.com/2012/02/ecmascript-internationalization-api/ -
is currently spec'd both to build on the existing locale-aware methods on
String/Number/Date prototypes as conveniences and to introduce the Collator
and *Format objects.

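For concreteness, here is a minimal sketch of the iconv-style approach Glenn describes for UTF-8: decode up to the last complete code point and report how many bytes were consumed, so the caller can carry the remainder into the next call. It is written in Python purely for illustration; the names `complete_prefix_len`, `decode_chunk` and `decode_stream` are hypothetical and are not part of any proposed API.

```python
from typing import Iterable, Tuple


def complete_prefix_len(data: bytes) -> int:
    """Length of the longest prefix of data that does not end mid-sequence."""
    n = len(data)
    # A UTF-8 sequence is at most 4 bytes long, so the lead byte of an
    # incomplete trailing sequence must be within the last 3 bytes.
    for back in range(1, min(3, n) + 1):
        b = data[n - back]
        if b & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if b >= 0xF0:
            need = 4
        elif b >= 0xE0:
            need = 3
        elif b >= 0xC0:
            need = 2
        else:
            need = 1
        # Cut before the lead byte only if its sequence is still incomplete.
        return n - back if need > back else n
    return n


def decode_chunk(data: bytes) -> Tuple[str, int]:
    """Return (decoded text, bytes consumed); no decoder object is kept."""
    consumed = complete_prefix_len(data)
    # Strict decoding: malformed (as opposed to truncated) input still raises.
    return data[:consumed].decode("utf-8"), consumed


def decode_stream(chunks: Iterable[bytes]) -> str:
    """Caller-side loop: the only thing carried over is the unconsumed tail."""
    pending = b""
    parts = []
    for chunk in chunks:
        buf = pending + chunk
        text, consumed = decode_chunk(buf)
        parts.append(text)
        pending = buf[consumed:]
    if pending:
        raise ValueError("input ended in the middle of a UTF-8 sequence")
    return "".join(parts)
```

The point of the sketch is that the unconsumed bytes are the only thing the caller has to remember between calls, which is why this style works for UTF-8 but not for ISO-2022.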
Should we start with UTF-8-only/non-streaming methods on DOMString/ArrayBufferView, and avoid constraining a future API supporting multiple, possibly stateful encodings and streaming?
Received on Tuesday, 20 March 2012 10:39:12 UTC