[whatwg] API for encoding/decoding ArrayBuffers into text

From: NARUSE, Yui <naruse@airemix.jp>
Date: Wed, 21 Mar 2012 17:34:38 +0900
Message-ID: <CAK6HhsoBGAufvwphdazHUBtZNba_dqtQgYxBVaOyMUWucHDN6g@mail.gmail.com>
2012/3/21 Glenn Maynard <glenn at zewt.org>:
> On Tue, Mar 20, 2012 at 12:39 PM, Joshua Bell <jsbell at chromium.org> wrote:
>
>> 1. Only support encodings with stateless coding (possibly down to a minimum
>> of UTF-8)
>> 2. Only provide an API supporting non-streaming coding (i.e. whole
>> strings/whole buffers)
>> 3. Expand the API to return encoder/decoder objects that capture state
>>
>> Any others?
>>
>> Trying to simplify the problem but take on both (1) and (2) without (3)
>> would lead to an API that could not encompass (3) in the future, which
>> would be a mistake.
>
> I don't think that's obviously a mistake. Only the nastiest, wartiest of
> legacy encodings require it.

The categories feel strange.

If the conversion is not streaming (whole strings/whole buffers), its
implementation should simply be a wrapper around the browser's
conversion functions.
There is no need for a state object to save the state, because the conversion
is complete when the function returns, even for a stateful encoding.

Streaming conversion needs state even if the encoding is stateless.
When a partial input ends in the middle of a character, as in
"\xE3\x81\x82\xC2", the conversion consumes 4 bytes, outputs one character
"\u3042", and remembers the partial byte "\xC2". This byte is the state.
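
To make this concrete, here is a minimal sketch in JavaScript of a streaming
UTF-8 decoder whose only state is the bytes of an incomplete trailing
character. The name createUtf8StreamDecoder and the shape of the returned
object are hypothetical and not part of any proposal in this thread, and
continuation bytes are not validated, to keep the sketch short.

```javascript
// Hypothetical helper, not a proposed API: a minimal streaming UTF-8 decoder.
// Its only state is `pending`, the bytes of a character that is not yet complete.
function createUtf8StreamDecoder() {
  let pending = [];

  // Length of the sequence introduced by lead byte b.
  function sequenceLength(b) {
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    throw new Error("invalid lead byte 0x" + b.toString(16));
  }

  return {
    // Decode one chunk (array or Uint8Array of bytes) and return the text
    // for every complete character seen so far.
    decode(chunk) {
      const bytes = pending.concat(Array.from(chunk));
      let out = "";
      let i = 0;
      while (i < bytes.length) {
        const n = sequenceLength(bytes[i]);
        if (i + n > bytes.length) break; // incomplete: keep for the next call
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        // Assumption: continuation bytes are well formed (not checked here).
        let cp = n === 1 ? bytes[i] : bytes[i] & (0xFF >> (n + 1));
        for (let k = 1; k < n; k++) {
          cp = (cp << 6) | (bytes[i + k] & 0x3F);
        }
        out += String.fromCodePoint(cp);
        i += n;
      }
      pending = bytes.slice(i); // remember the partial character
      return out;
    },
    // Expose a copy of the state for inspection.
    pendingBytes() {
      return pending.slice();
    },
  };
}

const decoder = createUtf8StreamDecoder();
const first = decoder.decode([0xE3, 0x81, 0x82, 0xC2]); // "\u3042", 0xC2 is kept
const second = decoder.decode([0xA9]);                  // completes 0xC2 0xA9
```

After the first call all 4 bytes have been consumed, one character has been
output, and pendingBytes() holds the single byte 0xC2. The second call
supplies the missing continuation byte and yields "\u00A9".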

> That said, it's fairly simple to later return an additional state object
> from the previously proposed streaming APIs, eg.
>
> result = decode(str, 0, outputView)
> // result.outputBytes == 15
> // result.nextInputByte == 5
> // result.state == opaque object
>
> result2 = decode(str, result.nextInputByte, outputView, {state:
> result.state});

See mbsrtowcs(3), which converts a character string to a wide-character
string (restartable), and its length-limited variant mbsnrtowcs(3). Both
keep the conversion state in an opaque mbstate_t object.
size_t mbsnrtowcs(wchar_t *restrict dst, const char **restrict src,
       size_t nmc, size_t len, mbstate_t *restrict ps);
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsrtowcs.html

In any case, these APIs need to report an error if the byte sequence is
invalid for the encoding.
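
As a rough illustration of the difference between "invalid" and merely
"incomplete" input, here is a simplified JavaScript check for UTF-8. The
function name findInvalidUtf8 is hypothetical, and the check is deliberately
incomplete: it does not reject overlong 3- and 4-byte forms or encoded
surrogates.

```javascript
// Hypothetical helper: returns the offset of the first byte that breaks a
// UTF-8 sequence, or -1 if none is found. A truncated but so-far-valid
// trailing sequence is not reported, because a streaming caller may still
// supply the remaining bytes.
// Simplification: overlong 3/4-byte forms and surrogates are not detected.
function findInvalidUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let n;
    if (b < 0x80) n = 1;
    else if (b >= 0xC2 && b <= 0xDF) n = 2;
    else if (b >= 0xE0 && b <= 0xEF) n = 3;
    else if (b >= 0xF0 && b <= 0xF4) n = 4;
    else return i; // stray continuation byte, or a byte that is never a valid lead

    for (let k = 1; k < n; k++) {
      if (i + k >= bytes.length) return -1;              // truncated tail: incomplete, not invalid
      if ((bytes[i + k] & 0xC0) !== 0x80) return i + k;  // expected a continuation byte
    }
    i += n;
  }
  return -1;
}

const incomplete = findInvalidUtf8([0xE3, 0x81, 0x82, 0xC2]); // -1: 0xC2 may still be completed
const invalid = findInvalidUtf8([0xE3, 0x41]);                // 1: 0x41 is not a continuation byte
```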

-- 
NARUSE, Yui <naruse at airemix.jp>
Received on Wednesday, 21 March 2012 01:34:38 UTC