[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-16 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Thu, 15 Mar 2012 19:20:26 -0500
Message-ID: <CABirCh9tjDqBv_ncku=-BV4+ZmNnQhhufqW4fPi2BtmUaoL8zQ@mail.gmail.com>

On Thu, Mar 15, 2012 at 6:51 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> What's the use-case for the "stringLength" function? You can't decode
> into an existing datastructure anyway, so you're ultimately forced to
> call "decode" at which point the "stringLength" function hasn't helped
> you.
>

stringLength doesn't return the length of the decoded string.  It returns
the byte offset of the first \0 (or the length of the whole buffer, if
none), for decoding null-terminated strings.  For byte-oriented encodings
(i.e. everything except UTF-16 and friends), it's just memchr(), so it's
much faster than actually decoding the string.
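
As a rough illustration of that behaviour, here is a minimal sketch.  The
function names and signatures are assumptions made for this example, not
part of any agreed API.  The first function covers encodings whose
terminator is a single zero byte; the second shows why UTF-16 needs its own
scan, since the terminator there is an aligned pair of zero bytes.

```javascript
// Hypothetical sketch: byte offset of the first NUL terminator, or the
// buffer length if there is none. No decoding is performed.
function stringLength(view) {
  // Look at the raw bytes regardless of the view's element type.
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const i = bytes.indexOf(0); // the moral equivalent of memchr()
  return i === -1 ? bytes.length : i;
}

// Hypothetical UTF-16 variant: the terminator is a 16-bit zero code unit,
// so scan two bytes at a time. Byte order does not matter for this check.
function stringLengthUtf16(view) {
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    if (bytes[i] === 0 && bytes[i + 1] === 0) return i;
  }
  return bytes.length;
}
```

The caller would then decode only the first stringLength(view) bytes.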

> Currently the use-case of simply wanting to convert a string to a
> binary buffer is a bit cumbersome. You first have to call the
> "encodedLength" function, then allocate a buffer of the right size,
> then call the "encode" function.

I suggested, e.g.:

result = encode("string", "utf-8", null).output;

which would create an ArrayBuffer of the required size.  Presumably the
null ArrayBufferView argument would be optional, so you could just say
encode("string", "utf-8").
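
A minimal sketch of that call shape follows.  It assumes UTF-8 only and
uses TextEncoder as a stand-in for the actual encoder; the encode() name,
its optional third argument, and the { output } return shape simply mirror
the suggestion above and are not a shipped API.  Only the case where no
output view is supplied is covered.

```javascript
// Hypothetical encode(): when no output view is given, allocate an
// ArrayBuffer of exactly the required size and return it as `output`.
function encode(str, encoding, view = null) {
  if (encoding.toLowerCase() !== "utf-8") {
    throw new RangeError("this sketch only handles utf-8");
  }
  if (view !== null) {
    throw new Error("encoding into a caller-supplied view is not covered here");
  }
  const bytes = new TextEncoder().encode(str);
  // Copy out exactly the encoded bytes so the buffer size matches the data.
  const output = bytes.buffer.slice(
    bytes.byteOffset,
    bytes.byteOffset + bytes.byteLength
  );
  return { output };
}

const result = encode("string", "utf-8").output; // ArrayBuffer of 6 bytes
```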

> It doesn't seem possible to implement the 'encode' function without
> doing multiple scans over the string. The implementation seems
> required both to check that the data can be decoded using the
> specified encoding, as well as check that the data will fit in the
> passed in buffer. Only then can the implementation start decoding the
> data. This seems problematic.
>

Only if it guarantees that it doesn't write anything to the output buffer
unless the entire result will fit.  I don't think we need to do that; just
guarantee that it'll be truncated on a whole codepoint.
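
To make the single-pass behaviour concrete, here is a minimal sketch for
UTF-8.  The helper name and its { read, written } return value are
assumptions for illustration.  Replacing lone surrogates with U+FFFD is
also an assumption, following the suggestion quoted below rather than
anything settled.

```javascript
// Hypothetical helper: encode `str` as UTF-8 into `view` (a Uint8Array) in
// one pass, stopping before the first code point that would not fit.
function encodeTruncating(str, view) {
  let written = 0; // bytes written to view
  let read = 0;    // UTF-16 code units consumed from str
  for (const ch of str) { // string iteration yields whole code points
    let cp = ch.codePointAt(0);
    // A lone surrogate cannot be encoded; substitute U+FFFD (assumption).
    if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;
    const n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (written + n > view.length) break; // never write a partial sequence
    if (n === 1) {
      view[written] = cp;
    } else if (n === 2) {
      view[written] = 0xC0 | (cp >> 6);
      view[written + 1] = 0x80 | (cp & 0x3F);
    } else if (n === 3) {
      view[written] = 0xE0 | (cp >> 12);
      view[written + 1] = 0x80 | ((cp >> 6) & 0x3F);
      view[written + 2] = 0x80 | (cp & 0x3F);
    } else {
      view[written] = 0xF0 | (cp >> 18);
      view[written + 1] = 0x80 | ((cp >> 12) & 0x3F);
      view[written + 2] = 0x80 | ((cp >> 6) & 0x3F);
      view[written + 3] = 0x80 | (cp & 0x3F);
    }
    written += n;
    read += ch.length;
  }
  return { read, written };
}
```

The caller can compare `read` against the string length to detect
truncation, without the encoder having to pre-scan the input.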

> I also don't think it's a good idea to throw an exception for encoding
> errors. Better to convert characters to the unicode replacement
> character. I believe we made a similar change to the WebSockets
> specification recently.
>

Was that change made?  I filed
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16157, but it still seems to
be undecided.

-- 
Glenn Maynard

Received on Thursday, 15 March 2012 17:20:26 UTC