[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-27 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 26 Mar 2012 19:00:30 -0500
Message-ID: <CABirCh-Gr217k1SuL0Y1gRf0_D4iegb1YEzys_CsV2ZNHNG0xQ@mail.gmail.com>

On Mon, Mar 26, 2012 at 6:27 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> * It appears that we lost the ability to measure how long a resulting
> buffer was going to be and then decode into the buffer. I don't know
> if this is an issue.
>

The theory is that it probably isn't a real performance issue to decode
into a new buffer, then copy it where you want it.  If you think there are
any cases where it matters, we should look at it, though.

The extra GC might matter if you're doing a lot of large conversions, but
that's easily fixed by adding ArrayBuffer.close().
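
A minimal sketch of the "produce a new buffer, then copy it where you want it" approach, shown in the encode direction. It assumes today's standard TextEncoder as a stand-in for the API under discussion in this thread; the helper name encodeAt is hypothetical.

```javascript
// Assumption: the standard TextEncoder stands in for the proposed encoder.
// encodeAt is a hypothetical helper, not part of any proposal.
function encodeAt(dest, offset, text) {
  // Encode into a freshly allocated Uint8Array.
  const bytes = new TextEncoder().encode(text);
  if (offset + bytes.length > dest.length) {
    throw new RangeError("destination buffer too small");
  }
  // Copy the temporary result into place; the temporary is then garbage.
  dest.set(bytes, offset);
  return bytes.length;
}

const packet = new Uint8Array(16);
const written = encodeAt(packet, 4, "héllo");
```

The cost is one extra allocation and one copy per call, which is the overhead the paragraph above argues is usually negligible.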

> * It might be a performance problem to have to check for the
> fatal/nullTerminator options on each call.
>

Are you thinking of people, say, feeding in a single byte at a time?  That
seems like it'll be slow no matter what.

On Mon, Mar 26, 2012 at 6:40 PM, Joshua Bell <jsbell at chromium.org> wrote:

> > The path of fewest errors is probably to have a BOM override the specified
> > UTF-16 endianness, so saying "UTF-16BE" just changes the default.
>
> This would apply only if the previous call had {stream: false} (implicitly or
> explicitly).

Right.  The following two operations should produce exactly identical
results, for every possible value of str and combination of options, and
leave the decoder in the same state:

// Operation 1: two calls, streaming the first chunk.
view1 = decoder.decode(str.substr(0, 8), {stream: true});
view2 = decoder.decode(str.substr(8));
finalView = new Int8Array(view1.length + view2.length);
finalView.set(view1);
finalView.set(view2, view1.length);
return finalView;

// Operation 2: everything in a single call.
return decoder.decode(str);

> Calling with {stream:false} would reset for the next call.
>

Right: after a {stream:false} call, a decoder or encoder should be
equivalent to a newly-created one.
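
The chunked/one-shot equivalence and the reset rule can be checked with a short sketch. It assumes today's standard TextDecoder, which takes bytes and returns strings, so the shapes differ from the snippet earlier in this message; the split point is chosen to cut a multi-byte character in half.

```javascript
// Assumption: the standard TextDecoder stands in for the proposed decoder.
const bytes = Uint8Array.of(0x61, 0xE2, 0x82, 0xAC, 0x62); // "a€b" in UTF-8

const decoder = new TextDecoder("utf-8");

// First chunk ends in the middle of "€"; the partial sequence is held back.
const part1 = decoder.decode(bytes.subarray(0, 2), { stream: true });
// Final chunk: stream defaults to false, so the decoder flushes and resets.
const part2 = decoder.decode(bytes.subarray(2));
const chunked = part1 + part2;

// The same input decoded in one call by a brand-new decoder.
const oneShot = new TextDecoder("utf-8").decode(bytes);

// Reusing the first decoder afterwards should behave like a new one.
const reused = decoder.decode(bytes);
```

If the invariant holds, chunked, oneShot, and reused are all the same string.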

> Would it apply only to UTF-16 or UTF-8 as well? Should there be any special
> behavior when not specifying an encoding in the constructor?
>

Do you mean, should decoding UTF-8 switch to UTF-16 if it starts with a
UTF-16 BOM?  I think that would be confusing.  If people want to autodetect
UTF-16 like that, they should probably do it themselves.  I think browsers
do this with text/html, but that's just a web-compatibility wart, not a
feature...
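
For callers who do want to autodetect from a BOM themselves, a rough sketch follows. It assumes today's standard TextDecoder and its encoding labels; sniffBom and decodeWithSniffing are hypothetical helper names, not part of any proposal in this thread.

```javascript
// Hypothetical helper: pick an encoding label from a leading BOM,
// or return the caller's fallback when no BOM is present.
function sniffBom(bytes, fallback) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "utf-16be";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "utf-16le";
  }
  return fallback;
}

// Hypothetical helper: decode with whatever encoding the BOM indicates.
// Assumption: the standard TextDecoder, which by default drops a BOM
// that matches its own encoding.
function decodeWithSniffing(bytes, fallback = "utf-8") {
  return new TextDecoder(sniffBom(bytes, fallback)).decode(bytes);
}

const withBom = Uint8Array.of(0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00); // UTF-16LE "hi"
const noBom = Uint8Array.of(0x68, 0x69);                           // UTF-8 "hi"
const fromBom = decodeWithSniffing(withBom);
const fromFallback = decodeWithSniffing(noBom);
```

Keeping the sniffing in caller code means the decoder itself never silently switches away from the encoding it was constructed with.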

-- 
Glenn Maynard

Received on Monday, 26 March 2012 17:00:30 UTC