
[whatwg] API for encoding/decoding ArrayBuffers into text

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 26 Mar 2012 18:12:22 -0500
Message-ID: <CABirCh_8gSXCykQ5i7_DGE+JZ=JV3ESBy+KWEfS_Hh9exgSh-A@mail.gmail.com>
On Mon, Mar 26, 2012 at 4:49 PM, Joshua Bell <jsbell at chromium.org> wrote:

> * A |stream| option, per the above
>

Does this still make sense when you're using stream: false to flush the
stream?  Flushing is still part of a streaming operation, but I guess it's
"close enough".
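For comparison, here is a minimal sketch of how the flag works in the TextDecoder API as it was later standardized in the Encoding Standard, which may differ from the draft under discussion. Each chunk is passed with {stream: true}, so an incomplete multi-byte sequence is held back for the next call. A final call without the flag flushes, and that final call is exactly the "stream: false to flush" case questioned above. The chunk bytes are made up for illustration.

```javascript
// Streaming decode with TextDecoder as later standardized.
// "€" is E2 82 AC in UTF-8; here it is split across two chunks.
const decoder = new TextDecoder('utf-8');
const chunks = [new Uint8Array([0x68, 0xE2, 0x82]), new Uint8Array([0xAC])];

let text = '';
for (const chunk of chunks) {
  // stream: true keeps an incomplete trailing sequence buffered for the next call.
  text += decoder.decode(chunk, { stream: true });
}
// A final call without {stream: true} flushes; any bytes still pending
// would come out as U+FFFD here (or throw, with fatal: true).
text += decoder.decode();

console.log(text); // h€
```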

> * A |nullTerminator| option eliminates the need for a stringLength method
> (hasta la vista, baby!)
>

I strongly disagree with this change.  It's much cleaner and more generic
for the decoding algorithm to not know anything about null terminators, and
to have separate general-purpose methods to determine the length of the
string (memchr/wmemchr analogs, which we should have anyway).  We made this
simplification a long time ago--why did you resurrect this?

let array = new Int8Array(myArrayBuffer);
const length = array.indexOf(0); // same semantics as String.indexOf: -1 if not found
if (length !== -1)
    array = array.subarray(0, length); // subarray() makes a view, not a copy
const text = new TextDecoder('utf-8').decode(array);
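A slightly fuller sketch of the same idea, assuming a modern engine where TypedArray.prototype.indexOf exists (it was added in ES2015). The terminator search lives in a general-purpose helper, and the decoder only ever sees the truncated view. The helper names truncateAtNul, decodeNulTerminated8 and decodeNulTerminated16 are hypothetical. The UTF-16 variant searches 16-bit code units (the wmemchr analog), because a byte search would stop at the zero high byte of an ASCII character.

```javascript
// Returns `units` truncated before the first zero element, or the whole
// view if there is no terminator. subarray() creates a view, not a copy.
function truncateAtNul(units) {
  const end = units.indexOf(0); // memchr / wmemchr analog
  return end === -1 ? units : units.subarray(0, end);
}

// 8-bit-unit encodings such as UTF-8: search bytes (memchr).
function decodeNulTerminated8(buffer, byteOffset, encoding = 'utf-8') {
  const bytes = new Uint8Array(buffer, byteOffset);
  return new TextDecoder(encoding).decode(truncateAtNul(bytes));
}

// UTF-16: search 16-bit code units (wmemchr). A zero code unit is 0x0000
// in either byte order, so host endianness doesn't affect the search.
// Assumes byteOffset is 2-byte aligned, which Uint16Array requires.
function decodeNulTerminated16(buffer, byteOffset, encoding = 'utf-16le') {
  const count = Math.floor((buffer.byteLength - byteOffset) / 2);
  const units = new Uint16Array(buffer, byteOffset, count);
  return new TextDecoder(encoding).decode(truncateAtNul(units));
}

const utf8Buf = new Uint8Array([0x68, 0x69, 0x00, 0x7a]).buffer;
console.log(decodeNulTerminated8(utf8Buf, 0)); // hi

const utf16Buf = new Uint8Array([0x68, 0x00, 0x69, 0x00, 0x00, 0x00, 0x7a, 0x00]).buffer;
console.log(decodeNulTerminated16(utf16Buf, 0)); // hi
```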

> * BOM handling needs to be resolved. The Encoding spec makes the encoding
> label secondary to the BOM. With this API it's unclear if that should be
> the case. Options include having a mismatching BOM throw, treating a
> mismatching BOM as a decoding error (i.e. fallback or throw, depending on
> options), or allow the BOM to actually switch the decoder used for this
> "stream" - possibly if-and-only-if the default encoding was specified.
>

The path of fewest errors is probably to have a BOM override the specified
UTF-16 endianness, so that saying "UTF-16BE" only changes the default used
when there's no BOM.
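A minimal sketch of the BOM-overrides-endianness behavior suggested here. As I understand the current Encoding Standard, TextDecoder on its own does not let a mismatching BOM switch the byte order (a matching BOM is simply stripped), so this sketch sniffs the BOM by hand first. The function name decodeUtf16WithBomOverride is hypothetical.

```javascript
// Sketch of the proposed rule: a UTF-16 BOM, if present, picks the byte
// order; the label only supplies the default used when there is no BOM.
function decodeUtf16WithBomOverride(bytes, defaultLabel = 'utf-16be') {
  let encoding = defaultLabel;
  if (bytes.length >= 2) {
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) encoding = 'utf-16be';
    else if (bytes[0] === 0xFF && bytes[1] === 0xFE) encoding = 'utf-16le';
  }
  // With ignoreBOM left at its default (false), TextDecoder consumes a BOM
  // that matches the chosen encoding, so it doesn't appear in the result.
  return new TextDecoder(encoding).decode(bytes);
}

// Little-endian BOM followed by "hi"; the BOM wins over the "utf-16be" label.
const le = new Uint8Array([0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]);
console.log(decodeUtf16WithBomOverride(le, 'utf-16be')); // hi
```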


An aside:

The TypedArray constructors have a depressing design bug: new
Int8Array(someOtherView) makes a copy of the data.  It's nonsensical that
view constructors create a view when passed an ArrayBuffer but a copy when
passed another view.  A view constructor should create a *view* whenever
it's passed an object that already has ArrayBuffer-based storage; making a
copy should have been a separate operation.
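A small demonstration of the inconsistency, runnable in any engine with typed arrays: the same constructor shares storage when given an ArrayBuffer but allocates a new buffer when given another view.

```javascript
const buffer = new ArrayBuffer(4);
const bytes = new Uint8Array(buffer);     // view: shares `buffer`
const fromBuffer = new Int8Array(buffer); // view: shares `buffer`
const fromView = new Int8Array(bytes);    // copy: new ArrayBuffer, elements converted

bytes[0] = 42;
console.log(fromBuffer[0]);              // 42: same storage
console.log(fromView[0]);                // 0: independent copy
console.log(fromView.buffer === buffer); // false
```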

This means we can't say "creating a view is cheap"; we have to qualify it:
"creating a view is cheap, as long as you're careful not to call a
constructor that makes a copy".

It's frustrating that we're now stuck with a confusing, inconsistent API
like this.  I'm sure it's much too late to fix this properly, but hopefully
an option can be added to fix it, so a new TypedArray(TypedArray, {view:
true}) call actually creates a view.
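Without such an option, a view over another view's memory can be built from its buffer, byteOffset and byteLength. A minimal sketch; viewOf is a hypothetical helper name, and {view: true} remains only the proposal above. The constructor already throws a RangeError for a misaligned offset, but the explicit length check matters because a fractional element count passed to the constructor is silently truncated.

```javascript
// Hypothetical stand-in for the proposed {view: true} option: build a `Ctor`
// view over the same memory as `source` (any TypedArray or DataView).
function viewOf(Ctor, source) {
  const { buffer, byteOffset, byteLength } = source;
  const size = Ctor.BYTES_PER_ELEMENT;
  if (byteOffset % size !== 0 || byteLength % size !== 0) {
    throw new RangeError('source is not aligned for ' + Ctor.name);
  }
  return new Ctor(buffer, byteOffset, byteLength / size);
}

const bytes = new Uint8Array(8);
const words = viewOf(Uint16Array, bytes.subarray(2, 6)); // 2 elements, no copy
words[0] = 0xffff;
console.log(bytes[2], bytes[3]); // 255 255
```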

-- 
Glenn Maynard
Received on Monday, 26 March 2012 16:12:22 UTC
