[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-16 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Fri, 16 Mar 2012 12:35:55 -0500
Message-ID: <CABirCh-i+DDL0Jwh9uoSQ45Oeev4gkAfOPmTWvTaRs5yS143RA@mail.gmail.com>

On Fri, Mar 16, 2012 at 11:19 AM, Joshua Bell <jsbell at chromium.org> wrote:

> And just to be clear, the use case is decoding data formats where string
> fields are variable length null terminated.
>

A concrete example is ZIP central directories.
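
To make the use case concrete, here is a minimal sketch of reading consecutive NUL-terminated fields from a byte buffer. It assumes UTF-8 fields. It uses `TextDecoder` and `Uint8Array.prototype.indexOf`, which browsers and Node.js ship today but which did not exist when this thread was written. `readNulTerminated` is a hypothetical helper name, and the two-field record is made-up sample data, not a real ZIP layout.

```typescript
// Hypothetical helper: read one NUL-terminated UTF-8 field starting at
// `offset` and report where the next field begins.
function readNulTerminated(
  bytes: Uint8Array,
  offset: number,
): { value: string; next: number } {
  // Search for the terminator only from `offset` onward.
  const end = bytes.indexOf(0, offset);
  if (end === -1) {
    throw new RangeError("unterminated string field");
  }
  // Decode just the bytes between `offset` and the terminator.
  const value = new TextDecoder("utf-8").decode(bytes.subarray(offset, end));
  // The next field starts right after the NUL byte.
  return { value, next: end + 1 };
}

// Made-up record with two consecutive fields: "a.txt\0" then "dir/b.txt\0".
const record = new TextEncoder().encode("a.txt\0dir/b.txt\0");
const first = readNulTerminated(record, 0);
const second = readNulTerminated(record, first.next);
console.log(first.value, second.value); // a.txt dir/b.txt
```
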

> I think we want both encoding and destination to be optional. That leads us
> to an API like:
>
> out_dict = stringEncoding.encode("string", opt_dict);
>
> .. where both out_dict and opt_dict are WebIDL Dictionaries:
>
> opt_dict keys: view, encoding
>

> out_dict keys: charactersWritten, byteWritten, output
>

The return value should just be a [NoInterfaceObject] interface.
Dictionaries are for input arguments, not return values.
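
A minimal TypeScript sketch of that split: the options stay a dictionary-like type, while the result is a read-only object type standing in for the [NoInterfaceObject] interface. The function, option, and result names are taken from the quoted proposal (with `byteWritten` spelled `bytesWritten`). The sketch is hypothetical, handles only UTF-8, and its body leans on today's `TextEncoder.encode()` and `TextEncoder.encodeInto()` rather than anything that existed in 2012.

```typescript
// Input options: a dictionary.
interface EncodeOptions {
  encoding?: string; // only "utf-8" is handled in this sketch
  view?: Uint8Array; // optional destination buffer
}

// Result: read-only fields, standing in for a [NoInterfaceObject] interface.
interface EncodeResult {
  readonly charactersWritten: number; // UTF-16 code units consumed
  readonly bytesWritten: number;
  readonly output: Uint8Array; // === options.view when one was supplied
}

// Hypothetical encode() following the proposal's shape.
function encode(str: string, options: EncodeOptions = {}): EncodeResult {
  const encoding = (options.encoding ?? "utf-8").toLowerCase();
  if (encoding !== "utf-8") {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const encoder = new TextEncoder();
  if (options.view === undefined) {
    // No destination: return a fresh, exactly sized Uint8Array.
    const output = encoder.encode(str);
    return { charactersWritten: str.length, bytesWritten: output.length, output };
  }
  // Destination supplied: write as much as fits and report progress.
  // encodeInto never splits a character across the end of the buffer.
  const { read, written } = encoder.encodeInto(str, options.view);
  return {
    charactersWritten: read ?? 0,
    bytesWritten: written ?? 0,
    output: options.view,
  };
}

const fresh = encode("h\u00e9llo");
console.log(fresh.bytesWritten); // 6
const buf = new Uint8Array(4);
const partial = encode("h\u00e9llo", { view: buf });
console.log(partial.charactersWritten, partial.bytesWritten); // 3 4
```

In the second call, "h" and "\u00e9" take three bytes and the first "l" fills the fourth, so only three code units are consumed.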

Something that came up on IRC that we should spend some time thinking
about, though: Is it actually important to be able to encode into an
existing buffer?  This may be a premature optimization.  You can always
encode into a new buffer and, if needed, copy the result where you need it.

If we don't support that, most of the extra machinery in encode() (the
view option and the written counts) goes away.
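
The encode-then-copy alternative needs nothing from the encoder beyond returning a fresh buffer; the caller pays one extra copy. A minimal sketch, assuming UTF-8 and using today's `TextEncoder` (not part of this proposal); `writeString` is a hypothetical helper name.

```typescript
// Hypothetical helper: encode into a fresh buffer, then copy the bytes into
// the caller's buffer at `offset`. Returns the number of bytes written.
function writeString(dest: Uint8Array, offset: number, str: string): number {
  const encoded = new TextEncoder().encode(str); // fresh, exactly sized buffer
  if (offset + encoded.length > dest.length) {
    throw new RangeError("destination too small");
  }
  dest.set(encoded, offset); // the one extra copy
  return encoded.length;
}

const packet = new Uint8Array(16);
const n = writeString(packet, 2, "ZIP");
console.log(n); // 3
```
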

> ... where output === view if view is supplied, otherwise a new Uint8Array
> (or Uint8ClampedArray??)
>

Uint8Array is correct.  (Uint8ClampedArray is for image color data.)

If UTF-16 or UTF-32 is supported, encoding to them should return
Uint16Array and Uint32Array, respectively (with the return type declared
simply as ArrayBufferView).
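
Because JavaScript strings are already sequences of UTF-16 code units, producing a Uint16Array is a direct copy, and the declared return type can be the general ArrayBufferView. A minimal sketch; `encodeUtf16` is a hypothetical name. Note that a Uint16Array stores its elements in the platform's byte order, so its underlying bytes are not a fixed UTF-16LE or UTF-16BE stream.

```typescript
// Hypothetical encoder to UTF-16: one Uint16Array element per code unit.
// The return type is declared as ArrayBufferView, as suggested above.
function encodeUtf16(str: string): ArrayBufferView {
  const out = new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i); // surrogate pairs are kept as two units
  }
  return out;
}

// "a" followed by U+1F600, written as its surrogate pair.
const units = encodeUtf16("a\uD83D\uDE00") as Uint16Array;
console.log(units.length); // 3
```
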

> If this instead is attached to String, it would look like:
>
> out_dict = my_string.encode(opt_dict);
>
> If it were attached to ArrayBufferView, having a right-size buffer
> allocated for the caller gets uglier unless we include a static version.
>

If encoding into an existing buffer isn't really needed, we could have:

newView = str.encode("utf-8"); // or {encoding: "utf-8"}
str2 = newView.decode("utf-8");
len = newView.find(0); // replaces stringLength, searching for 0 in the
                       // view's type; you'd use Uint16Array for UTF-16

and encodedLength() would go away.

newView.find(val) would live on subclasses of TypedArray.
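
The point of searching "in the view's type" is that a UTF-8 terminator is one 0 byte while a UTF-16 terminator is one 0 code unit, so the same search works once the data sits in the matching typed array. A minimal sketch: `findValue` is a hypothetical free-function stand-in for the proposed `find` method, built on the `indexOf()` that typed arrays later gained. Returning the view's length when the value is absent (so the result can serve as a string length, like the stringLength it replaces) is an assumption of this sketch.

```typescript
type CodeUnitView = Uint8Array | Uint16Array | Uint32Array;

// Hypothetical stand-in for newView.find(val): index of the first element
// equal to `val`, or the view's length if there is none.
function findValue(view: CodeUnitView, val: number): number {
  const i = view.indexOf(val);
  return i === -1 ? view.length : i;
}

// UTF-8 "hi\0z": the terminator is a single 0 byte.
const utf8 = new Uint8Array([0x68, 0x69, 0x00, 0x7a]);
// UTF-16 "hi\0z": the terminator is a single 0 code unit.
const utf16 = new Uint16Array([0x0068, 0x0069, 0x0000, 0x007a]);
console.log(findValue(utf8, 0), findValue(utf16, 0)); // 2 2
```
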

> In the create-a-buffer-on-the-fly case there will be some memory juggling
> going on, either by initially over allocating or reallocating/moving.
>

But since that's all behind the scenes, the implementation can do it
whichever way is most efficient for the particular encoding.  In many
cases, it may be possible to avoid reallocation entirely by making an
educated guess about how big the buffer is likely to be.
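
For UTF-8, one concrete form of that guess is a worst-case bound: each UTF-16 code unit encodes to at most 3 bytes (a surrogate pair, two units, encodes to 4), so `3 * str.length` bytes always suffice. A minimal sketch of that strategy, trading temporary over-allocation for never reallocating; `encodeUtf8OneShot` is a hypothetical name, and `TextEncoder.encodeInto()` is today's API used as the encoding primitive.

```typescript
// Hypothetical one-shot UTF-8 encoder: allocate the worst case up front,
// encode once, then trim.
function encodeUtf8OneShot(str: string): Uint8Array {
  const worstCase = new Uint8Array(str.length * 3);
  const { written } = new TextEncoder().encodeInto(str, worstCase);
  // slice() copies into a right-sized buffer; subarray() would skip the
  // copy but keep the unused slack allocated.
  return worstCase.slice(0, written ?? 0);
}

console.log(encodeUtf8OneShot("h\u00e9llo").length); // 6
```
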

On Fri, Mar 16, 2012 at 11:21 AM, Joshua Bell <jsbell at chromium.org> wrote:

> ... and the spec should include normative guidance that length-prefixing is
> strongly recommended for new data formats.
>

I think this would be a bit off-topic.

-- 
Glenn Maynard

Received on Friday, 16 March 2012 10:35:55 UTC