
[whatwg] API for encoding/decoding ArrayBuffers into text

From: Joshua Bell <jsbell@chromium.org>
Date: Wed, 4 Apr 2012 09:09:12 -0700
Message-ID: <CAD649j6PFq6ATUiNBAaqngrMRvG0eBPt97fOyrStFapLidAz4g@mail.gmail.com>
Any further input on Kenneth's suggestions?

Re: ArrayBufferView vs. DataView - I'm tempted to make the switch to just
DataView. As discussed below, data parsing/serialization operations will
tend to be associated with DataViews. As Glenn has mentioned elsewhere
recently, it is possible to accidentally do a buffer copy when mis-using
typed array constructors, while DataView avoids this. DataViews are cheap
to construct, and when I'm writing sample code for the proposed API I find
I create throw-away DataViews anyway. Also, there is the potential for
confusion when using a non-Uint8Array view: are the elements being
decoded using array[N] as the octets, or using the underlying buffer? For
Uint16Array/UTF-16 encodings, what are the endianness concerns? DataView
APIs have an explicit endianness and no index getter, which alleviates this
confusion.
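
To make the copy pitfall and the endianness point concrete, here is a
minimal sketch that uses only standard typed array and DataView calls.
The variable names are illustrative and are not part of any proposal.

```javascript
// Two bytes, 0x01 0x02, also viewed as a single 16-bit element.
const bytes = new Uint8Array([0x01, 0x02]);
const words = new Uint16Array(bytes.buffer);

// Passing a typed array to a constructor copies (and converts) its
// elements; passing its .buffer creates a view over the same memory.
const copied = new Uint8Array(words);        // 1 element, new buffer
const shared = new Uint8Array(words.buffer); // 2 elements, same buffer

// DataView never copies, and every multi-byte read states its byte order.
const view = new DataView(bytes.buffer);
const bigEndian = view.getUint16(0, false);   // 0x0102
const littleEndian = view.getUint16(0, true); // 0x0201
```

The value held in copied[0] depends on the host byte order, which is
exactly the ambiguity a DataView-only API would avoid.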

Re: writing into an existing buffer - as Glenn says, most of the input
earlier in the thread advocated strongly for a very simple initial API with
streaming support as the only "fancy" feature beyond the minimal string =
foo.decode(buffer) / buffer = foo.encode(string). Adding details =
foo.encodeInto(string, buffer) later on is not precluded if there is demand.
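
For illustration only, the three shapes above (one-shot encode/decode,
streaming, and encoding into an existing buffer) can be sketched with
TextEncoder/TextDecoder as later standardized in the Encoding Standard.
That interface may differ in detail from the draft discussed in this
thread, so treat the sketch as an assumption about the eventual shape
rather than as the proposal's text.

```javascript
const encoder = new TextEncoder();        // the standardized encoder is UTF-8 only
const decoder = new TextDecoder("utf-8");

// Minimal round trip: buffer = encode(string), string = decode(buffer).
const encoded = encoder.encode("h\u00e9llo"); // Uint8Array of 6 bytes
const decoded = decoder.decode(encoded);

// Streaming: a multi-byte sequence split across chunks is held back
// until the remaining bytes arrive.
const streaming = new TextDecoder("utf-8");
let text = streaming.decode(encoded.subarray(0, 2), { stream: true });
text += streaming.decode(encoded.subarray(2), { stream: true });
text += streaming.decode(); // flush any pending state

// Writing into an existing buffer instead of allocating a new one.
const target = new Uint8Array(16);
const { read, written } = encoder.encodeInto("h\u00e9llo", target);
```

Here read counts UTF-16 code units consumed from the string and written
counts bytes stored in target.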

Also, I am planning to move the "fatal" option from the encode/decode
methods to the TextEncoder/TextDecoder constructors. Objections?
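
As a sketch of what a constructor-level "fatal" option looks like, the
following uses TextDecoder as later standardized, where the flag exists
on the decoder only. That is an assumption about the eventual API, not
a statement of the draft at the time of this message.

```javascript
// 0xFF can never appear in well-formed UTF-8.
const invalid = new Uint8Array([0x68, 0xff, 0x69]);

// Default (non-fatal): malformed input is replaced with U+FFFD.
const lenient = new TextDecoder("utf-8").decode(invalid);

// fatal is chosen once, at construction; decode() then throws on bad input.
const strict = new TextDecoder("utf-8", { fatal: true });
let threw = false;
try {
  strict.decode(invalid);
} catch (e) {
  threw = e instanceof TypeError;
}
```

Fixing the behaviour per object keeps each decode() call free of
per-call error-handling options.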

On Tue, Mar 27, 2012 at 7:43 PM, Kenneth Russell <kbr at google.com> wrote:

> On Tue, Mar 27, 2012 at 6:44 PM, Glenn Maynard <glenn at zewt.org> wrote:
> > On Tue, Mar 27, 2012 at 7:12 PM, Kenneth Russell <kbr at google.com> wrote:
> >>
> >>   - I think it should reference DataView directly rather than
> >> ArrayBufferView. The typed array spec was specifically designed with
> >> two use cases in mind: in-memory assembly of data to be sent to the
> >> graphics card or audio device, where the byte order must be that of
> >> the host architecture;
> >
> >
> > This is wrong, broken, won't be implemented this way by any production
> > browser, isn't how it's used in practice, and needs to be fixed in the
> > spec.  It violates the most basic web API requirement: interoperability.
> > Please see earlier in the thread; the views affected by endianness need
> > to be specced as little endian.  That's what everyone is going to implement,
> > and what everyone's pages are going to depend on, so it's what the spec
> > needs to say.  Separate types should be added for big-endian (eg.
> > Int16BEArray).
>
> Thanks for your input.
> The design of the typed array classes was informed by requirements
> about how the OpenGL, and therefore WebGL, API work; and from prior
> experience with the design and implementation of Java's New I/O Buffer
> classes, which suffered from horrible performance pitfalls because of
> a design similar to that which you suggest.
> Production browsers already implement typed arrays with their current
> semantics. It is not possible to change them and have WebGL continue
> to function. I will go so far as to say that the semantics will not be
> changed.
> In the typed array specification, unlike Java's New I/O specification,
> the API was split between two use cases: in-memory data construction
> (for consumption by APIs like WebGL and Web Audio), and file and
> network I/O. The API was carefully designed to avoid roadblocks that
> would prevent maximum performance from being achieved for these use
> cases. Experience has shown that the moment an artificial performance
> barrier is imposed, it becomes impossible to build certain kinds of
> programs. I consider it unacceptable to prevent developers from
> achieving their goals.
>
> > I also disagree that it should use DataView.  Views are used to access
> > arrays (including strings) within larger data structures.  DataView is
> > used to access packed data structures, where constructing a view for each
> > variable in the struct is unwieldy.  It might be useful to have a helper
> > in DataView, but the core API should work on views.
>
> This is one point of view. The true design goal of DataView is to
> supply the primitives for fast file and network input/output, where
> the endianness is explicitly specified in the file format. Converting
> strings to and from binary encodings is obviously an operation
> associated with transfer of data to or from files or the network.
> According to this taxonomy, the string encoding and decoding
> operations should only be associated with DataView, and not the other
> typed array types, which are designed for in-memory data assembly for
> consumption by other hardware on the system.
>
> >>  - It would be preferable if the encoding API had a way to avoid
> >> memory allocation, for example to encode into a passed-in DataView.
> >
> >
> > This was an earlier design, and discussion led to it being removed as a
> > premature optimization, to simplify the API.  I'd recommend reading the
> > rest of the thread.
>
> I do apologize for not being fully caught up on the thread, but hope
> that the input above was still useful.
> -Ken
