- From: Joshua Bell <jsbell@chromium.org>
- Date: Tue, 13 Mar 2012 16:50:43 -0700
On Tue, Mar 13, 2012 at 4:28 PM, Ian Hickson <ian at hixie.ch> wrote:

> On Tue, 13 Mar 2012, Joshua Bell wrote:
> > On Tue, Mar 13, 2012 at 4:10 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> > > On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell <kbr at google.com>
> > > wrote:
> > > > Joshua Bell has been working on a string encoding and decoding API
> > > > that supports the needed encodings, and which is separable from the
> > > > core typed array API:
> > > >
> > > > http://wiki.whatwg.org/wiki/StringEncoding
> > > >
> > > > This is the direction I prefer. String encoding and decoding seems
> > > > to be a complex enough problem that it should be expressed
> > > > separately from the typed array spec itself.
>
> Some quick feedback:
>
> - [OmitConstructor] doesn't seem to be WebIDL
>

Historically, the spec started off as an addition to the Typed Array spec
that splintered off; cleanup is definitely needed, thanks.

> - please don't allow UAs to implement other encodings. You should list
> the exact set of supported encodings and the exact labels that should
> be recognised as meaning those encodings, and disallow all others.
> Otherwise, we'll be in a never-ending game of reverse-engineering each
> others' lists of supported encodings and it'll keep growing.
>
> - What's the use case for supporting anything but UTF-8?
>

For both of the above: initially suggested use cases included parsing data
as esoteric as ID3 tags in MP3 files, where the encoding is unspecified, is
guessed at by decoders, and includes non-Unicode encodings. It was
suggested that the encoding sniffing capabilities of browsers be leveraged.
(Cue a strong "nooooooo!" from Anne.)

I completely agree that we should explicitly list the set of supported
encodings and should remove the "other encodings" allowance.
Whether we should restrict it as far as UTF-8 depends on whether we
envision this API only being used for parsing/serializing newly defined
data formats, or whether there is consideration for interop with previously
existing data formats and code. For example, "BINARY" would be used to
bridge the existing atob()/btoa() methods with Typed Arrays (although
base64 directly in/out of Typed Arrays would be preferable).

Jonas, since you started this thread - did your content authors mention
encodings?

> - Having a mechanism that lets you encode the string and get a length
> separate from the mechanism that lets you encode the string and get the
> encoded string seems like it would encourage very inefficient code. Can
> we instead have a mechanism that returns both at once? Or is the idea
> that for some encodings getting the encoded length is much quicker than
> getting the actual string?
>

The use case was to compute the size necessary to allocate a single buffer
into which multiple strings and other data may be encoded, rather than
allocating multiple small buffers and then copying strings into a larger
buffer. Ignoring the issue of invalid code points, the length calculations
for non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16
not be sanitized, that case is trivially 2x the JS string length.)

> - Seems weird that integers and strings would have such different APIs
> for doing the same thing. Why can't we handle them equivalently? As in:
>
>   len = view.setString(strings[i],
>                        offset + Uint32Array.BYTES_PER_ELEMENT,
>                        "UTF-8");
>   view.setUint32(offset, len);
>   offset += Uint32Array.BYTES_PER_ELEMENT + len;
>

Heh, that's where the discussion started, actually. We wanted to keep the
DataView interface simple, and potentially support encoding into plain JS
arrays and/or non-TypedArray support that appeared to be on the horizon for
JS.

> HTH,
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
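
A minimal sketch of the single-buffer use case described above: sum the
UTF-8 lengths first, allocate once, then write each string behind a 32-bit
length prefix. The helper names utf8ByteLength and packStrings are
hypothetical, and today's standard TextEncoder.encodeInto is used only as a
stand-in for the encoding step; it is not the API under discussion in this
thread.

```javascript
// Hypothetical helper: UTF-8 byte length of a JS string, computed without
// producing the encoded bytes. Lone surrogates are counted as U+FFFD
// (3 bytes), which matches what TextEncoder writes for them.
function utf8ByteLength(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length &&
               (str.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      bytes += 4; // valid surrogate pair: one 4-byte sequence
      i++;        // skip the low surrogate
    } else {
      bytes += 3; // other BMP code point, or a lone surrogate
    }
  }
  return bytes;
}

// Hypothetical helper: pack strings as [uint32 length][UTF-8 bytes]...
// into one ArrayBuffer sized up front.
function packStrings(strings) {
  const prefix = Uint32Array.BYTES_PER_ELEMENT;
  let total = 0;
  for (const s of strings) {
    total += prefix + utf8ByteLength(s);
  }

  const buffer = new ArrayBuffer(total);
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  const encoder = new TextEncoder();

  let offset = 0;
  for (const s of strings) {
    // Encode directly into the shared buffer, after the length prefix.
    const { written } = encoder.encodeInto(s, bytes.subarray(offset + prefix));
    view.setUint32(offset, written); // big-endian by default
    offset += prefix + written;
  }
  return buffer;
}
```

The length pass costs one extra walk over each string, which is the
trade-off the quoted feedback questions; for UTF-16 without sanitizing, the
same pass reduces to 2 * str.length.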
Received on Tuesday, 13 March 2012 16:50:43 UTC