- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 14 Mar 2012 00:01:42 +0000 (UTC)
On Tue, 13 Mar 2012, Joshua Bell wrote:
>
> For both of the above: initially suggested use cases included parsing
> data as esoteric as ID3 tags in MP3 files, where the encoding is
> unspecified and is guessed at by decoders, and includes non-Unicode
> encodings. It was suggested that the encoding sniffing capabilities of
> browsers be leveraged. [...]
>
> Whether we should restrict it as far as UTF-8 depends on whether we
> envision this API only used for parsing/serializing newly defined data
> formats, or whether there is consideration for interop with previously
> existing data formats and code.

Seems reasonable. If we have specific use cases for non-UTF-8 encodings, I
agree we should support them; if that's the case, we should survey those
use cases to work out what set of encodings we need, and add just those.

> > - Having a mechanism that lets you encode the string and get a length
> >   separate from the mechanism that lets you encode the string and get
> >   the encoded string seems like it would encourage very inefficient
> >   code. Can we instead have a mechanism that returns both at once? Or
> >   is the idea that for some encodings getting the encoded length is
> >   much quicker than getting the actual string?
>
> The use case was to compute the size necessary to allocate a single buffer
> into which may be encoded multiple strings and other data, rather than
> allocating multiple small buffers and then copying strings into a larger
> buffer.
>
> Ignoring the issue of invalid code points, the length calculations for
> non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
> be sanitized, that case is trivially 2x the JS string length.)

Yeah, but surely we'll mainly be doing stuff with UTF-8...

One option is to return an opaque object of the form:

   interface EncodedString {
     readonly attribute unsigned long length;
     // internally has a copy of the encoded string
   }

...and then have view.setString take this EncodedString object.
At least then you get it down to an extraneous copy, rather than an
extraneous encode. Still not ideal though.

> > - Seems weird that integers and strings would have such different APIs
> >   for doing the same thing. Why can't we handle them equivalently? As in:
> >
> >     len = view.setString(strings[i],
> >                          offset + Uint32Array.BYTES_PER_ELEMENT,
> >                          "UTF-8");
> >     view.setUint32(offset, len);
> >     offset += Uint32Array.BYTES_PER_ELEMENT + len;
>
> Heh, that's where the discussion started, actually. We wanted to keep
> the DataView interface simple, and potentially support encoding into
> plain JS arrays and/or non-TypedArray support that appeared to be on the
> horizon for JS.

I see where you're coming from, but I think we should look at the platform
as a whole, not just one API. It doesn't help the platform as a whole if we
just have the same features split across two interfaces; if anything, the
complexity is slightly higher than with one consistent API that does ints
and strings equivalently.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
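
Editorial illustration, not part of the original message: the single-buffer use case discussed above (encode each string once, learn its length, then lay out length-prefixed strings in one allocation) can be sketched roughly as follows. Neither `view.setString` nor `EncodedString` from the thread is a real API; this sketch assumes the `TextEncoder`/`TextDecoder` interfaces from the Encoding Standard as a stand-in, and the helper names `packStrings` and `unpackStrings` are hypothetical.

```javascript
// Size of the Uint32 length prefix written before each string.
const PREFIX = Uint32Array.BYTES_PER_ELEMENT;

// Hypothetical helper: pack strings as [uint32 length][UTF-8 bytes]...
// into a single ArrayBuffer.
function packStrings(strings) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  // Encode each string exactly once; the resulting Uint8Array plays the
  // role of the opaque "EncodedString" (it carries both bytes and length).
  const encoded = strings.map((s) => encoder.encode(s));
  const total = encoded.reduce((sum, bytes) => sum + PREFIX + bytes.length, 0);

  const buffer = new ArrayBuffer(total); // one allocation for everything
  const view = new DataView(buffer);
  const out = new Uint8Array(buffer);
  let offset = 0;
  for (const bytes of encoded) {
    view.setUint32(offset, bytes.length); // DataView defaults to big-endian
    out.set(bytes, offset + PREFIX);      // the one extra copy, no re-encode
    offset += PREFIX + bytes.length;
  }
  return buffer;
}

// Hypothetical inverse, reading the same layout back into JS strings.
function unpackStrings(buffer) {
  const decoder = new TextDecoder("utf-8");
  const view = new DataView(buffer);
  const result = [];
  let offset = 0;
  while (offset < buffer.byteLength) {
    const len = view.getUint32(offset);
    offset += PREFIX;
    result.push(decoder.decode(new Uint8Array(buffer, offset, len)));
    offset += len;
  }
  return result;
}
```

This matches the trade-off described in the message: each string is encoded once and copied once, rather than encoded twice (once to measure, once to write).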
Received on Tuesday, 13 March 2012 17:01:42 UTC