Re: Binary data (ByteArray/ByteVector) proposal on public-script-coord

Added public-script-coord since discussion is happening here.

On Nov 5, 2009, at 3:08 PM, Alex Russell wrote:

> On Nov 5, 2009, at 2:48 PM, Maciej Stachowiak wrote:
>
>>
>> I pulled together a rough proposal for representing binary data in  
>> ECMAScript and posted it on public-script-coord. I think having  
>> this is important for many W3C specs, but it is probably best  
>> defined in ECMAScript. I'm posting a link here in case anyone is  
>> interested and is not on the public-script-coord mailing list yet:
>>
>> http://lists.w3.org/Archives/Public/public-script-coord/2009OctDec/0093.html
>
> Looks promising! A couple of thoughts:
>
>  * the middle-ground approach seems interesting, although having  
> them not be "real" arrays feels like we're just kicking the can down  
> the road WRT the large-ish number of things that could be thought of  
> as arrays but which don't act like them (NodeList, arguments, etc.).

I understand the concern. Indeed, for things like NodeList or  
HTMLCollection or arguments, it's often very desirable to treat them  
as arrays.

My claim is that Data is not much like these things. I believe it is  
more like String. It happens to be a sequence (of a very specific  
type), but it's specialized enough to be worth treating differently.  
Do people often regret that String is not an Array? My impression is  
that this is not a common concern. That's why I imagined this design  
point.

But imagine we decided to go the other way and try to make these  
things arrays:

(a) I believe DataBuilder could be made an Array without introducing  
serious problems.
(b) I think Data could be made an Array, but all the mutating methods  
of Array (of which there are a great many) would always fail, so that  
seems like poor API design. I'd prefer to have a design where the  
immutable object lacks mutating methods entirely, rather than having  
mutating methods that always fail. That being said, just the read-only  
methods from the Array prototype could be provided.
(c) Array methods that return a new Array may be poor fits for Data/ 
DataBuilder - perhaps they could return a Data or DataBuilder instead  
if they are provided.
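
To make (b) and (c) concrete, here is a minimal sketch, not the API  
from the strawman. It assumes a Uint8Array as the backing store, and  
the method names (byteAt, indexOf, slice, filter) are illustrative  
choices only. The immutable object has no mutating methods at all, and  
the methods that would return a new Array return a new Data instead.

```javascript
// Hypothetical sketch only: the class shape and method names are
// assumptions for illustration, not part of the proposal.
class Data {
  #bytes;

  constructor(bytes) {
    // Copy the input so later changes to the source cannot leak in.
    this.#bytes = Uint8Array.from(bytes);
    Object.freeze(this);
  }

  get length() {
    return this.#bytes.length;
  }

  // Read-only access to a single byte (0-255).
  byteAt(index) {
    return this.#bytes[index];
  }

  // Read-only Array-style method, provided as-is.
  indexOf(value) {
    return this.#bytes.indexOf(value);
  }

  // Array-style methods that would return an Array return a Data here.
  slice(start, end) {
    return new Data(this.#bytes.slice(start, end));
  }

  filter(predicate) {
    return new Data(this.#bytes.filter((b, i) => predicate(b, i)));
  }
}

const d = new Data([0xff, 0xd8, 0xff, 0xe0]);
const head = d.slice(0, 2);        // a Data, not an Array
const hasPush = "push" in d;       // no mutating methods exist
```

With this shape, code that tries to call push() or sort() fails  
because the method is absent, rather than because it is present and  
always throws.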

>  * any thoughts on type conversions? what does this do/return?:
>
>     var bits = new Data("...");
>     var res = bits += "?";
>
>     will strings have a toData() protocol? Should other objects be  
> able to implement such a protocol? will there be a canonical byte  
> format for all strings in the language?

Converting a String to a Data presumably involves charset encoding/ 
decoding. I have not made a proposal for that in my initial strawman.  
I do think charset transcoding is an extremely useful feature for many  
use cases though, especially the ability to encode/decode UTF-8 and  
WinLatin1. Since you need a choice of charset encoding to meaningfully  
convert between binary data and strings, I think it's better not to  
make it implicit, but rather have explicit named methods that can take  
the encoding as a parameter. At least, that's my tentative thinking.  
Another possibility is to assume that in cases where you don't specify  
an encoding, strings are converted to/from UTF-16.
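
As a rough illustration of why the encoding should be explicit, the  
sketch below converts the same one-character string to bytes under two  
encodings and gets different results. The helper name encodeString and  
its charset labels are assumptions made for this example; the strawman  
does not define them. It uses TextEncoder for UTF-8 and encodes  
UTF-16 (little-endian) by hand.

```javascript
// Hypothetical helper: the name and charset labels are illustrative.
function encodeString(str, charset) {
  switch (charset) {
    case "utf-8":
      // TextEncoder always produces UTF-8.
      return new TextEncoder().encode(str);
    case "utf-16le": {
      // Two bytes per UTF-16 code unit, low byte first.
      const out = new Uint8Array(str.length * 2);
      for (let i = 0; i < str.length; i++) {
        const unit = str.charCodeAt(i);
        out[2 * i] = unit & 0xff;
        out[2 * i + 1] = unit >> 8;
      }
      return out;
    }
    default:
      throw new RangeError("unsupported charset: " + charset);
  }
}

const text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
const utf8 = encodeString(text, "utf-8");     // bytes 0xC3 0xA9
const utf16 = encodeString(text, "utf-16le"); // bytes 0xE9 0x00
```

The two results have the same length here but different contents, so  
an implicit conversion would have to pick one silently.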

>   * given that Data are array-like things that have the property of  
> being packed (like arguments), maybe we're just missing a  
> PackedArray superclass in general that could help w/ the efficiency  
> concerns (irrespective of mutability).

I'm not sure what you mean by being packed or the similarity to  
arguments. Arguments contains arbitrary values, Data contains only  
unsigned integers in the range 0-255. Data is immutable. And with  
Data, it may not often be desirable to treat it as an array.

>   * what do you think about a toArray() method?

That can certainly be done. I am somewhat wary, because I think the  
Array version will often be much less efficient in speed and memory,  
and I believe it will rarely actually be useful. Imagine getting a raw  
binary JPEG over the wire. It's extremely unlikely you'd want to  
convert this to an Array, or call methods like filter() or map() on it.

>   * do you envision any provision for multi-dimensional Data  
> objects? E.g., <canvas> data.

The proposal here is that Data just holds raw binary data, without  
imposing structure. If you want to use it to hold image data or a  
frame buffer, then by my proposal, you have to do the indexing math  
yourself.
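
For example, a sketch of that indexing math for a frame buffer,  
assuming a row-major layout with 4 bytes per pixel (R, G, B, A) and no  
row padding. The function name pixelOffset is hypothetical, and a  
Uint8Array stands in for the raw byte storage.

```javascript
// Assumed layout: row-major, 4 bytes per pixel (R, G, B, A), no padding.
const BYTES_PER_PIXEL = 4;

// Hypothetical helper: byte offset of pixel (x, y) in a flat buffer.
function pixelOffset(x, y, width) {
  return (y * width + x) * BYTES_PER_PIXEL;
}

const width = 3;
const height = 2;
const buffer = new Uint8Array(width * height * BYTES_PER_PIXEL);

// Set pixel (1, 1) to opaque red.
const at = pixelOffset(1, 1, width); // (1 * 3 + 1) * 4 = 16
buffer[at] = 255;     // R
buffer[at + 1] = 0;   // G
buffer[at + 2] = 0;   // B
buffer[at + 3] = 255; // A
```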

Regards,
Maciej

Received on Friday, 6 November 2009 00:43:23 UTC