Re: String to ArrayBuffer

On 1/11/2012 4:22 PM, Boris Zbarsky wrote:
> On 1/11/12 6:03 PM, Charles Pritchard wrote:
>> Is there any instance in practice where DOMString as exposed to the
>> scripting environment is not implemented as a unicode string?
>
> I don't know what you mean by that.
>
> The point is, it's trivial to construct JS strings that contain 
> arbitrary sequences of 16-bit units (using fromCharCode or \u 
> escapes).  Nothing anywhere in JS or the DOM per se enforces that 
> strings are valid UTF-16 (which is the way that an actual Unicode 
> string would be encoded as a JS string).


My [wrong] understanding was that DOMString referred to valid Unicode.

WebIDL:
"The DOMString type corresponds to the set of all possible sequences of 
16 bit unsigned integer code units. Such sequences are commonly 
interpreted as UTF-16 encoded strings [RFC2781] although this is not 
required... Nothing in this specification requires a DOMString value to 
be a valid UTF-16 string."
http://www.w3.org/TR/WebIDL/#idl-DOMString
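
To make the WebIDL wording concrete, here is a minimal sketch that scans
a JS string for unpaired surrogates. The helper name isWellFormedUTF16
is my own, not something from WebIDL or the DOM. A string that fails the
check is still a perfectly legal DOMString; it just isn't valid UTF-16.

```javascript
// Hypothetical helper: returns true if every surrogate code unit in `s`
// belongs to a proper high+low pair, i.e. the string is well-formed UTF-16.
function isWellFormedUTF16(s) {
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate we just validated
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

var paired = '\uD83D\uDE00';            // U+1F600 as a surrogate pair
var lone = String.fromCharCode(0xD800); // unpaired high surrogate
```

Both `paired` and `lone` can be created, stored and passed around as
strings; only the first would pass the check.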

DOM3:
"The DOMString type is used to store [Unicode] characters as a sequence 
of 16-bit units using UTF-16 as defined in [Unicode] and Amendment 1 of 
[ISO/IEC 10646]." There are some normalization notes, but otherwise it 
comes close to saying that a DOMString stores Unicode, even though it 
can hold any combination of 16-bit units.
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-C74D1578

For "historical reasons" the WindowBase64 methods work on Unicode 
strings and throw an error if the input is out of range (btoa() rejects 
any character above U+00FF).
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
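
As far as I can tell from the spec text, the check btoa() applies is on
individual code units, not on Unicode validity. A rough sketch of that
range check follows; the helper name isBtoaInput is made up, and this is
an illustration rather than any browser's actual implementation.

```javascript
// Hypothetical helper: true if every code unit in `s` is in 0x00-0xFF,
// which is the range btoa() is specified to accept.
function isBtoaInput(s) {
  for (var i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xFF) return false;
  }
  return true;
}

var okAscii = isBtoaInput('hello');    // plain ASCII
var okLatin1 = isBtoaInput('\u00FF');  // top of the accepted range
var tooHigh = isBtoaInput('\u0100');   // first code unit that is rejected
```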


>> I realize that internally, DOMString may be implemented as a 16 bit
>> integer + length;
>
> Not just internally.  The JS spec and the DOM spec both explicitly say 
> that this is what strings are: an array of 16-bit integers.

WebIDL and DOM define "DOMString", of course. JS defines "The String 
Type" in section 8.4 of ECMA-262. They are intended to be the same.
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

"The  String type is the set of all finite ordered sequences of zero or 
more 16-bit unsigned integer values .... When a String contains actual 
textual data, each element is considered to be a single UTF-16 code 
unit.  Whether or not this is the actual storage format of a String, the 
characters within a String are numbered by their initial code unit 
element position as though they were represented using UTF-16."
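
A small illustration of that numbering, as I read the ECMA-262 text:
length, charCodeAt and indexOf all count 16-bit code units, so a
character outside the BMP occupies two positions. The variable names are
only for this example.

```javascript
// 'a', then U+1F600 (one character, two code units), then 'b'
var s = 'a\uD83D\uDE00b';

var length = s.length;          // counts code units, not characters
var high = s.charCodeAt(1);     // first half of the surrogate pair
var low = s.charCodeAt(2);      // second half of the surrogate pair
var indexOfB = s.indexOf('b');  // position is numbered in code units
```

Here `length` is 4 and `indexOfB` is 3, even though the string holds only
three characters.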

>> Browsers do the same thing with WindowBase64, though it's specified as
>> DOMString, in practice (as the notes say), it's unicode.
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob 
>>
>
> If you look at the actual processing model, you take the input array 
> of 16-bit integers, throw if any is not in the set { 0x2B, 0x2F, 0x30 
> } union [0x41,0x5A] union [0x61,0x6A] and then treat the rest as ASCII 
> data (which at that point it is).
>
> It defines this in terms of "Unicode" but that's just because any JS 
> string that satisfies the above constraints can be considered a 
> "Unicode" string if one wishes.
>
>> Web Storage, also, only works with unicode.
>
> I'm not familiar with the relevant part of Web Storage.  Can you cite 
> the relevant part please?

The character code conversion gets weird. If you could explain this in 
the proper terms, I'd appreciate it.

Load a binary resource via the old charset hack, then save the 
resulting string into localStorage. The value read back does not match 
the original, so there is some conversion issue; I am not using the 
right vocabulary. I know the list has seen the issue before, and I'll 
bet someone here can explain it succinctly.

Example:
// Image files are easiest to try this with.
https://developer.mozilla.org/En/XMLHttpRequest/Using_XMLHttpRequest#Receiving_binary_data_in_older_browsers
// From the article:
function load_binary_resource(url) {
   var req = new XMLHttpRequest();
   req.open('GET', url, false);
   // XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
   req.overrideMimeType('text\/plain; charset=x-user-defined');
   req.send(null);
   if (req.status != 200) return '';
   return req.responseText;
}
var x = load_binary_resource('imageurl.png');
localStorage.fail = x;
localStorage.fail == x; // evaluates to false.
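
A sketch of what I believe the charset hack produces, in case it helps
someone name the issue properly: with charset=x-user-defined, bytes
0x00-0x7F come through unchanged and bytes 0x80-0xFF land in the private
use area at U+F780-U+F7FF. Masking each code unit with 0xFF gets the
original byte back, which is the trick the MDN article uses. The helper
names toBytes and toBinaryString are made up for illustration, and the
input is simulated rather than fetched.

```javascript
// Hypothetical helper: recover byte values from an x-user-defined responseText.
function toBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    bytes.push(text.charCodeAt(i) & 0xFF); // drop the 0xF7 high byte
  }
  return bytes;
}

// Hypothetical helper: build a string whose code units are all in 0x00-0xFF.
function toBinaryString(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Simulate the decoded text for the bytes 0x89 'P' 'N' 'G' (start of a PNG file).
var simulated = String.fromCharCode(0xF789, 0x50, 0x4E, 0x47);
var bytes = toBytes(simulated);
var binary = toBinaryString(bytes);
```

A string like `binary` contains no private use code units, which might
sidestep the storage problem; I have not verified that across browsers.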

Received on Thursday, 12 January 2012 03:51:41 UTC