- From: Charles Pritchard <chuck@jumis.com>
- Date: Wed, 11 Jan 2012 19:44:29 -0800
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- CC: public-webapps@w3.org
On 1/11/2012 4:22 PM, Boris Zbarsky wrote:
> On 1/11/12 6:03 PM, Charles Pritchard wrote:
>> Is there any instance in practice where DOMString as exposed to the
>> scripting environment is not implemented as a unicode string?
>
> I don't know what you mean by that.
>
> The point is, it's trivial to construct JS strings that contain
> arbitrary sequences of 16-bit units (using fromCharCode or \u
> escapes). Nothing anywhere in JS or the DOM per se enforces that
> strings are valid UTF-16 (which is the way that an actual Unicode
> string would be encoded as a JS string).

My [wrong] understanding was that DOMString referred to valid unicode.

WebIDL:
"The DOMString type corresponds to the set of all possible sequences of 16 bit unsigned integer code units. Such sequences are commonly interpreted as UTF-16 encoded strings [RFC2781] although this is not required... Nothing in this specification requires a DOMString value to be a valid UTF-16 string."
http://www.w3.org/TR/WebIDL/#idl-DOMString

DOM3:
"The DOMString type is used to store [Unicode] characters as a sequence of 16-bit units using UTF-16 as defined in [Unicode] and Amendment 1 of [ISO/IEC 10646]."
There are some normalization notes, but otherwise, it's close enough to saying it stores Unicode, but it can handle all 16bit combinations.
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-C74D1578

For "historic reasons" WindowBase64 throws an error if input is not within Unicode range.
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob

>> I realize that internally, DOMString may be implemented as a 16 bit
>> integer + length;
>
> Not just internally. The JS spec and the DOM spec both explicitly say
> that this is what strings are: an array of 16-bit integers.

WebIDL and DOM define "DOMString", of course. JS defines "The String Type" in 8.4. They are intended to be the same.
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
"The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values .... When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16."

>> Browsers do the same thing with WindowBase64, though it's specified as
>> DOMString, in practice (as the notes say), it's unicode.
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
>>
>
> If you look at the actual processing model, you take the input array
> of 16-bit integers, throw if any is not in the set { 0x2B, 0x2F, 0x30
> } union [0x41,0x5A] union [0x61,0x6A] and then treat the rest as ASCII
> data (which at that point it is).
>
> It defines this in terms of "Unicode" but that's just because any JS
> string that satisfies the above constraints can be considered a
> "Unicode" string if one wishes.
>
>> Web Storage, also, only works with unicode.
>
> I'm not familiar with the relevant part of Web Storage. Can you cite
> the relevant part please?

The character code conversion gets weird. If you'd explain this in the proper terms, I'd appreciate it.

Load a binary resource via the old charset hack.
Save the resulting string into localStorage.
There are some conversion issues. I am not using the right vocabulary.

I know the list has seen the issue before, and I'll bet someone here can explain it succinctly.

Example:

// Image files are easiest to try this with.
// https://developer.mozilla.org/En/XMLHttpRequest/Using_XMLHttpRequest#Receiving_binary_data_in_older_browsers

// From the article:
function load_binary_resource(url) {
  var req = new XMLHttpRequest();
  req.open('GET', url, false); // synchronous request
  //XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
  req.overrideMimeType('text\/plain; charset=x-user-defined');
  req.send(null);
  if (req.status != 200) return '';
  return req.responseText;
}

var x = load_binary_resource('imageurl.png');
localStorage.fail = x;
localStorage.fail == x; // will return false.
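
For anyone following along, here is a rough sketch of what kind of string the charset hack produces. It assumes the x-user-defined decoder behaves as the Encoding Standard describes (bytes 0x00-0x7F map to themselves, bytes 0x80-0xFF map to U+F780-U+F7FF). The helper names decodeXUserDefined and bytesFromBinaryString are made up for illustration and are not browser APIs. It does not settle what a given Web Storage implementation does with such a string; it only shows what the string holds and how the original bytes can be recovered from it.

```javascript
// Sketch under stated assumptions; helper names are hypothetical.

// JS strings are sequences of 16-bit units; nothing enforces valid UTF-16.
var lone = String.fromCharCode(0xD800); // an unpaired high surrogate
console.log(lone.length, lone.charCodeAt(0).toString(16)); // prints: 1 d800

// Simulates x-user-defined decoding of a byte array into a JS string
// (assumed mapping: 0x00-0x7F unchanged, 0x80-0xFF -> U+F780-U+F7FF).
function decodeXUserDefined(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    out += String.fromCharCode(b < 0x80 ? b : 0xF780 + (b - 0x80));
  }
  return out;
}

// Recovers the original bytes by keeping only the low 8 bits of each unit.
function bytesFromBinaryString(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i) & 0xFF);
  }
  return bytes;
}

// The 8-byte PNG file signature stands in for a real download.
var pngSignature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
var text = decodeXUserDefined(pngSignature);

console.log(text.charCodeAt(0).toString(16)); // prints: f789
console.log(bytesFromBinaryString(text).join(',') === pngSignature.join(',')); // prints: true
```

So the first unit of the "text" is 0xF789 rather than 0x89, and any layer that stores or re-encodes the string has to preserve those private-use code units exactly for a later comparison to succeed.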
Received on Thursday, 12 January 2012 03:51:41 UTC