- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Wed, 11 Jan 2012 19:22:56 -0500
- To: public-webapps@w3.org
On 1/11/12 6:03 PM, Charles Pritchard wrote:
> Is there any instance in practice where DOMString as exposed to the
> scripting environment is not implemented as a unicode string?
I don't know what you mean by that.
The point is, it's trivial to construct JS strings that contain
arbitrary sequences of 16-bit units (using fromCharCode or \u escapes).
Nothing anywhere in JS or the DOM per se enforces that strings are
valid UTF-16 (which is the way that an actual Unicode string would be
encoded as a JS string).
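For example (an untested sketch just to make that concrete):

  // Both of these yield a one-unit string whose single 16-bit unit is
  // an unpaired high surrogate -- not valid UTF-16, so not a Unicode
  // string in any meaningful sense, yet perfectly legal in JS:
  var a = String.fromCharCode(0xD800);
  var b = "\uD800";
  a.length;   // 1
  a === b;    // true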
> I realize that internally, DOMString may be implemented as a 16 bit
> integer + length;
Not just internally. The JS spec and the DOM spec both explicitly say
that this is what strings are: an array of 16-bit integers.
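Which is why a single astral-plane character already occupies two of
those integers; rough illustration, assuming the usual surrogate-pair
encoding:

  // U+1D306 is one Unicode character but two 16-bit units in a JS string:
  var s = "\uD834\uDF06";
  s.length;         // 2
  s.charCodeAt(0);  // 0xD834 (high surrogate)
  s.charCodeAt(1);  // 0xDF06 (low surrogate)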
> Browsers do the same thing with WindowBase64, though it's specified as
> DOMString, in practice (as the notes say), it's unicode.
> http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
If you look at the actual processing model, you take the input array of
16-bit integers, throw if any is not in the set { 0x2B, 0x2F } union
[0x30,0x39] union [0x41,0x5A] union [0x61,0x7A] and then treat the rest
as ASCII data (which at that point it is).
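Something along these lines, roughly (a sketch of the check as I read
it, ignoring the '=' padding handling the actual spec has):

  // Reject any 16-bit unit outside the base64 alphabet; whatever is
  // left over is plain ASCII data.
  function checkAtobInput(s) {
    for (var i = 0; i < s.length; i++) {
      var c = s.charCodeAt(i);
      var ok = c === 0x2B || c === 0x2F ||   // '+' and '/'
               (c >= 0x30 && c <= 0x39) ||   // '0'-'9'
               (c >= 0x41 && c <= 0x5A) ||   // 'A'-'Z'
               (c >= 0x61 && c <= 0x7A);     // 'a'-'z'
      if (!ok)
        throw new Error("INVALID_CHARACTER_ERR");
    }
  }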
It defines this in terms of "Unicode" but that's just because any JS
string that satisfies the above constraints can be considered a
"Unicode" string if one wishes.
> Web Storage, also, only works with unicode.
I'm not familiar with the relevant part of Web Storage. Can you cite
the relevant part please?
-Boris