- From: Charles Pritchard <chuck@jumis.com>
- Date: Wed, 11 Jan 2012 19:44:29 -0800
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- CC: public-webapps@w3.org
On 1/11/2012 4:22 PM, Boris Zbarsky wrote:
> On 1/11/12 6:03 PM, Charles Pritchard wrote:
>> Is there any instance in practice where DOMString as exposed to the
>> scripting environment is not implemented as a Unicode string?
>
> I don't know what you mean by that.
>
> The point is, it's trivial to construct JS strings that contain
> arbitrary sequences of 16-bit units (using fromCharCode or \u
> escapes). Nothing anywhere in JS or the DOM per se enforces that
> strings are valid UTF-16 (which is the way that an actual Unicode
> string would be encoded as a JS string).
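A minimal sketch of that point, runnable in any JavaScript engine. Both a \u escape and String.fromCharCode produce a string that holds a lone surrogate, and nothing complains. Only an explicit check shows the string is not valid UTF-16. The isWellFormedUTF16 helper below is hypothetical; recent engines (ES2024) also provide String.prototype.isWellFormed.

```javascript
// Two ways to build a string holding a lone high surrogate (0xD800),
// which is not valid UTF-16 on its own.
const viaEscape = "a\uD800b";
const viaFromCharCode = String.fromCharCode(0x61, 0xD800, 0x62);

// Hypothetical helper: true if every high surrogate is immediately followed
// by a low surrogate and no low surrogate appears on its own.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate of a valid pair
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

console.log(viaEscape === viaFromCharCode);        // true
console.log(viaEscape.length);                     // 3
console.log(isWellFormedUTF16(viaEscape));         // false
console.log(isWellFormedUTF16("a\uD83D\uDE00b"));  // true (surrogate pair)
```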
My [wrong] understanding was that DOMString referred to valid Unicode.
WebIDL:
"The DOMString type corresponds to the set of all possible sequences of
16 bit unsigned integer code units. Such sequences are commonly
interpreted as UTF-16 encoded strings [RFC2781] although this is not
required... Nothing in this specification requires a DOMString value to
be a valid UTF-16 string."
http://www.w3.org/TR/WebIDL/#idl-DOMString
DOM3:
"The DOMString type is used to store [Unicode] characters as a sequence
of 16-bit units using UTF-16 as defined in [Unicode] and Amendment 1 of
[ISO/IEC 10646]." There are some normalization notes, but otherwise,
it's close enough to saying it stores Unicode, but it can handle all
16bit combinations.
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-C74D1578
For "historic reasons" WindowBase64 throws an error if input is not
within Unicode range.
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
>> I realize that internally, DOMString may be implemented as a 16 bit
>> integer + length;
>
> Not just internally. The JS spec and the DOM spec both explicitly say
> that this is what strings are: an array of 16-bit integers.
WebIDL and DOM define "DOMString", of course. JS defines "The String
Type" in 8.4. They are intended to be the same.
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
"The String type is the set of all finite ordered sequences of zero or
more 16-bit unsigned integer values .... When a String contains actual
textual data, each element is considered to be a single UTF-16 code
unit. Whether or not this is the actual storage format of a String, the
characters within a String are numbered by their initial code unit
element position as though they were represented using UTF-16."
>> Browsers do the same thing with WindowBase64: though it's specified as
>> DOMString, in practice (as the notes say), it's Unicode.
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
>>
>
> If you look at the actual processing model, you take the input array
> of 16-bit integers, throw if any is not in the set { 0x2B, 0x2F }
> union [0x30,0x39] union [0x41,0x5A] union [0x61,0x7A], and then treat
> the rest as ASCII data (which at that point it is).
>
> It defines this in terms of "Unicode" but that's just because any JS
> string that satisfies the above constraints can be considered a
> "Unicode" string if one wishes.
>
>> Web Storage, also, only works with Unicode.
>
> I'm not familiar with the relevant part of Web Storage. Can you cite
> the relevant part please?
The character conversion gets weird, and I'd appreciate it if someone
could explain it in the proper terms; I know I'm not using the right
vocabulary. Load a binary resource via the old x-user-defined charset
hack, then save the resulting string into localStorage: the value read
back no longer matches the original.
I know the list has seen this issue before, and I'll bet someone here can
explain it succinctly.
Example:
// Image files are easiest to try this with.
// https://developer.mozilla.org/En/XMLHttpRequest/Using_XMLHttpRequest#Receiving_binary_data_in_older_browsers
// From the article:
function load_binary_resource(url) {
  var req = new XMLHttpRequest();
  req.open('GET', url, false); // synchronous request
  // XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
  // x-user-defined maps each byte to one 16-bit code unit instead of decoding it as text.
  req.overrideMimeType('text/plain; charset=x-user-defined');
  req.send(null);
  if (req.status !== 200) return '';
  return req.responseText;
}
var x = load_binary_resource('imageurl.png');
localStorage.fail = x;
localStorage.fail == x; // will return false.
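For anyone comparing the stored value against the original, here is a minimal sketch of what the charset=x-user-defined override does to the bytes, assuming the mapping given in the WHATWG Encoding Standard: bytes 0x00-0x7F become the same code unit, and bytes 0x80-0xFF become U+F780-U+F7FF. decodeXUserDefined and recoverBytes are hypothetical helpers that run outside a browser; recoverBytes uses the familiar charCodeAt(i) & 0xff idiom.

```javascript
// Hypothetical: emulate x-user-defined decoding of a byte array.
function decodeXUserDefined(bytes) {
  let out = "";
  for (const b of bytes) {
    // 0x80-0xFF land in the private use range U+F780-U+F7FF.
    out += String.fromCharCode(b < 0x80 ? b : 0xF700 + b);
  }
  return out;
}

// Hypothetical: recover the original bytes by keeping the low 8 bits.
function recoverBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xFF;
  }
  return bytes;
}

// The 8-byte PNG signature as sample input.
const pngSignature = Uint8Array.of(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A);
const text = decodeXUserDefined(pngSignature);
console.log(text.charCodeAt(0).toString(16));                          // "f789"
console.log(recoverBytes(text).every((b, i) => b === pngSignature[i])); // true
```

This mapping never produces surrogate code units, so the responseText is well-formed UTF-16. The sketch only shows the byte-to-string step; it does not say where the localStorage round trip changes the value.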
Received on Thursday, 12 January 2012 03:51:41 UTC