- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Fri, 28 Sep 2012 14:04:58 -0400
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: whatwg@lists.whatwg.org
On 9/28/12 1:30 PM, Anne van Kesteren wrote:
> Well that is interesting. So the document encoding is not solely a
> query component affair?

At least not for Gecko, no.

> Does this only apply to javascript URLs? I
> cannot get this to work for data URLs.

Looks like there's some javascript:-specific code here, yes. In particular, when the URI object is being created for javascript: and the document encoding is not UTF-8, it looks like Gecko will do the following:

1) Take the given string (which by this point is actually a byte array; if it started off as Unicode, it was converted to UTF-8 to produce this byte array).
2) Unescape non-ASCII escapes (that is, escapes whose hex value is not in the ASCII range).
3) If the result is not valid UTF-8 and the document encoding is some variant of UTF-16, or is UTF-7, or is x-imap4-modified-utf7 (whatever that is), just byte-inflate to Unicode. There's a comment here about encodings that are not ASCII supersets.
4) Otherwise, if the byte array looks like valid UTF-8, convert from UTF-8 to Unicode.
5) Otherwise, convert to Unicode using the document encoding.
6) Convert the resulting Unicode string to UTF-8.
7) Escape non-ASCII bytes.

I have no idea how much of this is needed in practice...

-Boris
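The conversion steps Boris describes can be sketched in Python as below. This is not Gecko's code, only a minimal model of the described behavior under stated assumptions: the function names are hypothetical, the test for "some variant of UTF-16" is assumed to be a label prefix match, undecodable bytes in the document-encoding branch are assumed to become U+FFFD (`errors="replace"`), and input from a UTF-8 document is assumed to pass through untouched, since the message only describes the non-UTF-8 path.

```python
import re

# Percent-escape pattern on raw bytes: "%" followed by two hex digits.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")

def unescape_non_ascii(data: bytes) -> bytes:
    """Decode %XX escapes whose value is >= 0x80; leave ASCII escapes (e.g. %20) alone."""
    def repl(m):
        value = int(m.group(1), 16)
        return bytes([value]) if value >= 0x80 else m.group(0)
    return _ESCAPE.sub(repl, data)

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def is_non_ascii_superset(encoding: str) -> bool:
    # Assumption: any label starting with "utf-16" counts as a UTF-16 variant.
    enc = encoding.lower()
    return enc.startswith("utf-16") or enc in ("utf-7", "x-imap4-modified-utf7")

def escape_non_ascii(data: bytes) -> bytes:
    """Percent-escape every byte outside the ASCII range."""
    return b"".join(b"%%%02X" % b if b >= 0x80 else bytes([b]) for b in data)

def normalize_javascript_uri(spec, doc_encoding: str) -> bytes:
    """Hypothetical model of the javascript: URI handling described in the message."""
    # Step 1: work on bytes; Unicode input is converted to UTF-8 first.
    data = spec.encode("utf-8") if isinstance(spec, str) else spec
    if doc_encoding.lower() == "utf-8":
        return data  # assumption: the described path only runs for non-UTF-8 documents
    # Step 2: unescape only the non-ASCII escapes.
    data = unescape_non_ascii(data)
    if not is_valid_utf8(data) and is_non_ascii_superset(doc_encoding):
        # Step 3: byte-inflate, i.e. each byte becomes the code point of the same value.
        text = data.decode("latin-1")
    elif is_valid_utf8(data):
        # Step 4: bytes look like UTF-8, so decode them as UTF-8.
        text = data.decode("utf-8")
    else:
        # Step 5: fall back to the document encoding (error handling is an assumption).
        text = data.decode(doc_encoding, errors="replace")
    # Final two steps: convert back to UTF-8, then escape the non-ASCII bytes.
    return escape_non_ascii(text.encode("utf-8"))
```

For example, `normalize_javascript_uri("javascript:%80", "windows-1252")` decodes the lone 0x80 byte as the euro sign and yields `b"javascript:%E2%82%AC"`, while the same input in a `UTF-16LE` document is byte-inflated to U+0080 and yields `b"javascript:%C2%80"`. Escapes that already encode valid UTF-8, such as `%C3%A9`, round-trip unchanged.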
Received on Friday, 28 September 2012 18:05:43 UTC