Re: [whatwg] URL: javascript URLs

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 28 Sep 2012 14:04:58 -0400
Message-ID: <5065E6CA.6050109@mit.edu>
To: Anne van Kesteren <annevk@annevk.nl>
Cc: whatwg@lists.whatwg.org

On 9/28/12 1:30 PM, Anne van Kesteren wrote:
> Well that is interesting. So the document encoding is not solely a
> query component affair?

At least not for Gecko, no.

> Does this only apply to javascript URLs? I
> cannot get this to work for data URLs.

Looks like there's some javascript:-specific code here, yes.  In 
particular, when the URI object is being created for javascript: and the 
document encoding is not UTF-8, it looks like Gecko will do the following:

1)  Take the given string (which by this point is a byte array,
     actually; if it started off as Unicode it got converted to UTF-8
     to produce this byte array).
2)  Unescape non-ascii escapes (that is, escapes whose hex value is not
     in the ASCII range).
3)  If the result is not valid UTF-8 bytes and the document encoding
     is some variant of utf-16, or is utf-7, or is
     x-imap4-modified-utf7 (whatever that is), just byte-inflate to
     Unicode.  There's a comment here about encodings that are not
     ASCII supersets.
4)  Otherwise, if the byte array looks like valid UTF-8, convert
     from UTF-8 to Unicode.
5)  Otherwise, convert to Unicode using the document encoding.
6)  Convert the resulting Unicode string to UTF-8.
7)  Escape non-ASCII bytes.
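
For illustration only, the steps above can be sketched roughly as follows in Python. This is not Gecko's code: the function names, the NON_ASCII_SUPERSETS set, the use of Latin-1 decoding to stand in for "byte-inflate", and the choice of uppercase hex when re-escaping are all assumptions made for the sketch. It also assumes the caller has already checked that the document encoding is not UTF-8.

```python
import re

# Assumed stand-in for "some variant of utf-16, or utf-7, or
# x-imap4-modified-utf7": encodings that are not ASCII supersets.
NON_ASCII_SUPERSETS = {
    "utf-16", "utf-16le", "utf-16be", "utf-7", "x-imap4-modified-utf7",
}

# Matches %XX escapes whose value is 0x80 or above (first hex digit 8..F).
_NON_ASCII_ESCAPE = re.compile(rb"%([89A-Fa-f][0-9A-Fa-f])")


def unescape_non_ascii(data: bytes) -> bytes:
    """Decode only the escapes outside the ASCII range; leave %00..%7F alone."""
    return _NON_ASCII_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def normalize_javascript_url(spec: str, doc_encoding: str) -> str:
    """Hypothetical model of the javascript: handling described above."""
    # The input string becomes a UTF-8 byte array.
    raw = spec.encode("utf-8")
    # Unescape the non-ASCII escapes.
    raw = unescape_non_ascii(raw)

    enc = doc_encoding.lower()
    valid = is_valid_utf8(raw)
    if not valid and enc in NON_ASCII_SUPERSETS:
        # "Byte-inflate": each byte becomes the code point of the same value.
        text = raw.decode("latin-1")
    elif valid:
        text = raw.decode("utf-8")
    else:
        # Fall back to the document encoding.
        text = raw.decode(enc, errors="replace")

    # Convert back to UTF-8 and escape every non-ASCII byte.
    utf8 = text.encode("utf-8")
    return "".join(chr(b) if b < 0x80 else "%%%02X" % b for b in utf8)


if __name__ == "__main__":
    print(normalize_javascript_url("javascript:alert('%E9')", "windows-1252"))
```

Under these assumptions, a %E9 escape in a windows-1252 document and a %C3%A9 escape both end up as the same UTF-8 escape sequence, while ASCII-range escapes such as %41 pass through untouched.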

I have no idea how much of this is needed in practice...

Received on Friday, 28 September 2012 18:05:43 UTC
