
Re: URLs and double byte characters (unicode)

From: André-John Mas <ajmas@sympatico.ca>
Date: Tue, 24 Dec 2002 08:47:23 -0500
To: W3C WWW talk <www-talk@w3.org>
Message-Id: <3B18EBC5-1746-11D7-AC83-003065D6B164@sympatico.ca>

I wasn't clear in my explanation, because I didn't quite
understand the issue.

If the server specifies the URL already encoded using the '%'
style encoding, then the URL arrives unmodified:


On the other hand if I specify the URL with a character name
(is this accepted in the specs?), e.g.:


Then the URL will be encoded differently according to whether the
page is ISO-8859-1 or UTF-8 (would this be representative of a URL
typed into the address bar as well?):

ISO-8859-1: http://localhost/%E9
utf-8:      http://localhost/%C3%A9
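
The two results can be reproduced with a short Python sketch. It assumes
the character in question is é (U+00E9), which matches the bytes shown
above, and uses urllib.parse.quote from the standard library. The helper
name encode_path is only illustrative.

```python
from urllib.parse import quote


def encode_path(path: str, page_encoding: str) -> str:
    """Percent-encode a path using the bytes of the given page encoding.

    Hypothetical helper: it mimics a browser that escapes non-ASCII
    characters in a link according to the encoding of the containing page.
    """
    return quote(path, safe="/", encoding=page_encoding)


# Assumed input: a path containing the single character U+00E9.
print("http://localhost" + encode_path("/é", "iso-8859-1"))
print("http://localhost" + encode_path("/é", "utf-8"))
```

The same character yields one escaped byte in ISO-8859-1 and two in
UTF-8, so the server sees two different request paths.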

Since the request headers do not explicitly specify the page encoding
(unless I missed this), it is difficult to know how to decode
the URL on the server.
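
One possible workaround, sketched here in Python, is to guess: decode the
escaped bytes as UTF-8 first and fall back to ISO-8859-1 if they are not
valid UTF-8. This is a heuristic of my own choosing, not something
defined by any spec, and the function name decode_path is hypothetical.
It can misread ISO-8859-1 byte sequences that happen to be valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path: str) -> str:
    """Guess the encoding of a percent-encoded request path.

    Heuristic (an assumption, not a standard): try strict UTF-8 first,
    then fall back to ISO-8859-1, which accepts any byte sequence.
    """
    raw = unquote_to_bytes(raw_path)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Both request paths from the example above map back to the same text.
print(decode_path("/%E9"))
print(decode_path("/%C3%A9"))
```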


On Monday, Dec 23, 2002, at 08:59 America/Montreal, Ian Hickson wrote:

> On Mon, 23 Dec 2002, Bjoern Hoehrmann wrote:
>> That's news to me. My Mozilla does the following. Typing the URI
>> http://localhost:99/björn into the address bar the browser requests
>>   GET /bj%F6rn HTTP/1.1
>> That's ISO-8859-1 or a compatible encoding.
> Oh, my bad. I assumed we were talking about form submissions.  It is
> possible I am mistaken even for those cases, though.
> For links, if they are invalid (i.e. not correctly escaped), I believe
> Mozilla will use the document encoding to form the URIs.
> As I said, though, there is no spec (to my knowledge) that defines 
> this.
> -- 
> Ian Hickson                                      )\._.,--....,'``.    fL
> "meow"                                          /,   _.. \   _\  ;`._ ,.
> http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 24 December 2002 08:47:31 UTC
