
Re: expected results for URI encoding tests?

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Fri, 27 Jun 2008 17:06:07 +0100
Message-ID: <48650FEF.3080907@cam.ac.uk>
To: Julian Reschke <julian.reschke@gmx.de>
CC: "public-html@w3.org WG" <public-html@w3.org>

Julian Reschke wrote:
> We really should try to define a way that yields UTF-8 based encoding 
> independently of the document's encoding.

We also really shouldn't break existing sites that work perfectly well 
in current web browsers. E.g. http://www.yildizburo.com.tr/ says

   <a href="urunlist.php?tur=FAX MAKİNALARI&kategori=Laser Fax" 
class="textmenu">Laser Fax</a>

encoded in Windows-1254. Clicking that link, Firefox/Opera/Safari go to

http://www.yildizburo.com.tr/urunlist.php?tur=FAX%20MAK%DDNALARI&kategori=Laser%20Fax
while IE goes to

http://www.yildizburo.com.tr/urunlist.php?tur=FAX MAKİNALARI&kategori=Laser Fax
where the İ is a raw 0xDD byte. Both variations load the correct page.

Using UTF-8, i.e.

http://www.yildizburo.com.tr/urunlist.php?tur=FAX%20MAK%C4%B0NALARI&kategori=Laser%20Fax
returns a page with no data, which is bad.
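The difference is easy to reproduce with a short script (a sketch in Python, which is obviously not what any of the browsers run, but it shows where the %DD vs %C4%B0 bytes come from):

```python
from urllib.parse import quote

value = "FAX MAKİNALARI"  # the query value from the link above

# Legacy behaviour: percent-encode the bytes of the *document* encoding
# (Windows-1254, where İ is the single byte 0xDD).
print(quote(value, safe="", encoding="cp1254"))  # FAX%20MAK%DDNALARI

# "Always UTF-8" behaviour: percent-encode the UTF-8 bytes (İ is 0xC4 0xB0).
print(quote(value, safe="", encoding="utf-8"))   # FAX%20MAK%C4%B0NALARI
```

The server only understands the first form, which is why the UTF-8 request comes back empty.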

Looking at random pages listed in dmoz.org (which seems quite biased 
towards English sites), something like 0.5% have non-ASCII characters in 
<a href> query strings, and (judging by eye) maybe half of those are not 
UTF-8, so it's a widespread issue and there's no chance of fixing all 
those sites.

That imposes some constraints on any proposed solution, and means 
"queries are always converted to percent-encoded UTF-8" is inadequate. 
It seems there's still some flexibility (e.g. IE converts unmappable 
characters to "?", while FF falls back to UTF-8 for the whole string when 
it contains unmappable characters), though I have no idea how nice a 
solution is possible within those limits.
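For what it's worth, the two fallback behaviours can be sketched like this (hypothetical helpers approximating the observed behaviour, not either browser's actual code):

```python
from urllib.parse import quote

def encode_query_ie(value: str, doc_encoding: str) -> str:
    # IE-style (as observed): characters the document encoding cannot
    # represent are replaced with "?" before percent-encoding.
    return quote(value, safe="", encoding=doc_encoding, errors="replace")

def encode_query_ff(value: str, doc_encoding: str) -> str:
    # FF-style (as observed): if any character is unmappable, the whole
    # string is encoded as UTF-8 instead of the document encoding.
    try:
        return quote(value, safe="", encoding=doc_encoding, errors="strict")
    except UnicodeEncodeError:
        return quote(value, safe="", encoding="utf-8")

# Mappable string: both behave like the legacy document-encoding path.
print(encode_query_ie("FAX MAKİNALARI", "cp1254"))  # FAX%20MAK%DDNALARI
print(encode_query_ff("FAX MAKİNALARI", "cp1254"))  # FAX%20MAK%DDNALARI

# Unmappable string: the two strategies diverge.
print(encode_query_ie("日本", "cp1254"))  # %3F%3F
print(encode_query_ff("日本", "cp1254"))  # %E6%97%A5%E6%9C%AC
```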

Philip Taylor
Received on Friday, 27 June 2008 16:06:49 UTC
