How browsers display URIs with %-encoding (Opera/Firefox FAIL)

When browsing for something else, I recently ended up at
http://www.w3.org/2001/08/iri-test/LinkShow.html, a test I had made up 
just about 10 years ago.

It checks how browsers display URIs with %-encoding in them. The URI 
is shown, for example, in a status bar (e.g. Safari) or in a popup (e.g. 
Opera when the status bar is not active).

The idea is that because %-encoding in URIs has to be interpreted as 
UTF-8 when converting to IRIs, in the first test, which has 
http://www.w3.org/People/D%FCrst in the href attribute, the only way to 
display this is as http://www.w3.org/People/D%FCrst. If interpreted as 
the page encoding (iso-8859-1 in the test) this might look like 
http://www.w3.org/People/Dürst, but this would not be interoperable 
because if I copy http://www.w3.org/People/Dürst and input it again in 
another browser, it will go somewhere else.
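
To make the first case concrete, here is a minimal Python sketch (my 
illustration only, not part of the test page; the variable names are made 
up). It assumes that a browser which gets a typed-in "ü" escapes it as 
UTF-8, which is what the IRI rules call for:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

href = "http://www.w3.org/People/D%FCrst"

# The single byte 0xFC is not a valid UTF-8 sequence, so there is no
# character that this escape could legitimately be displayed as.
try:
    unquote_to_bytes(href).decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# What a browser shows if it (wrongly) applies the page encoding.
shown = unquote(href, encoding="iso-8859-1")

# What results when that display is typed or pasted in elsewhere and
# escaped again as UTF-8 (assumption: the receiving side uses UTF-8).
retyped = quote(shown, safe=":/")

print(valid_utf8)  # False
print(shown)       # http://www.w3.org/People/Dürst
print(retyped)     # http://www.w3.org/People/D%C3%BCrst
```

The retyped URI differs from the original href, which is exactly the 
interoperability problem.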

On the other hand, in the second test, where the href attribute is 
http://www.w3.org/People/D%C3%BCrst, it's okay to show 
http://www.w3.org/People/Dürst, because that's using UTF-8, but it's not 
okay to show http://www.w3.org/People/DÃ¼rst, because that would lead to 
double encoding (i.e. a URI of 
http://www.w3.org/People/D%C3%83%C2%BCrst) when used again somewhere else.
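
The double encoding can be reproduced in a few lines of Python (again just 
a sketch with made-up variable names, assuming the displayed string gets 
escaped as UTF-8 when it is used again):

```python
from urllib.parse import quote, unquote

href2 = "http://www.w3.org/People/D%C3%BCrst"

# Correct display: the escapes are read as UTF-8.
shown_utf8 = unquote(href2, encoding="utf-8")

# Incorrect display: each escaped byte is read as one iso-8859-1 character.
shown_latin1 = unquote(href2, encoding="iso-8859-1")

# Escaping each displayed string again as UTF-8.
back_utf8 = quote(shown_utf8, safe=":/")
back_latin1 = quote(shown_latin1, safe=":/")

print(back_utf8)    # http://www.w3.org/People/D%C3%BCrst
print(back_latin1)  # http://www.w3.org/People/D%C3%83%C2%BCrst
```

Only the UTF-8 display gets back to the original href.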

[Because of the shameless self-reference and some behind-the-scenes 
trickery for backwards compatibility, the actual pages referenced are:
for http://www.w3.org/People/D%C3%BCrst:
   my (now historical) people page at W3C
for http://www.w3.org/People/D%FCrst:
   same as above, but via a redirect with some explanations about IRIs
for http://www.w3.org/People/D%C3%83%C2%BCrst:
   the W3C 404 page, because the actual page doesn't exist
Please keep the distinction between the first two in mind when trying 
things out.]

Now for how the various browsers did in my test today:

Opera (11.50, Build 1074, Win7):
   Test 1:   http://www.w3.org/People/Dürst (FAIL)
   Test 2:   http://www.w3.org/People/DÃ¼rst (FAIL)

Firefox (5.0, Win7):
   Test 1:   http://www.w3.org/People/Dürst (FAIL)
   Test 2:   http://www.w3.org/People/Dürst (PASS)

IE (8.0.7601.17514, Win7):
   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
   Test 2:   http://www.w3.org/People/D%C3%BCrst (PASS*)

Chrome (12.0.742.122, Win7):
   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
   Test 2:   http://www.w3.org/People/Dürst (PASS)


Safari (5.0.4 (7533.20.27)):
   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
   Test 2:   http://www.w3.org/People/Dürst (PASS)

Chrome and Safari do the right thing from an IRI perspective. IE is 
okay, but from an IRI perspective, it might try harder for Test 2. 
Firefox gets it half wrong, and Opera gets it fully wrong.

This is a test where arguing about deployed base isn't as important as 
thinking about first principles (IRIs get escaped/unescaped using 
UTF-8), because we need interoperability via copy-paste and via 
write-down-to-napkin-and-input-back-into-address-bar. For the failed 
tests, Opera and Firefox fail on their own terms (keyboarding in the 
address as it was displayed leads to a different page than the original link).
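
As a rough sketch of the display behaviour I am arguing for (not any 
browser's actual code; display_form and reentered are hypothetical names, 
and the full IRI conversion rules have more cases than this), one could 
unescape only those runs of escapes that are valid non-ASCII UTF-8 and 
leave everything else untouched:

```python
import re
from urllib.parse import quote

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def display_form(uri: str) -> str:
    """Hypothetical helper: unescape only runs that are valid UTF-8."""

    def decode_run(match):
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        # Simplification: leave runs containing ASCII escapes (such as
        # %2F or %20) alone, because unescaping them could change meaning.
        if any(b < 0x80 for b in raw):
            return run
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not UTF-8 (e.g. %FC): keep the escapes as they are.
            return run

    return _ESCAPE_RUN.sub(decode_run, uri)


def reentered(displayed: str) -> str:
    """Hypothetical helper: escape a typed-in address as UTF-8."""
    # "%" is kept as-is so that escapes left in the display survive.
    return quote(displayed, safe=":/%")


for test_href in ("http://www.w3.org/People/D%FCrst",
                  "http://www.w3.org/People/D%C3%BCrst"):
    shown = display_form(test_href)
    print(shown, reentered(shown) == test_href)
```

With this rule both test links round-trip: what is displayed, when typed 
in again, leads back to the original href.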

I haven't yet figured out how this kind of test could be automated, but 
maybe somebody has an idea. If there is some JavaScript functionality 
that makes sure the status bar is activated and then can access its 
content, that should do the job.

Regards,   Martin.

Received on Thursday, 21 July 2011 11:10:49 UTC