Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL)

"Martin J. Dürst", Thu, 21 Jul 2011 20:09:25 +0900:

You say 'display', but you seem to have at least two different things 
in mind:

A) how the link appears visually (and presumably audibly as well) when 
one hovers over it with the pointing device.
B) what gets copied if the user control-clicks (right-clicks, in the 
Windows world?) the link and copies it.

So when you say 'FAIL' below, I assume you mean that the browser failed 
both A) and B); if either A) or B) worked, you would not say 'FAIL'. 
(Otherwise I don't understand your interpretation of the results; see 
below.)

> [...] http://www.w3.org/People/D%FCrst. If 
> interpreted as the page encoding (iso-8859-1 in the test) this might 
> look like http://www.w3.org/People/Dürst, but this would not be 
> interoperable because if I copy http://www.w3.org/People/Dürst and 
> input it again in another browser, it will go somewhere else.

Hm. It should be perfectly interoperable to *display*, *execute* and 
*copy* the link as ~/Dürst!

What should matter is the encoding of the name of the Web resource 
located at ~/Dürst.

For example, let us assume that the name of the resource is Unicode 
(UTF-8 or UTF-16) encoded. Then it should not matter at all whether the 
user copies ~/Dürst from a link in an ISO-8859-1 encoded page or from a 
link in a UTF-8 encoded page: internally, in the browser, the letter 
'ü' would be the same letter in either case!
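A minimal Python sketch of this point, assuming the browser decodes the page bytes into Unicode before it does anything with the href: the 'ü' arrives as the single ISO-8859-1 byte 0xFC in one page and as the UTF-8 bytes 0xC3 0xBC in the other, yet both become the same code point and therefore the same UTF-8 percent-encoding. The helper name is illustrative only.

```python
from urllib.parse import quote

def href_as_text(page_bytes: bytes, page_encoding: str) -> str:
    """Decode raw href bytes using the encoding declared for their page."""
    return page_bytes.decode(page_encoding)

# The same link text, stored in two differently encoded pages.
from_latin1_page = href_as_text(b"D\xfcrst", "iso-8859-1")
from_utf8_page = href_as_text(b"D\xc3\xbcrst", "utf-8")

# Once decoded, both are the identical Unicode string...
print(from_latin1_page == from_utf8_page)  # True
# ...so both map to the same UTF-8 percent-encoding (quote defaults to UTF-8).
print(quote(from_latin1_page), quote(from_utf8_page))  # D%C3%BCrst D%C3%BCrst
```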

> On the other hand, in the second test, where the href attribute is 
> http://www.w3.org/People/D%C3%BCrst, it's okay to show 
> http://www.w3.org/People/Dürst, because that's using UTF-8, but it's 
> not okay to show http://www.w3.org/People/DÃ¼rst, because that would 
> lead to double encoding (i.e. an URI of 
> http://www.w3.org/People/D%C3%83%C2%BCrst) when used again somewhere 
> else.
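The double encoding you describe is easy to reproduce with Python's standard urllib.parse: decode the UTF-8 escapes of Test 2 as if they were ISO-8859-1 bytes, then escape the result again as UTF-8. A minimal sketch:

```python
from urllib.parse import quote, unquote

test2_path = "D%C3%BCrst"

# Wrong: treat the two escaped bytes as two ISO-8859-1 characters.
misread = unquote(test2_path, encoding="iso-8859-1")
print(misread)         # DÃ¼rst

# Re-escaping that display form as UTF-8 escapes each character again.
print(quote(misread))  # D%C3%83%C2%BCrst

# Right: decode the escapes as UTF-8, which round-trips cleanly.
print(quote(unquote(test2_path, encoding="utf-8")))  # D%C3%BCrst
```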

It follows from what I said above that it should be OK, when hovering 
over the link, to "display" it as 'Dürst' in either case (that is, in 
both Test 1 and Test 2 below).
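One way a browser could arrive at 'Dürst' for both tests is to try UTF-8 first and fall back to the page's encoding only when the escaped bytes are not valid UTF-8. That fallback rule is my assumption, not something either test prescribes; a sketch for the hover display only, with a hypothetical helper name:

```python
from urllib.parse import unquote_to_bytes

def hover_display(href: str, page_encoding: str = "iso-8859-1") -> str:
    """Status-bar display form (hypothetical helper, not for navigation)."""
    raw = unquote_to_bytes(href)
    try:
        # Escapes that form valid UTF-8 are shown as UTF-8 (Test 2).
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume they are bytes in the page's encoding (Test 1).
        return raw.decode(page_encoding)

print(hover_display("http://www.w3.org/People/D%FCrst"))     # http://www.w3.org/People/Dürst
print(hover_display("http://www.w3.org/People/D%C3%BCrst"))  # http://www.w3.org/People/Dürst
```

Because it also unescapes reserved characters such as %2F, this is only suitable for display, never for building the URI that is actually requested.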

Really, browsers should treat links that point somewhere within the 
same page differently from links that point to an external page. For 
links to fragments on the same page, they should follow the 'use the 
page's internal encoding' approach. (So says HTML5, at least.) But when 
a link points to an external resource, it would be better to assume 
that the resource has a Unicode-encoded name and uses UTF-8 internally. 
Thus, for external links, the browser should convert from the current 
page's encoding to UTF-8.
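A rough sketch of that split, under one literal reading of the proposal: for a same-page link, the escapes are decoded with the page's encoding to find the target id; for an external link, they are decoded with the page's encoding and re-escaped as UTF-8. `resolve_link` is a hypothetical name, and only the path of an external link is converted here.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def resolve_link(href: str, page_encoding: str) -> str:
    """Hypothetical sketch: same-page vs. external link handling."""
    if href.startswith("#"):
        # Same-page link: decode the escapes with the page's own encoding
        # to get the id to look for in the document.
        return "#" + unquote(href[1:], encoding=page_encoding)
    # External link: read the path escapes as page-encoding bytes,
    # then re-escape the resulting characters as UTF-8.
    parts = urlsplit(href)
    path = unquote(parts.path, encoding=page_encoding)
    return urlunsplit(parts._replace(path=quote(path, safe="/", encoding="utf-8")))

print(resolve_link("#D%FCrst", "iso-8859-1"))                          # #Dürst
print(resolve_link("http://www.w3.org/People/D%FCrst", "iso-8859-1"))  # http://www.w3.org/People/D%C3%BCrst
```

Note that under this literal reading, an external href that is already UTF-8 escaped (such as Test 2's D%C3%BCrst) would be re-escaped byte by byte, so a real implementation would need some way to tell the two cases apart.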

> Now for how the various browsers did in my test today:
> 
> Opera (11.50, Build 1074, Win7):
>   Test 1:   http://www.w3.org/People/Dürst (FAIL)
>   Test 2:   http://www.w3.org/People/DÃ¼rst (FAIL)

I don't agree that Opera fails any more than any other browser. In 
fact, if the focus is on 'display', then its treatment of Test 1 is 
exemplary.

For Test 1, when you hover over the link, it displays ~/Dürst, which 
should be perfectly all right; this is in fact the most 
"napkin-compatible" display!

The actual *problem* in Opera's treatment of Test 1 is not that it 
displays ~/Dürst but that, when you ctrl-click/right-click the link in 
order to copy it (or just click it to follow it), you get ~/D%FCrst 
instead of ~/Dürst.

For Test 2, Opera shows the opposite problem: the link gets copied and 
executed correctly, but when you hover over it, it renders 
meaninglessly; it doesn't even display the correct percent-encoding.
 
> Firefox (5.0, Win7):
>   Test 1:   http://www.w3.org/People/Dürst (FAIL)
>   Test 2:   http://www.w3.org/People/Dürst (PASS)

The Test 1 behaviour is identical to Opera's. Thus, the problem for 
Test 1 is not the 'hover display' but rather how the link gets copied 
and executed.

For Test 2, Firefox displays ~/Dürst but *copies* ~/D%C3%BCrst. That 
is OK, but how napkin-compatible is it? Why not copy it as ~/Dürst 
instead? That I don't really get.
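Copying ~/D%C3%BCrst as ~/Dürst amounts to a URI-to-IRI conversion. A simplified, conservative sketch in the spirit of RFC 3987 section 3.2: escapes that form valid non-ASCII UTF-8 characters are decoded, while ASCII escapes (such as %2F) and bytes that are not valid UTF-8 (such as Test 1's %FC) stay escaped. The RFC's further exclusions (e.g. bidi formatting characters) are omitted, and the helper names are hypothetical.

```python
import re

# A run of one or more %XX escapes.
_ESCAPES = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def _decode_run(match):
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    # surrogateescape maps each byte b that is not valid UTF-8 to U+DC00 + b.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:  # not valid UTF-8: keep the byte escaped
            out.append(f"%{cp - 0xDC00:02X}")
        elif cp < 0x80:             # ASCII: keep escaped (e.g. %2F, %20)
            out.append(f"%{cp:02X}")
        else:                       # valid non-ASCII character: show it
            out.append(ch)
    return "".join(out)

def uri_to_iri(uri: str) -> str:
    """Copy/display form of a URI (hypothetical helper)."""
    return _ESCAPES.sub(_decode_run, uri)

print(uri_to_iri("http://www.w3.org/People/D%C3%BCrst"))  # http://www.w3.org/People/Dürst
print(uri_to_iri("http://www.w3.org/People/D%FCrst"))     # http://www.w3.org/People/D%FCrst
```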

> IE (8.0.7601.17514, Win7):
>   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
>   Test 2:   http://www.w3.org/People/D%C3%BCrst (PASS*)

(No time to test right now.)

> Chrome (12.0.742.122, Win7):
>   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
>   Test 2:   http://www.w3.org/People/Dürst (PASS)

In my book, Chrome fails Test 1 because it both renders and executes it 
as ~/D%FCrst, which is neither napkin-compatible nor Web-compatible.

> Safari (5.0.4 (7533.20.27)):
>   Test 1:   http://www.w3.org/People/D%FCrst (PASS)
>   Test 2:   http://www.w3.org/People/Dürst (PASS)

Same problem for Test 1 as in Chrome.

> Chrome and Safari do the right thing from an IRI perspective.

I would like to know why you think so.

> IE is 
> okay, but from an IRI perspective, it might try harder for Test 2. 
> Firefox gets it half wrong, and Opera gets it fully wrong.
> 
> This is a test where arguing about deployed base isn't as important 
> as thinking about first principles (IRIs get escaped/unescaped using 
> UTF-8), because we need interoperability via copy-paste and via 
> write-down-to-napkin-and-input-back-into-address-bar.

Ah, above I only mentioned ctrl-click/right-click. Is it your goal that 
~/D%C3%BCrst should be copied as ~/Dürst? Chrome does the opposite: if 
you try to open <file:///Dürst> and then copy the URL from the URL bar 
again, it gets copied as <file:///D%C3%BCrst>. Chrome also shows a page 
saying something like "Cannot find the address file:///D%C3%BCrst", and 
Safari does the same. Opera and Firefox are more sensible; they say 
"Cannot find the address file:///Dürst".
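The first principle you mention (IRIs get escaped/unescaped using UTF-8) is what makes the napkin round trip work: whatever the user types back into the address bar, each non-ASCII character is mapped to its UTF-8 escapes, as in RFC 3987 section 3.1. A minimal sketch; `iri_to_uri` is a hypothetical helper name:

```python
from urllib.parse import quote

def iri_to_uri(iri: str) -> str:
    """Escape each non-ASCII character as UTF-8 %XX; leave ASCII untouched."""
    return "".join(
        c if ord(c) < 0x80 else quote(c, safe="", encoding="utf-8")
        for c in iri
    )

napkin = "http://www.w3.org/People/Dürst"  # what was written down
print(iri_to_uri(napkin))                  # http://www.w3.org/People/D%C3%BCrst
print(iri_to_uri("file:///Dürst"))         # file:///D%C3%BCrst
```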

> For the failed 
> tests, Opera and Firefox fail on their own terms (keyboarding in the 
> address as it was displayed leads to different page than the original 
> link).
> 
> I haven't yet figured out how this kind of test could be automated, 
> but maybe somebody has an idea. If there is some javascript 
> functionality that makes sure the status bar is activated and then 
> can access its content, that should do the job.

Separate tests are necessary for:
* display/rendering of fragment links
* display/rendering of external links
* display of href=D&#xfc;rst vs. href=Dürst
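A sketch of how such test pages could be generated, assuming Python; the link labels are made up for illustration. It writes the three href forms into one page encoded as ISO-8859-1, so that `base="#"` produces fragment links (plus a matching target) and `base="http://www.w3.org/People/"` produces external ones.

```python
# The three href forms from the list above (labels are hypothetical).
HREF_FORMS = {
    "percent-escaped": "D%FCrst",
    "character reference": "D&#xfc;rst",
    "literal": "Dürst",
}

def make_test_page(base: str, encoding: str = "iso-8859-1") -> bytes:
    """Build an HTML test page whose hrefs all point at base + a form of 'Dürst'."""
    links = "\n".join(
        f'<p><a href="{base}{href}">{label}</a></p>'
        for label, href in HREF_FORMS.items()
    )
    # Fragment links need something to hit in the same page.
    target = '<p id="Dürst">fragment target</p>\n' if base == "#" else ""
    doc = (
        "<!DOCTYPE html>\n"
        f'<meta charset="{encoding}">\n'
        "<title>href encoding test</title>\n"
        f"{links}\n{target}"
    )
    # The literal 'ü' becomes the single byte 0xFC in ISO-8859-1.
    return doc.encode(encoding)

external = make_test_page("http://www.w3.org/People/")
fragment = make_test_page("#")
```

Writing the bytes out with, e.g., `pathlib.Path("external-latin1.html").write_bytes(external)` gives a file that can be opened in each browser under test.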

It is also necessary to test:
* copy/paste to/from the URL bar
* how error pages/messages are displayed
* how ctrl-click (right-click) copying works
* how execution works, in comparison

For external link tests, it is necessary to state whether the file name 
of the linked resource
* is Unicode encoded
* follows NFC normalization

I also think there should be tests, for both internal and external 
links, of how links that follow NFD normalization are handled, as well 
as how resources whose file names are NFD normalized are handled.
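The reason NFD deserves its own tests is that the NFC and NFD spellings of the same name are different strings and therefore produce different UTF-8 escapes; a server that compares names byte-wise (an assumption about the server, though a common one) will treat them as different resources. Python's unicodedata module shows the difference:

```python
import unicodedata
from urllib.parse import quote

nfc = unicodedata.normalize("NFC", "Dürst")  # 'ü' as one code point, U+00FC
nfd = unicodedata.normalize("NFD", "Dürst")  # 'u' + U+0308 COMBINING DIAERESIS

print(nfc == nfd)   # False
print(quote(nfc))   # D%C3%BCrst
print(quote(nfd))   # Du%CC%88rst
```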

Idea: It might make sense to display characters that are not 
NFC-normalized as percent-encoded. That way, authors/users get a way to 
check whether they have in fact used a valid, napkin-compatible, 
NFC-normalized link.
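A sketch of that idea, assuming a whole-string check is good enough (a finer-grained version could escape only the offending characters); the helper name is hypothetical, and `unicodedata.is_normalized` needs Python 3.8 or later:

```python
import unicodedata
from urllib.parse import quote

def display_with_nfc_check(iri: str) -> str:
    """Show the IRI as-is if it is NFC; otherwise show non-ASCII as UTF-8 %XX."""
    if unicodedata.is_normalized("NFC", iri):
        return iri
    return "".join(c if ord(c) < 0x80 else quote(c, safe="") for c in iri)

print(display_with_nfc_check("http://www.w3.org/People/Dürst"))
print(display_with_nfc_check(unicodedata.normalize("NFD", "http://www.w3.org/People/Dürst")))
# the second call shows http://www.w3.org/People/Du%CC%88rst
```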

For testing page-internal links (that is, #fragment links), you could 
create an ISO-8859-1 encoded page that contains links to directly typed 
fragments beginning with a non-ASCII letter from the ISO-8859-1 
charset. Then you can test how that same page works if served/
interpreted as another legacy 8-bit encoding, such as KOI8-R. The test 
should check whether, for instance, in an ISO-8859-1 page, 
href="#Dürst" hits both id="Dürst" and id="D&#xfc;rst".
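That comparison can be modelled in a few lines, assuming the browser decodes the raw page bytes with the declared encoding and then resolves character references before matching. `fragment_hits` is a hypothetical helper, and `html.unescape` merely stands in for the HTML parser's attribute handling. The directly typed byte 0xFC changes meaning with the encoding, while the character reference does not:

```python
import html

def fragment_hits(href_bytes: bytes, id_bytes: bytes, encoding: str) -> bool:
    """Does href="#..." hit id="..." when the page bytes are read as `encoding`?"""
    decoded_href = href_bytes.decode(encoding)
    fragment = html.unescape(decoded_href[1:])  # drop the leading '#'
    target = html.unescape(id_bytes.decode(encoding))
    return fragment == target

href = b"#D\xfcrst"     # href="#Dürst" typed directly in an ISO-8859-1 page
typed_id = b"D\xfcrst"  # id="Dürst"
ncr_id = b"D&#xfc;rst"  # id="D&#xfc;rst"

for enc in ("iso-8859-1", "koi8_r"):
    print(enc, fragment_hits(href, typed_id, enc), fragment_hits(href, ncr_id, enc))
# iso-8859-1 True True
# koi8_r True False
```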
-- 
Leif H Silli

Received on Thursday, 21 July 2011 19:06:03 UTC