Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL) from Martin J. Dürst on 2011-07-25 (public-iri@w3.org from July 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 25 Jul 2011 15:42:24 +0900
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: Chris Weber <chris@lookout.net>, public-iri@w3.org
Message-ID: <4E2D1050.6030907@it.aoyama.ac.jp>

On 2011/07/22 5:15, Leif Halvard Silli wrote:
> Chris Weber, Thu, 21 Jul 2011 12:53:27 -0700:
>> On 7/21/2011 12:05 PM, Leif Halvard Silli wrote:
>>> The actual *problem* in Opera's treatment of Test 1 is not that it
>>> displays ~/Dürst but that, when you ctrl-click/right-click (or just
>>> click) the link in order to copy it (or follow it), then you get
>>> ~/D%FCrst instead of ~/Dürst.
>>
>> Why is that a problem?
>
> Because
> 1) it disappoints and confuses the user when, upon activating the link,
> he/she doesn't get to the intended resource.

As far as we know, the intended resource is /People/D%FCrst.

> 2) it is also confusing that hover says "ü" while the link says
> something else.

Yes, that's what the test is about. It's just that you mistakenly think 
that "ü" is correct, when it's actually %FC.

> 3) it also means that typing "Dürst" in the URL bar will work better
> than clicking the link.

You are somehow assuming that /People/Dürst is the right thing, and 
/People/D%FCrst is the wrong thing. But that's just because you have 
looked at the actual pages. The test isn't about this. URIs are not 
about going to the page the writer/reader had in mind, but about going 
to the page that the URI says to go to. It just so happened that I had 
these two pages handy, but it should work for any other, similar case 
(only one of the pages might exist, or they might be totally unrelated).

> (And this is a good reason for, when hovering,
> *display* the URL as "Dürst" rather than percent encoded.)
>
>>   All browsers tested agree that the path for
>> this URI is "/People/D%FCrst" as literally typed and as is evident from
>> observing the HTTP request.
>
> * When we observe what they "display" to the user, then they don't
> agree.

Yes. That's why some of them have to be fixed.
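What "fixed" display could look like can be sketched roughly in Python, assuming a browser follows the idea behind RFC 3987's URI-to-IRI conversion: decode %-escapes for display only where the escaped bytes form valid UTF-8, and otherwise show them as they are. The helper name `uri_to_display` is hypothetical, and the sketch is a simplification: it only looks at escapes of bytes 0x80-0xFF, leaves a whole run of escapes encoded if any part of it is invalid, and does not filter out characters (such as bidi controls) that the RFC says to keep encoded.

```python
import re
from urllib.parse import unquote_to_bytes

# Runs of %XX escapes for bytes 0x80-0xFF only; ASCII escapes such as %2F
# are never decoded, because decoding them could change the URI's meaning.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_display(uri):
    """Hypothetical helper: decode %-escapes for display only when the
    escaped bytes form valid UTF-8; otherwise show them unchanged."""

    def decode_run(match):
        raw = unquote_to_bytes(match.group(0))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not UTF-8 (e.g. a lone 0xFC byte): keep the escapes visible.
            return match.group(0)

    return _NON_ASCII_ESCAPES.sub(decode_run, uri)


print(uri_to_display("/People/D%FCrst"))     # /People/D%FCrst
print(uri_to_display("/People/D%C3%BCrst"))  # /People/Dürst
```

Under this approach, hovering over the test link would show "%FC", matching what is actually sent in the HTTP request.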

>>   That seems to align with my
>> understanding of RFC3986 and 3986's treatment of the %FC which if
>> decoded would be illegal UTF-8.
>
> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
> have made a difference if instead of using ISO-8859-1 based percent
> encoding,

%FC isn't ISO-8859-1 based %-encoding. %FC is just encoding a 0xFC byte 
for a URI. The URI doesn't know whether this is supposed to be 
ISO-8859-1, some EBCDIC variant, or whatever.
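To make the point concrete, this small Python sketch decodes the single byte behind %FC under a few charsets. The helper name `interpretations` is hypothetical, and cp037 is only an assumed stand-in for "some EBCDIC variant".

```python
from typing import Dict, List, Optional
from urllib.parse import unquote_to_bytes


def interpretations(escape: str, charsets: List[str]) -> Dict[str, Optional[str]]:
    """Decode the bytes behind a %-escape under each charset (None if invalid)."""
    raw = unquote_to_bytes(escape)  # "%FC" -> b"\xfc": one byte, no charset attached
    result: Dict[str, Optional[str]] = {}
    for charset in charsets:
        try:
            result[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            result[charset] = None  # the byte is not valid in this charset
    return result


for name, text in interpretations("%FC", ["iso-8859-1", "cp1252", "cp037", "utf-8"]).items():
    print(f"{name:>10}: {text!r}")
```

ISO-8859-1 and windows-1252 both yield "ü", the EBCDIC code page yields a different character, and UTF-8 rejects the byte outright. The URI itself gives no basis for choosing among these readings.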

> Martin had typed the letter 'ü' directly?

Yes. In that case, we have an IRI, and UTF-8 should be used to create a 
URI for over-the-wire HTTP. That means we should have gotten to 
/People/D%C3%BCrst.
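The core step of that IRI-to-URI mapping (RFC 3987) can be sketched in Python as follows. The function name `iri_to_uri` is hypothetical, and the sketch assumes the input contains no non-ASCII host name, which would normally be handled separately (e.g. via IDNA).

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical, simplified IRI-to-URI mapping: percent-encode the UTF-8
    bytes of every non-ASCII character and leave everything else untouched
    (including existing escapes such as %FC)."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


print(iri_to_uri("/People/Dürst"))    # /People/D%C3%BCrst
print(iri_to_uri("/People/D%FCrst"))  # /People/D%FCrst (already a URI)
```

A hand-written loop is used rather than `urllib.parse.quote`, because `quote` would also escape the "%" of an existing escape and turn "%FC" into "%25FC", which refers to a different resource.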

> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
> as well as href="Dürst", then I think HTML5 validators in fact should
> warn against use of percent encoding that isn't UTF-8 based.

Probably not. Page authors should not replace href="D%FCrst" with 
href="Dürst", because the latter may not exist, or may be a different page. In 
this sense, my example is misleading (because when there's a redirect 
from one page to another, then indeed it may be a good idea to replace 
one URI/IRI with another, but that's not what the test is about).

Regards,    Martin.

Received on Monday, 25 July 2011 06:43:48 UTC