Re: How browsers display URIs with %-encoding (Opera/Firefox FAIL) from Boris Zbarsky on 2011-07-27 (public-iri@w3.org from July 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 26 Jul 2011 21:41:09 -0400
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <4E2F6CB5.1050609@mit.edu>

On 7/26/11 9:03 PM, "Martin J. Dürst" wrote:
>> 1) Leave all %-escapes as they are. This generally makes URIs on many
>> web pages out there (which by and large use %-escapes, not actual
>> Unicode chars) look like gibberish to users.
>>
>> 2) Unescape %-escapes. That gives you bytes and then you have to worry
>> about how to convert those bytes to chars, since users don't deal with
>> bytes.
>>
>> Option 1 leads to a _really_ bad user experience. So in practice you
>> want option 2. At this point you have several choices. You can only
>> convert "valid UTF-8 sequences" (whatever that means) to chars and
>> reescape all other bytes. This seems to be what you would prefer to
>> happen.
>
> Yes. This seems to be suboptimal locally. But it will lead to better
> interoperability and a better user experience globally.

It's not completely clear to me that it will.

In particular, whether it does or not will depend on several factors,
including how often people copy the link hover text down manually
instead of using "copy link", how much the user experience degrades
every time someone does such a manual copy when the URI has been
unescaped and treated as ISO-8859-1, and how much the user experience
degrades for users who hover the link if the URI is not unescaped that
way.

Clearly the UX degradation is much worse in the "copy manually" case 
(you get a URI that leads to the wrong place).  But this case is also 
vanishingly rare.  I cannot think of a single time I've done that in the 
last 16 years or so, in fact...  and I'm much more likely to do it than 
most non-technical users are, I would bet.
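
To make the two display strategies concrete, here is a minimal Python
sketch. It is not any browser's actual code: the function names and the
example.org URIs are made up for illustration, and the choice to leave
ASCII escapes such as %20 and %2F alone is an assumption of the sketch.
The first function unescapes only bytes that form valid UTF-8 and
re-escapes everything else; the second falls back to ISO-8859-1 when a
run of escapes is not valid UTF-8.

```python
import re
from urllib.parse import quote

# One or more consecutive %XX escapes.
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _run_to_bytes(run: str) -> bytes:
    """Turn a run such as '%C3%BC' into the bytes it stands for."""
    return bytes.fromhex(run.replace("%", ""))


def _escape(byte: int) -> str:
    """Re-escape one byte (hex digits are normalized to upper case)."""
    return f"%{byte:02X}"


def display_strict_utf8(uri: str) -> str:
    """Unescape valid UTF-8 sequences only; re-escape all other bytes."""
    def convert(match: re.Match) -> str:
        raw = _run_to_bytes(match.group(0))
        # surrogateescape maps each undecodable byte 0xNN to U+DCNN,
        # so invalid bytes can be found and re-escaped afterwards.
        out = []
        for ch in raw.decode("utf-8", errors="surrogateescape"):
            cp = ord(ch)
            if 0xDC80 <= cp <= 0xDCFF:
                out.append(_escape(cp - 0xDC00))  # not valid UTF-8
            elif cp < 0x80:
                out.append(_escape(cp))           # keep ASCII escapes as-is
            else:
                out.append(ch)                    # valid non-ASCII character
        return "".join(out)
    return ESCAPE_RUN.sub(convert, uri)


def display_latin1_fallback(uri: str) -> str:
    """Try UTF-8 per run; if that fails, treat the run as ISO-8859-1."""
    def convert(match: re.Match) -> str:
        raw = _run_to_bytes(match.group(0))
        try:
            chars = raw.decode("utf-8")
        except UnicodeDecodeError:
            chars = raw.decode("latin-1")
        return "".join(_escape(ord(c)) if ord(c) < 0x80 else c for c in chars)
    return ESCAPE_RUN.sub(convert, uri)


utf8_uri = "http://example.org/D%C3%BCrst"
latin1_uri = "http://example.org/D%FCrst"

print(display_strict_utf8(utf8_uri))
print(display_strict_utf8(latin1_uri))
print(display_latin1_fallback(latin1_uri))
# What a hand-copied "Dürst" turns into if it is re-encoded as UTF-8:
print(quote("Dürst"))
```

With the fallback, both example URIs display identically, so someone
who writes the displayed text down and types it back in ends up with the
UTF-8 escapes rather than the original %FC. That is the "copy manually"
failure under discussion; the strict variant avoids it at the cost of
showing the raw escape.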

> I agree with 'very likely'. But there are lots of ifs. In essence, it's
> just giving the user a warm, fuzzy feeling locally, while in a bigger
> context, it's going to fail badly (and disappoint and confuse the user).

More precisely it's giving a warm fuzzy feeling locally for very common 
tasks while failing badly for a very rare task.

> Let's say the user looks at it and decides it looks good and puts it on
> a business card.

Every single business card creation process I have dealt with has been 
electronic, which means that the obvious way to get a URI into it is via 
copy and paste....

> And let's say a business partner tries to reach that
> page. It will fail. I guess nobody would call that a good user experience.

Indeed.  The question is whether this would happen in practice, and how 
often.

> The argument to not standardize UA aspects is usually a very good one,
> and I usually agree with it wholeheartedly. However, URIs and IRIs are
> about more than 'on-the-wire' interoperability, they are supposed to
> work over the phone and on the side of a bus.

I sympathize with this, but I suspect that the user experience tradeoffs 
here are such, at the present time, that a URI on a napkin breaking 
every so often is less painful than showing hundreds of millions of 
users "gibberish" URIs daily....

You are of course free to file a bug on Firefox here; I'm not a UI 
designer and it's not my call on what the status popup behavior is.  I 
wouldn't place high hopes on a change here, though, unless non-UTF8 
escapes have become a lot more rare recently.

> As for the address/location bar, I think the main reason there's no
> standard for it is that no such thing was needed up to now. At least as
> long as we were in an ASCII world, it just showed the URI it used to get
> the page. If there are cases where it didn't, I didn't notice, but you
> might know some cases.

Well, let's see.  Chrome's location bar drops "http://" from the front 
of some URIs.  Firefox's will likely do that and also drop trailing '/' 
after a hostname for http:// URIs that have a path of '/'.  Both are
cases of munging a purely-ASCII URI.
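
As a rough illustration of that kind of display-only trimming, here is
a small Python sketch. The function name is hypothetical and the rules
are a guess at the behavior described above, not either browser's real
logic: drop a leading "http://", and drop the trailing '/' only when the
path is exactly '/' with no query or fragment.

```python
from urllib.parse import urlsplit


def trim_for_display(uri: str) -> str:
    """Return a shortened form of an http:// URI for display only."""
    prefix = "http://"
    if not uri.lower().startswith(prefix):
        return uri  # other schemes (https, ftp, ...) are shown unchanged

    shown = uri[len(prefix):]
    parts = urlsplit(uri)
    # Drop the trailing slash only for a bare "http://host/" URI.
    if (parts.path == "/" and not parts.query and not parts.fragment
            and shown.endswith("/")):
        shown = shown[:-1]
    return shown


print(trim_for_display("http://example.org/"))
print(trim_for_display("http://example.org/a/"))
print(trim_for_display("https://example.org/"))
```

Note that the displayed string is no longer the URI the browser used to
fetch the page, even though nothing here involves non-ASCII characters.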

> That's great. But when I wrote 'copy', I was also speaking about copying
> by hand (or via paper,...). I should have been more explicit about that.

In that case I understand the concern, but feel that this is a _very_ 
rare case.

> I can't copy that string with a copy command, but I can of course copy
> that down onto a napkin.

You _can_, but _will_ you?  I suspect that to a first approximation the 
answer is "no" for the vast majority of people.

> A quick check here indicates that it shows D%FCrst. That's the right
> thing to do, but it will confuse your user. Being consistent and always
> showing D%FCrst would be less confusing.

Having the mouseover feedback match the post-click location bar is the 
strongest argument for changing the former, actually.  I'm a little 
surprised they don't match.  Definitely worth filing a bug.  Let me know 
if you'd prefer I do that.

> People use software, and software gets used by people. The two have to
> work together to get the job done. If there's something that makes sense
> to the user but not to software, then something is wrong

Then something is wrong all the time....  There are lots of concepts 
that make sense to users but not software and vice versa.  The point of 
a user interface is to try to reduce the impedance mismatch as much as 
possible.

> IRIs were designed to make sense to the user and to make sense in
> software. The problem is that that's only possible if we nail down the
> encoding for the conversion (to UTF-8 as it happens), and therewith give
> up on converting for other encodings.

Agreed, and if everyone were using IRIs well we would not be having this 
conversation in the first place.  The problem is people aren't in 
practice....

-Boris
Received on Wednesday, 27 July 2011 01:41:39 UTC