Re: How browsers display IRI's with mixed encodings from Leif H Silli on 2011-07-27 (public-iri@w3.org from July 2011)

From: Leif H Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 27 Jul 2011 23:53:37 +0300
To: addison@lab126.com
Cc: duerst@it.aoyama.ac.jp, chris@lookout.net, public-iri@w3.org
Message-ID: <2510870861.996278017@xn--mlform-iua.no>

Phillips, Addison wrote on 27/7/'11, 4:13:

> Making the literal sequence %FC invalid would be a Bad Thing.

Accepted

..snip...
 
>> But an author who today inserts %FC is likely making a mistake, or at least
>> a bad choice, no?
> 
> An author who inserts u-umlaut and expects to get %FC is making a mistake.

Yes.

> An author who inserts %FC and expects to see u-umlaut is making a mistake (or should be).

Depends on what you mean by 'expect'. But I guess we agree.

> But an author who inserts %FC because that's what her server expects? Valid.

I see the point.
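A minimal Python sketch makes the server-side case concrete. It assumes a server that decodes path escapes with one fixed charset. `decode_segment` is a hypothetical helper; the only real API it uses is `urllib.parse`. The sketch shows why the same literal %FC can be right or wrong depending on that server:

```python
from urllib.parse import quote, unquote

def decode_segment(segment: str, server_charset: str) -> str:
    """Decode %XX escapes the way a server assuming `server_charset` might.

    Hypothetical helper: bytes that cannot be decoded become U+FFFD.
    """
    return unquote(segment, encoding=server_charset, errors="replace")

# A Latin-1 server reads %FC as "ü" ...
print(decode_segment("%FC", "iso-8859-1"))            # ü
# ... but a UTF-8 server cannot decode the lone byte 0xFC.
print(decode_segment("%FC", "utf-8") == "\ufffd")     # True
# The UTF-8 form works on a UTF-8 server and becomes mojibake on a Latin-1 one.
print(decode_segment("%C3%BC", "utf-8"))              # ü
print(decode_segment("%C3%BC", "iso-8859-1"))         # Ã¼

# Encoding "ü" in each charset gives the two forms discussed in this thread.
print(quote("ü", encoding="iso-8859-1"))              # %FC
print(quote("ü"))                                     # %C3%BC (UTF-8 is the default)
```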

> And an author who inserts u-umlaut and expects it to display as u-umlaut and to be sent (as %C3%BC in URI form)? Also valid, IMHO.

Why did you add 'IMHO'? Shouldn't this be not merely a valid expectation but *the* expected behavior? Didn't Martin's test show exactly that for the directly typed IRI?

Except for a bug in Opera etc. By the way, I tested how some text browsers interpret a directly typed <a href="ü"> in an ISO-8859-1 encoded page.

Results: all of them (w3m, Lynx, Links, ELinks, netrik) treated it as %FC (and not as %C3%BC).

So that is bad news for IRI links in legacy-encoded pages in text browsers, in contrast to the situation in GUI browsers.
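The two behaviours can be sketched roughly in Python. `href_to_uri` and its `use_page_charset` flag are hypothetical. The flag approximates what the text browsers above did with a raw `ü` byte in an ISO-8859-1 page. The default follows the UTF-8 mapping of RFC 3987 section 3.1, simplified: there is no normalization and no special handling of the query part.

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus "%" so existing escapes survive.
_SAFE = "/:?#[]@!$&'()*+,;=%"

def href_to_uri(raw_href: bytes, page_charset: str,
                use_page_charset: bool = False) -> str:
    """Map the raw bytes of an href attribute to a URI (hypothetical helper)."""
    # The page charset is always needed to turn the stored bytes into characters.
    text = raw_href.decode(page_charset)
    # Non-ASCII characters are then percent-encoded, either in the page charset
    # (assumed text-browser behaviour) or in UTF-8 (IRI-to-URI mapping).
    target = page_charset if use_page_charset else "utf-8"
    return quote(text, safe=_SAFE, encoding=target)

raw = "ü".encode("iso-8859-1")                                  # b'\xfc' in the page
print(href_to_uri(raw, "iso-8859-1", use_page_charset=True))    # %FC
print(href_to_uri(raw, "iso-8859-1"))                           # %C3%BC
```

Note that the page charset is consulted in both cases. The only question is whether it is reused when the characters are percent-encoded.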

..snip...

>> My focus is authors. And of course the author could have meant %FC. But might
>> it not more often simply be the result of a bad %-encoder or of a misconception?
>> 
> 
> The problem, as I see it, is not with the sequence %FC. It is with the character U+00FC appearing in an HTML document inside a URI path. 
> 
> I tend to think that the interpretation of %FC using page encoding is bad because an IRI (or URI) lacks the necessary context to make that determination. I agree with Boris's earlier message on the list that showing %FC is a bad user experience. But shouldn't we be trying to close on a well-defined set of behaviors that content authors (and others) can understand?

+1, amen to dropping the display of %FC according to the page encoding.
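Here is a minimal Python sketch of a display rule that never consults the page encoding. It is loosely modelled on the URI-to-IRI conversion in RFC 3987 section 3.2, with one simplification: it skips that section's exclusions for bidi and other disallowed characters. `uri_for_display` is a hypothetical name. Escapes are shown as characters only when they form valid non-ASCII UTF-8; a lone %FC stays as %FC.

```python
import re

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def uri_for_display(uri: str) -> str:
    """Decode escapes for display only where they are valid non-ASCII UTF-8.

    ASCII escapes (e.g. %2F) and bytes that are not valid UTF-8 stay
    percent-encoded; hex digits of kept escapes are uppercased.
    """
    def convert(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        # Invalid UTF-8 bytes come back as lone surrogates U+DC80..U+DCFF.
        text = raw.decode("utf-8", errors="surrogateescape")
        out = []
        for ch in text:
            code = ord(ch)
            if 0xDC80 <= code <= 0xDCFF:   # byte that was not valid UTF-8
                out.append(f"%{code - 0xDC00:02X}")
            elif code < 0x80:               # ASCII: may be reserved, keep escaped
                out.append(f"%{code:02X}")
            else:
                out.append(ch)
        return "".join(out)

    return _ESCAPE_RUN.sub(convert, uri)

print(uri_for_display("/%C3%BC"))      # /ü
print(uri_for_display("/%FC"))         # /%FC
```

Under this rule the legacy %FC is simply shown as it is, which is the "percent gunk" trade-off mentioned below.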

> I think such an approach would include the behavior described above, even at the expense of some usability. And who looks at those really long URIs full of percent gunk anyway? :-))

Agree.

But I snipped the part where you said that %FC is probably in wide use. If that is the case, there could be a lot of legacy content out there for which Firefox is motivated to display a fake character, no?

But how common is it, and where, for e.g. %FC to be used to point to a "ü-resource"? Not very, I think. Non-ASCII is avoided, even today.
--
Leif H Silli

Received on Wednesday, 27 July 2011 20:54:30 UTC