Re: How browsers display IRI's with mixed encodings

On 2011/07/27 5:50, Leif H Silli wrote:
> Martin J. Dürst 25/7/'11,  13:55:

>> A URI that contains %FC is perfectly valid (check RFC 3986). Because
>> it's a valid URI, it's also a valid IRI.
>
> But an author which -today- inserts %FC is likely to do a mistake - or
> at least make a bad choice, no?

If the author has no (or limited) control over the server (i.e. because 
the link points to another Web site), then the author has to use 
whatever will make that link work. If that other Web site uses %FC, then 
that's what has to go into the link. That's neither a mistake nor a bad 
choice by the author.

As for the person in control of the server, in this day and age I would
indeed call it a bad choice to put up a server that uses %FC. And indeed
I'd guess that these days, new servers mostly get set up with file names
in UTF-8 (both Apache and IIS do so on Windows, most Linux distros use
UTF-8 for the file system, and frameworks such as Rails do so as well).
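
To make the difference concrete, here is a minimal Python sketch of how
the same name ends up as %FC or as %C3%BC depending on the encoding that
is percent-encoded. The name "Dürst" and the variable names are only
illustrative.

```python
from urllib.parse import quote

name = "Dürst"

# Percent-encode the bytes of the name in a legacy single-byte encoding.
latin1_form = quote(name, encoding="iso-8859-1")

# Percent-encode the bytes of the name in UTF-8, as IRI conversion does.
utf8_form = quote(name, encoding="utf-8")

print(latin1_form)  # D%FCrst
print(utf8_form)    # D%C3%BCrst
```

Both results are valid URIs; they just identify different byte sequences,
so which one works depends entirely on what the server expects.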


> How common are such servers these days?
> My focus is authors. And of course it could be the author meant %FC. But
> might it not more often be simply a result of a bad %-encoder or on a
> misconception?

Some of these are still around. It's difficult to change old servers, 
and it's even more difficult to change the links on other sites that 
point to them.

<detour>
With mod_fileiri, you can have your cake and eat it too, if you get all
the settings right. I.e. you can keep the file names locally the way you
always have (e.g. D\xFCrst; I'm using \xHH notation to express that
these are real bytes), but accept D%C3%BCrst externally (i.e. pretend
you're using UTF-8), and on top of that also accept D%FCrst but
externally redirect it to D%C3%BCrst (and then internally back to
D%FCrst). But you have to be careful to get the settings right, so it
may not be something the average server administrator wants to do.
</detour>
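
The mapping described in the detour can be sketched roughly as follows.
This is not mod_fileiri's code or configuration, only a Python
illustration of the idea, assuming the local file names are in
ISO-8859-1; the function name `resolve` and the constant
`LOCAL_ENCODING` are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption: the legacy encoding of file names on disk.
LOCAL_ENCODING = "iso-8859-1"


def resolve(request_path):
    """Return (redirect_target, local_name_bytes) for a request path.

    redirect_target is None when the path is already in UTF-8 form.
    local_name_bytes is None when the name cannot exist locally.
    """
    raw = unquote_to_bytes(request_path)
    try:
        # Preferred external form: percent-encoded UTF-8.
        text = raw.decode("utf-8")
        redirect = None
    except UnicodeDecodeError:
        # Legacy form: treat the bytes as the local encoding and
        # redirect externally to the UTF-8 form.
        text = raw.decode(LOCAL_ENCODING)
        redirect = quote(text, encoding="utf-8")
    try:
        # Internally, map back to the bytes actually used on disk.
        local = text.encode(LOCAL_ENCODING)
    except UnicodeEncodeError:
        local = None
    return redirect, local


print(resolve("D%C3%BCrst"))  # (None, b'D\xfcrst')
print(resolve("D%FCrst"))     # ('D%C3%BCrst', b'D\xfcrst')
```

Note that trying UTF-8 first is a heuristic: a few legacy byte sequences
happen to be valid UTF-8 as well, which is one reason the settings need
care.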


Regards,    Martin.

Received on Wednesday, 27 July 2011 10:41:06 UTC