Re: Cool IRIs & diacritics, for a change

Bjoern Hoehrmann, Sat, 05 Feb 2011 01:26:02 +0100:
> * Leif Halvard Silli wrote:
>> Questions and conclusions:
> 
>>  - is the article [1] simply outdated? Have new things happened?
>>    or perhaps it doesn't speak about how to link to *filenames*?
> 
> If you used 'http' addresses in your tests

HTTP via Apache2 on Mac OS X Snow Leopard.

> then the question is how the browser sends the address to the server

They all behaved the same way - whatever they did.

> and then it is up to the server
> how to interpret it, and the server may do one thing or another. 

Is there any way to regulate what Apache does?

> The server should obviously accept the NFC variant, but if it
> does not, then that is their business.

I believe the problem to be that Apache literally does not map the 
composed (NFC) links to the decomposed file names.
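
A minimal sketch of that mismatch, in Python. The file name is made up 
for illustration, and I am assuming that the name on disk is close to 
plain NFD, which holds for 'å' but is only an approximation of what the 
Mac file system really does:

```python
import unicodedata
from urllib.parse import quote

# Hypothetical file name; escapes are used so the normalization form is unambiguous.
name = "bl\u00e5b\u00e6r.html"

typed = quote(unicodedata.normalize("NFC", name))    # what a typed or authored link carries
on_disk = quote(unicodedata.normalize("NFD", name))  # roughly how the Mac stores the name

print(typed)    # bl%C3%A5b%C3%A6r.html
print(on_disk)  # bla%CC%8Ab%C3%A6r.html

# A byte-for-byte comparison, with no normalization step, sees two different names.
print(typed == on_disk)  # False
```

If the server compares bytes only, the first form never finds the 
second.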

> For 'file' addresses it's up to browser and OS
> how they resolve them.

Yes. And that side works. 

Until the world becomes perfect, a Best Current Practice for IRI work 
on a Mac is needed ... I would suggest this one:

 a) Repertoire: know which of your non-ASCII chars are affected;
 b) Then - you have these alternatives:
    1) avoid exactly those non-ASCIIs in all IRIs;
    2) don't use Cool IRIs (if they contain the affected letters);
    3) use decomposed values inside @href.
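
For point a), something like the following could list the affected 
letters. It is only a sketch: the function name is my own, and it 
assumes that "affected" means "changes under NFD".

```python
import unicodedata

def affected_chars(text):
    """Return, sorted, the characters of text that NFD would decompose."""
    # Compose first, so that already-decomposed input is judged the same way.
    composed = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in composed
                   if unicodedata.normalize("NFD", ch) != ch})

# Of the Norwegian letters, only 'å' has a decomposition; 'æ' and 'ø' do not.
print(affected_chars("bl\u00e5b\u00e6r og \u00f8l") == ["\u00e5"])  # True
```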

Option 3) lets you create Cool IRIs that work when clicked but not when 
typed: when you manually type the URL into the URL field of your 
browser, you will be using NFC, and so the link will fail. Thus, one 
should try to avoid option 3, which means that the single most valuable 
advice becomes: avoid Cool IRIs (if they contain the affected letters).

In addition, one should 

 * Assume that HTML editing programs insert links using NFC;
 * Not try to convert file names to NFC ...;
 * Perhaps look for FTP programs that are able to convert the 
   file names to NFC during upload ...
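
What such an upload step would have to do can be sketched like this. 
I do not know of a particular FTP program that works this way, and the 
function name is hypothetical; it only computes the names and does not 
touch any files:

```python
import unicodedata

def remote_names(local_names):
    """Map each local file name to the NFC name to use on the server."""
    return {name: unicodedata.normalize("NFC", name) for name in local_names}

# A decomposed local name (a + combining ring) and a plain ASCII one.
plan = remote_names(["bla\u030ab\u00e6r.html", "index.html"])
print(plan["bla\u030ab\u00e6r.html"] == "bl\u00e5b\u00e6r.html")  # True
```

The local names stay decomposed; only the names on the server side 
change.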

>>  - why does Wikipedia work, then? I suppose that a *composed*
>>    'å', such as the when you type an 'å' in the URL bar, 
>>    is *ambiguous*: it can be interpreted two ways, perhaps.
>>    But wikipedia has probably hardcoded 'å' (%C3%A5) to mean
>>    'å'. OTOH, I don't understand why browsers considers '%C3%A5'
>>    ambiguous when the page is UTF-8 encoded ... ???
> 
> MediaWiki uses a Unicode Normalizer when available and does some things
> on its own to map some kinds of user input, it's likely that Wikipedia
> does normalize page names.

I suppose that this is one thing that is simpler with a 
database-served Web page solution.
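
Roughly like this, I imagine. This is not MediaWiki's actual code: the 
PAGES store and the lookup function are invented for illustration, and 
the only assumption is that titles are stored in NFC and that every 
request is normalized before the lookup:

```python
import unicodedata
from urllib.parse import unquote

# Hypothetical page store, keyed by NFC title.
PAGES = {"Bl\u00e5b\u00e6r": "page about blueberries"}

def lookup(request_path):
    """Percent-decode the path, normalize it to NFC, then look it up."""
    title = unicodedata.normalize("NFC", unquote(request_path.lstrip("/")))
    return PAGES.get(title)

# Composed and decomposed requests both reach the same page.
print(lookup("/Bl%C3%A5b%C3%A6r") == lookup("/Bla%CC%8Ab%C3%A6r"))  # True
```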
-- 
leif halvard silli

Received on Saturday, 5 February 2011 10:45:20 UTC