Re: html, http, urls and internationalisation

I don't understand something about coping with URLs printed in newspapers,
business cards, etc.  In Unicode, there are multiple ways to code a given
character.  For example, Unicode includes Latin-1, which includes O-umlaut
(U+00D6).  Unicode also has a combining umlaut (diaeresis, U+0308), which
follows its base letter, so the same character can be coded as the two-code
sequence "O, combining umlaut".  Do people who enter URLs have to be careful
to do so in a certain canonical way?  Does a server have to canonicalize URLs
it receives?  What about the other parts of a URL (e.g., FQDN --- does the DNS
have to canonicalize lookups)?  What about characters that appear similar
enough that the printing quality --- and the expertise of the reader --- might
not be enough to make the distinction?  What about distinctions --- such as
that between the Greek letter pi and the math symbol pi --- that are not
manifest in a printed glyph?
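
To make the question concrete, here is a minimal Python sketch of what
canonicalizing on the server side might look like.  It assumes things nobody
has settled: that non-ASCII URL characters travel as percent-escaped UTF-8,
and that the canonical form is the precomposed one (Unicode calls it NFC).
The helper names canonical() and encode_path() are made up for illustration.

```python
import unicodedata
from urllib.parse import quote

precomposed = "\u00D6"   # O-umlaut as the single Latin-1 code point
decomposed = "O\u0308"   # "O" followed by COMBINING DIAERESIS

def canonical(text):
    # Hypothetical helper: fold canonically equivalent sequences to NFC.
    return unicodedata.normalize("NFC", text)

def encode_path(text):
    # Hypothetical helper: canonicalize, then percent-escape as UTF-8
    # (the UTF-8 choice is an assumption, not an established rule).
    return quote(canonical(text), encoding="utf-8")

# The two spellings differ as code sequences...
print(precomposed == decomposed)
# ...and would produce different escaped URLs if left alone...
print(quote(precomposed), quote(decomposed))
# ...but agree once canonicalized.
print(encode_path(precomposed) == encode_path(decomposed))

# The pi case: GREEK PI SYMBOL (U+03D6) survives NFC unchanged and only
# folds to the Greek letter under the looser compatibility form NFKC.
print(unicodedata.normalize("NFC", "\u03D6") == "\u03C0")
print(unicodedata.normalize("NFKC", "\u03D6") == "\u03C0")

# Look-alikes: Latin A, Greek Alpha and Cyrillic A stay distinct under
# every normalization form, so canonicalizing does not help here.
lookalikes = ["A", "\u0391", "\u0410"]
print(len({unicodedata.normalize("NFKC", c) for c in lookalikes}))
```

Under these assumptions canonicalization takes care of the O-umlaut case,
only partly addresses the pi case, and does nothing for characters that
merely look alike in print.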

Received on Wednesday, 31 January 1996 15:13:12 UTC