Facts about URL Internationalization

I think there are observable technical facts in this debate which we can
all agree on:

1) URLs are often distributed internationally in hardcopy form.  For
maximum global usability, such URLs must be restricted to the safe
characters of the US-ASCII character set.

2) Regardless of what the standard says, people do and will continue to
construct URLs containing unencoded octets above 0x7f.  (As evidence, look
at violations of 7-bit restrictions in RFC 822, SMTP, NNTP, etc.).

3) URLs may have a character mapping for octets above 0x7f already
defined by context.  For example, a URL in a MIME part labelled
"text/plain; charset=iso-8859-1" will have a character mapping.

4) URLs may not have a character mapping for octets above 0x7f already
defined by context.

5) A character mapping for octets represented with the %HH notation is
currently undefined.

6) One key purpose of Internet Standards is to maximize global 
interoperability.

7) Were the URL standard to specify an interpretation for octet values
above 0x7f, it should be an international solution.
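
Points 3 to 5 can be made concrete with a small sketch. The %HH notation
only names octets; which characters those octets stand for depends on a
charset that the URL itself does not carry. The helper name
`interpretations`, the example path, and the two candidate charsets are
illustrative assumptions, not part of any standard.

```python
from urllib.parse import unquote_to_bytes

def interpretations(escaped, charsets=("iso-8859-1", "utf-8")):
    """Turn %HH escapes back into octets, then map those octets to
    characters under each candidate charset (hypothetical helper)."""
    octets = unquote_to_bytes(escaped)
    # errors="replace" substitutes U+FFFD where the octets are not
    # valid in the given charset, instead of raising an exception.
    return {cs: octets.decode(cs, errors="replace") for cs in charsets}

# Hypothetical path containing two escaped octets above 0x7f.
path = "/caf%C3%A9"
for charset, text in interpretations(path).items():
    print(charset, ascii(text))
```

The same two octets (0xC3 0xA9) read as one character under UTF-8 and as
two different characters under ISO-8859-1, so a recipient without
context cannot tell which was meant.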


Taking all of these into account, I believe Martin Duerst's proposal is
on the right track.
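
As a rough illustration of how facts 1 and 5 could be reconciled, the
sketch below assumes a convention (this message does not spell the
proposal out, so treat it as an assumption) in which characters are
first encoded as UTF-8 and every resulting octet above 0x7f is written
as %HH. The function names are hypothetical.

```python
from urllib.parse import quote, unquote

def to_ascii_path(path):
    """Encode a path as UTF-8 octets and escape everything outside the
    unreserved ASCII set as %HH (assumed convention, see lead-in)."""
    return quote(path, safe="/", encoding="utf-8")

def from_ascii_path(escaped):
    """Reverse the mapping under the same UTF-8 assumption."""
    return unquote(escaped, encoding="utf-8", errors="strict")

original = "/caf\u00e9"            # hypothetical non-ASCII path
escaped = to_ascii_path(original)
print(escaped)                     # ASCII only, safe to print in hardcopy
print(from_ascii_path(escaped) == original)
```

With a single agreed charset, the escaped form stays within US-ASCII
and the round trip back to characters is unambiguous.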

Received on Friday, 21 February 1997 19:52:35 UTC