Re: Using UTF-8 for non-ASCII Characters in URLs
Martin J. Duerst (mduerst@ifi.unizh.ch)
Fri, 2 May 1997 18:19:43 +0200 (MET DST)
Date: Fri, 2 May 1997 18:19:43 +0200 (MET DST)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Larry Masinter <masinter@parc.xerox.com>
cc: "Michael Kung <MKUNG.US.ORACLE.COM>" <MKUNG@us.oracle.com>, uri@bunyip.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
In-Reply-To: <3366C606.786A@parc.xerox.com>
Message-ID: <Pine.SUN.3.96.970502180918.245k-100000@enoshima>
On Tue, 29 Apr 1997, Larry Masinter wrote:
> This isn't just a "small point", it's essential:
>
> The only way to guarantee "round trip" is to stick to the smallest
> repertoire of characters.
Yes. But it has to be qualified. It is the smallest set of
characters that you think your target audience is safely
able to distinguish and handle.
> Clearly you shouldn't enter "http" as
> wide characters,
That goes without saying, doesn't it? Alternatively, a browser
could convert it to half-width characters (as a courtesy to the
user, not as part of any spec).
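
A minimal sketch of such a courtesy conversion, not taken from any
spec or any browser: it folds the full-width ASCII variants
(U+FF01 to U+FF5E) in the scheme part to their half-width forms and
leaves the rest of the URL alone. The function names and the host
example.org are hypothetical, chosen only for illustration.

```python
def to_half_width(text: str) -> str:
    """Fold full-width ASCII variants (U+FF01..U+FF5E) to plain ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            # The full-width block mirrors ASCII 0x21..0x7E at a fixed offset.
            out.append(chr(cp - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def fold_scheme(url: str) -> str:
    """Fold only the scheme (up to and including the first colon)."""
    for i, ch in enumerate(url):
        # Accept an ASCII colon or a full-width colon as the delimiter.
        if ch in (":", "\uFF1A"):
            return to_half_width(url[: i + 1]) + url[i + 1 :]
    return url  # no scheme delimiter found: leave the input untouched


# Full-width "http" followed by an ordinary "://example.org/" and a
# full-width capital A in the path, which is deliberately not folded.
folded = fold_scheme("\uFF48\uFF54\uFF54\uFF50://example.org/\uFF21")
```

Only the scheme is touched here because the rest of the URL may
contain full-width characters that are meant to stay distinct.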
> and if you have 'wide characters' that need
> to be distinguished from ascii characters, you should encode them
> in hex-encoded-UTF8 always.
I think we have to distinguish two cases:
The case that the URL is just used as a carrier for transporting
information from point to point (FORM/QUERY): In this case,
both hex-encoded and 8-bit UTF-8 will work, as the binary
world is never left. (We know there are other problems with
queries, though; I am working towards a draft about them.)
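
To illustrate why both forms work in this case, here is a small
sketch using Python's standard urllib.parse module (my choice for
illustration, not something any draft prescribes). The sample string
is an assumption; the point is only that the %HH form decodes back to
exactly the same octets as the raw 8-bit UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes

# "D\u00FCrst" is "Durst" with u-umlaut; the raw form is 8-bit UTF-8.
raw = "D\u00FCrst".encode("utf-8")

# Hex-encode every non-ASCII octet as %HH.
escaped = quote(raw)

# Decoding the %HH form yields the original octets again.
restored = unquote_to_bytes(escaped)
```

As long as both ends agree that the octets are UTF-8, it makes no
difference to the receiver which of the two forms was on the wire.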
The case that URLs are passed around, on paper and so on: In this
case, using %HH as a backup mechanism works, but it is no fun.
As there may be target audiences that can very well (actually
too well :-) distinguish between half-width and full-width
variants (e.g. East Asian programmers), it may very well be
possible to issue such URLs for such audiences. That's why,
for such cases, I specify neither equivalence nor normalization,
but I strongly discourage the use of such variants because they
cannot be safely distinguished by a wider audience.
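
A short sketch of what "no equivalence" means on the wire, again
using urllib.parse purely as an illustration: the half-width and
full-width forms of the same letter look alike to many readers but
produce entirely different %HH sequences, so nothing treats them as
the same URL unless some software explicitly normalizes them.

```python
from urllib.parse import quote

# Half-width (plain ASCII) small letter a: unreserved, stays as-is.
half = quote("a", safe="")

# Full-width small letter a (U+FF41): three UTF-8 octets, all escaped.
full = quote("\uFF41", safe="")
```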
Regards, Martin.