Date: Fri, 2 May 1997 18:19:43 +0200 (MET DST)
From: "Martin J. Duerst" <email@example.com>
To: Larry Masinter <firstname.lastname@example.org>
cc: "Michael Kung <MKUNG.US.ORACLE.COM>" <MKUNG@us.oracle.com>, email@example.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
In-Reply-To: <3366C606.786A@parc.xerox.com>
Message-ID: <Pine.SUN.3.96.970502180918.245k-100000@enoshima>

On Tue, 29 Apr 1997, Larry Masinter wrote:

> This isn't just a "small point", it's essential:
>
> The only way to guarantee "round trip" is to stick to the smallest
> repertoire of characters.

Yes. But it has to be qualified. It is the smallest set of characters
that you think your target audience can safely distinguish and handle.

> Clearly you shouldn't enter "http" as
> wide characters,

That goes without saying, or doesn't it? Alternatively, a browser
could convert it to half-width characters (as a courtesy to the user,
not as part of any spec).

> and if you have 'wide characters' that need
> to be distinguished from ascii characters, you should encode them
> in hex-encoded-UTF8 always.

I think we have to distinguish two cases:

The first case is that the URL is just used as a carrier for
transporting information from point to point (FORM/QUERY). In this
case, both hex-encoded and 8-bit UTF-8 will work, as the binary world
is never left (but we know there are other problems with queries; I am
working towards a draft about them).

The second case is that URLs are passed around, on paper and so on. In
this case, using %HH as a backup mechanism works, but it is no fun.
As there may be target audiences that can very well (actually too
well :-) distinguish between half-width and full-width variants (e.g.
East Asian programmers), it may well be possible to issue URLs
containing such characters to those audiences. That's why, for such
cases, I specify neither equivalence nor normalization, but I strongly
discourage their use, because they cannot be safely distinguished by a
wider audience.

Regards,   Martin.
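For readers unfamiliar with the "hex-encoded UTF-8" (%HH) form discussed above, here is a minimal modern sketch using Python's standard urllib.parse.quote/unquote and unicodedata.normalize. It is an illustration only, not part of the thread or any 1997 spec. The helper names hex_utf8, decode_hex_utf8 and fold_full_width are hypothetical. The width folding uses NFKC, which is an assumed stand-in for the optional "courtesy" conversion a browser might do, and it also folds other compatibility characters, not just full-width ASCII.

```python
import unicodedata
from urllib.parse import quote, unquote


def hex_utf8(path: str) -> str:
    # Hypothetical helper: encode the string as UTF-8 and write every byte
    # outside the unreserved ASCII set (and '/') as %HH,
    # e.g. U+65E5 -> %E6%97%A5.
    return quote(path, safe="/", encoding="utf-8")


def decode_hex_utf8(path: str) -> str:
    # Hypothetical helper: collect the %HH bytes and decode them as UTF-8,
    # giving back the original characters (the "round trip").
    return unquote(path, encoding="utf-8")


def fold_full_width(text: str) -> str:
    # Hypothetical helper for an optional client-side convenience, not part
    # of any URL spec: NFKC maps full-width ASCII variants such as
    # U+FF48 'ｈ' to 'h' (it also folds other compatibility characters).
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(hex_utf8("/日本"))                        # /%E6%97%A5%E6%9C%AC
    print(decode_hex_utf8("/%E6%97%A5%E6%9C%AC"))  # /日本
    print(hex_utf8("ｈｔｔｐ"))                     # %EF%BD%88%EF%BD%94%EF%BD%94%EF%BD%90
    print(fold_full_width("ｈｔｔｐ"))              # http
```

Note that the full-width "ｈｔｔｐ" and the ASCII "http" produce entirely different %HH strings, which is why the two must either be kept apart or folded deliberately before encoding.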