Re: Using UTF-8 for non-ASCII Characters in URLs

Dan Oscarsson (Dan.Oscarsson@trab.se)
Wed, 30 Apr 1997 10:45:20 +0200 (MET DST)


Date: Wed, 30 Apr 1997 10:45:20 +0200 (MET DST)
From: Dan Oscarsson <Dan.Oscarsson@trab.se>
Message-Id: <199704300845.KAA10131@valinor.malmo.trab.se>
To: masinter@parc.xerox.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
Cc: uri@bunyip.com

> > This is not right. A directory listing service generates a html document
> > that is sent back to the web browser. All URLs within a html document
> > should use the same character set as the document uses. That is, 
> > if the document uses iso 8859-1, the URLs will be in iso 8859-1, and
> > if the document is in UTF-8, the URLs will be in UTF-8.
> 
> Dan, for each item in a directory listing, there are two entries.
> 
> <A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
> 
> The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
> no matter what the user sees.
> 

If you use hex-encoding, yes. But NOT if you use the native character set
of the document. In that case, the 'this-is-the-URL' part must
use the same character set as the rest of the HTML document. Raw UTF-8
may only be used in a UTF-8 encoded HTML document, not in an ISO 8859-1
encoded document.
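
To illustrate why raw UTF-8 does not fit inside an ISO 8859-1 document,
here is a small Python sketch. The path is a made-up example, not one
taken from any real server; the point is only that the same characters
give different raw bytes in the two character sets.

```python
# Hypothetical path, written with escapes; it reads "/blåbär.html".
path = "/bl\u00e5b\u00e4r.html"

# Raw bytes of the same path in the two document character sets.
latin1_bytes = path.encode("iso-8859-1")
utf8_bytes = path.encode("utf-8")

print(latin1_bytes)  # b'/bl\xe5b\xe4r.html'
print(utf8_bytes)    # b'/bl\xc3\xa5b\xc3\xa4r.html'

# A reader that treats the whole document as ISO 8859-1 will decode
# raw UTF-8 bytes as two wrong characters per non-ASCII letter.
misread = utf8_bytes.decode("iso-8859-1")
print(misread == path)  # False
```

So a URL typed as raw UTF-8 into an ISO 8859-1 document would be read as
a different URL by anything that trusts the document's character set.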

A large number of HTML documents are written by hand in a text editor. A user
cannot be expected to switch to a different encoding when typing the URLs
in a document.

But I agree that if hex-encoded characters are found in a URL, they
should be UTF-8; otherwise it would be unclear what encoding is used
for hex-encoded URLs in an ASCII-only HTML document. But an ASCII-only
document may not contain any 8-bit characters in a URL, as there is no
defined character set for them.
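
As a rough sketch of the hex-encoded case, assuming Python's standard
urllib.parse module and the same made-up path as an example: the
hex-encoded form is pure ASCII, and it only decodes back to the intended
name if both sides agree that the octets are UTF-8. The helper
is_ascii_only is a hypothetical name used for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical path, written with escapes; it reads "/blåbär.html".
path = "/bl\u00e5b\u00e4r.html"

def is_ascii_only(url: str) -> bool:
    """Return True if the URL contains no 8-bit (non-ASCII) characters."""
    return all(ord(ch) < 128 for ch in url)

# Hex-encode the UTF-8 octets of the path ("/" is left as-is by default).
hex_utf8 = quote(path, encoding="utf-8")
print(hex_utf8)                 # /bl%C3%A5b%C3%A4r.html
print(is_ascii_only(hex_utf8))  # True
print(is_ascii_only(path))      # False

# Decoding with the agreed encoding recovers the original path.
print(unquote(hex_utf8, encoding="utf-8") == path)  # True

# If the octets were hex-encoded from ISO 8859-1 instead, a reader that
# assumes UTF-8 does not get the original path back.
hex_latin1 = quote(path, encoding="iso-8859-1")
print(hex_latin1)                                     # /bl%E5b%E4r.html
print(unquote(hex_latin1, encoding="utf-8") == path)  # False
```

The last two lines show the ambiguity: without a fixed encoding for
hex-encoded octets, the receiver cannot tell which name was meant.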


Using native encoding in URLs where the context is known, and hex-encoded
UTF-8 everywhere else (and, if you want, in known contexts too), is what I
understand others on the list also want. If we cannot use native encoding
when typing URLs into our HTML documents, very little is won.

    Dan