Re: Using UTF-8 for non-ASCII Characters in URLs

> > This is not right. A directory listing service generates a html document
> > that is sent back to the web browser. All URLs within a html document
> > should use the same character set as the document uses. That is, 
> > if the document uses iso 8859-1, the URLs will be in iso 8859-1, and
> > if the document is in UTF-8, the URLs will be in UTF-8.
> 
> Dan, for each item in a directory listing, there are two entries.
> 
> <A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
> 
> The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
> no matter what the user sees.
> 

If you use hex-encoding, yes. But NOT if you use the native character set
of the document. In that case, the 'this-is-the-URL' part must
use the same character set as the rest of the HTML document. Raw UTF-8
may only be used in a UTF-8 encoded HTML document, not in an ISO 8859-1
encoded document.
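
To make the distinction concrete, here is a small Python sketch. The file
name is made up for illustration, and I am only assuming the standard
urllib.parse.quote function; it is not meant as a reference implementation.
It shows that the raw form of a URL depends on the document's character
set, while the hex-encoded UTF-8 form is pure ASCII and so looks the same
in any document.

```python
from urllib.parse import quote

# Hypothetical file name containing non-ASCII characters.
name = "räksmörgås.html"

# Raw (native) form: the bytes depend on the document's character set.
raw_latin1 = name.encode("iso-8859-1")  # one byte per character
raw_utf8 = name.encode("utf-8")         # two bytes for each of ä, ö, å

# Hex-encoded UTF-8 form: plain ASCII, independent of the document charset.
hex_utf8 = quote(name, encoding="utf-8")

print(raw_latin1)
print(raw_utf8)
print(hex_utf8)
```

The two raw byte strings differ, which is why a raw URL cannot be copied
byte for byte between documents that use different character sets.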

A large number of HTML documents are written by hand in a text editor. A user
cannot be expected to switch to a different encoding when typing the URLs
in a document.

But I agree that if hex-encoded characters are found in a URL, they
should be UTF-8; otherwise it would be unclear what encoding is used
for hex-encoded URLs in an ASCII-only HTML document. An ASCII-only
document, however, may not contain any 8-bit characters in a URL, as
there is no defined character set for them.
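
A rough Python sketch of that rule, as I read it: raw characters are
interpreted in the document's character set, and hex-encoded characters
are always interpreted as UTF-8. The function name resolve_href is my own
invention for illustration; only the standard urllib.parse.unquote is
assumed.

```python
from urllib.parse import unquote

def resolve_href(raw: bytes, document_charset: str) -> str:
    """Return the characters named by an HREF value (illustrative only).

    Raw bytes are decoded in the document's character set. For an
    ASCII-only document this raises UnicodeDecodeError on any 8-bit
    byte, since no character set is defined for it.
    """
    text = raw.decode(document_charset)
    # Hex-encoded sequences are always taken to be UTF-8; characters
    # that were already raw in the document are left untouched.
    return unquote(text, encoding="utf-8", errors="strict")

# The same resource, written three ways (hypothetical file name):
print(resolve_href(b"r%C3%A4ksm%C3%B6rg%C3%A5s.html", "ascii"))
print(resolve_href("räksmörgås.html".encode("iso-8859-1"), "iso-8859-1"))
print(resolve_href("räksmörgås.html".encode("utf-8"), "utf-8"))
```

All three calls should name the same resource. Passing ISO 8859-1 bytes
with a declared charset of "ascii" fails, which matches the point that an
ASCII-only document has no way to carry 8-bit characters in a URL.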


As I understand it, what others on the list also want is to use native
encoding in URLs where the context is known, and hex-encoded UTF-8
everywhere else (and, if you want, in known contexts too). If we cannot
use native encoding when typing URLs into our HTML documents, very
little is won.

    Dan

Received on Wednesday, 30 April 1997 04:46:03 UTC