- From: Dan Oscarsson <Dan.Oscarsson@trab.se>
- Date: Wed, 30 Apr 1997 08:52:17 +0200 (MET DST)
- To: uri@bunyip.com, masinter@parc.xerox.com
> Since no one else has, here's a rough draft of a UTF-8 URL
> internet-draft, which I intend to submit in a few days' time,
> after taking another pass on it.
>
>
> -----
> INTERNET-DRAFT Larry Masinter, Xerox Corporation
> draft-masinter-url-i18n-00xx April 27, 1997
> Expires: October 27, 1997
> 3.2 Requirements for URL generation and interpretation
>
> Systems that are offering resources through the internet
> where those resources have logical names sometimes offer
> the ability to generate URLs for the resources they offer.
> For example, some HTTP servers offer the ability to
> generate a 'directory listing' for file directories
> under their purview, and then to respond to the generated
> URLs with the files. If the names of the files consist
> solely of US-ASCII characters, the transcription is
> simple, but other file systems allow a wider variety
> of characters. It is recommended that generated
> directory listings use hex-encoded UTF-8 for non-US-ASCII
> characters, and that the interpretation of URLs accept
> either the raw UTF-8 or the hex-encoded version.
>
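A minimal Python sketch of the draft's recommendation, which the reply below disputes: when generating a listing, the server hex-encodes (percent-encodes) the UTF-8 bytes of each non-US-ASCII file name, and when interpreting a request it accepts raw UTF-8 bytes, the hex-encoded form, or a mixture of both. The function names `listing_href` and `resolve_path` are hypothetical, and the sketch assumes request paths arrive as bytes.

```python
from urllib.parse import quote, unquote_to_bytes


def listing_href(filename: str) -> str:
    """Build the href for one entry of a generated directory listing.

    quote() encodes the name as UTF-8 and percent-encodes every byte that
    is not an unreserved US-ASCII character; '/' is kept as a separator.
    """
    return quote(filename, safe="/", encoding="utf-8")


def resolve_path(request_path: bytes) -> str:
    """Interpret a requested path given as raw UTF-8, hex-encoded UTF-8, or both.

    unquote_to_bytes() turns each %XX escape into its byte and leaves all
    other bytes (including raw UTF-8 sequences) untouched, so both forms
    end up as the same UTF-8 byte string, which is then decoded.
    """
    return unquote_to_bytes(request_path).decode("utf-8")
```

One limitation of accepting both forms: a raw file name that itself contains a literal `%` followed by two hex digits is indistinguishable from an escape, so such names would always have to be sent hex-encoded.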
This is not right. A directory listing service generates an HTML document
that is sent back to the web browser. All URLs within an HTML document
should use the same character set as the document itself. That is,
if the document uses ISO 8859-1, the URLs will be in ISO 8859-1, and
if the document is in UTF-8, the URLs will be in UTF-8.
If the browser knows how to handle the character set of the HTML document,
it should also know how to translate the embedded URLs into UTF-8 when
the user follows a link.
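A minimal Python sketch of the browser behavior described above, under two assumptions: the href arrives as bytes in the character set of the containing document, and the browser hex-encodes the UTF-8 result before sending it. `href_to_request_path` is a hypothetical name, and hrefs that already contain %XX escapes are out of scope here (they would be escaped a second time).

```python
from urllib.parse import quote


def href_to_request_path(raw_href: bytes, document_charset: str) -> str:
    """Translate an href from the document's character set into a UTF-8 URL path."""
    # Interpret the link text using the character set of the HTML document.
    text = raw_href.decode(document_charset)
    # Re-encode the characters as UTF-8 and hex-encode the non-ASCII bytes.
    # Assumption: raw_href contains no pre-existing %XX escapes.
    return quote(text, safe="/", encoding="utf-8")
```

Under this model the same link written in an ISO 8859-1 page and in a UTF-8 page produces the same request, so the server only ever has to interpret UTF-8, whatever the encoding of the referring page.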
In general, URLs used without a context that defines their character
encoding should be encoded using UTF-8. URLs used within a context where
the meaning of the characters is defined should use the character encoding
of that context.
Dan
Received on Wednesday, 30 April 1997 02:53:11 UTC