Re: Unicode in HTTP streams

> Some recent proposals suggest that to encode a character as Unicode, first
> convert to UTF-8 and then format each octet as %HH and send it out.  My
> experience with query strings, cookies, and form data is that user agents do
> not encode first in UTF-8 before formatting octets as %HH.  Rather I have
> found that the %HH format is context sensitive and is an agreement between
> the sender and the receiver.  Only when a page is specifically sent down to
> a user agent in UTF-8, will the user agent return data in the %HH format in
> UTF-8.  Since most html pages are still in character sets other than UTF-8,
> this means that the usage of the %HH format to mean UTF-8 is quite rare.
[...]
> Rather it seems to me that what is needed is a new HTTP encoding that
> explicitly indicates a Unicode codepoint analogous to the &#xHHHH; format
> that was invented for this very purpose for HTML.  In my investigations, I

Are you talking about the encoding of a URL on the method line
of an HTTP request, the encoding of a request body, or the encoding
of a response body? These aren't always the same thing in theory
or practice. It _sounds_ like you are talking about the encoding
of URLs.
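
If it is URLs, the ambiguity described above is easy to show. Here is
a minimal sketch in Python (my choice of language; the helper names
and the sample character are only illustrations, not anything from
the proposals). It escapes the same character under two charsets and
then decodes each result with the other charset, which is roughly what
happens when sender and receiver do not share the same assumption.

```python
from urllib.parse import quote, unquote

def percent_encode(text, charset):
    """Convert text to octets in `charset`, then write each octet as %HH.

    safe="" means every octet outside the unreserved set is escaped.
    (Hypothetical helper, for illustration only.)
    """
    return quote(text, safe="", encoding=charset)

def percent_decode(escaped, charset):
    """Turn %HH escapes back into octets and interpret them as `charset`.

    Octets that are invalid in `charset` become U+FFFD instead of raising.
    (Hypothetical helper, for illustration only.)
    """
    return unquote(escaped, encoding=charset, errors="replace")

sample = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

as_utf8 = percent_encode(sample, "utf-8")         # two octets
as_latin1 = percent_encode(sample, "iso-8859-1")  # one octet

print(as_utf8)    # %C3%A9
print(as_latin1)  # %E9

# The receiver guesses the wrong charset in each direction:
print(percent_decode(as_utf8, "iso-8859-1"))  # two wrong characters
print(percent_decode(as_latin1, "utf-8"))     # U+FFFD replacement character
```

Nothing in the %HH form itself says which charset produced the octets,
so the receiver has to learn that from context.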

--
    Albert Lunde          Albert-Lunde@northwestern.edu (new address)
                          Albert-Lunde@nwu.edu (old address)

Received on Tuesday, 15 May 2001 23:46:37 UTC