Character encoding of HTTP requests and responses

From: Leila Schneberger <leila@windchill.com>
Date: Wed, 03 Dec 1997 10:06:32 -0600
Message-ID: <34858388.65A3306@windchill.com>
To: "www-international@w3.org" <www-international@w3.org>

We are building a Java HTTP gateway. We intend to support currently
available web servers (Netscape 3 and Microsoft IIS 3) and clients
(Netscape 4.0.3 and Internet Explorer 4.0). Our gateway will provide file
upload/download capability. These files will be stored as Unicode. I
have some questions regarding what is currently supported for character
encoding. (Sometimes it is difficult to track the difference between the
direction the standards are going and current reality ;^)

When reading and writing local text files from Java, it is simple to use
Java character streams that perform conversions between internal Unicode
characters and the external native OS character encoding.  However, can
anyone tell me what character encoding conventions are used by Netscape
and Microsoft Web servers when passing HTTP request and response bodies
to servlets or CGI programs?
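
For reference, the local conversion described above can be sketched as
follows. This is only an illustration: the class and method names
(EncodingSketch, encode, decode) are hypothetical and not part of the
gateway, and the two encodings are chosen just as examples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, for illustration only.
public class EncodingSketch {

    // Convert internal Unicode characters to bytes in the named external encoding.
    static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, encoding);
        out.write(text);
        out.close(); // flushes any buffered characters through the encoder
        return bytes.toByteArray();
    }

    // Convert bytes in the named external encoding back to Unicode characters.
    static String decode(byte[] data, String encoding) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        in.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        byte[] latin1 = encode(original, "ISO-8859-1");
        byte[] utf8 = encode(original, "UTF-8");
        // The same four characters occupy a different number of bytes per encoding.
        System.out.println(latin1.length + " bytes as ISO-8859-1, "
                + utf8.length + " bytes as UTF-8");
        System.out.println(original.equals(decode(utf8, "UTF-8")));
    }
}
```

The point of the sketch is that the byte stream is meaningful only if
the reader knows which encoding the writer used, which is exactly the
information I am unsure about once a Web server sits in between.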

Specifically, some questions I have are:

1. Do HTTP clients (browsers) post request body content in their native
encoding and specify the character encoding in one of the request
headers?
2. If so, what is the standard header and its format?
3. If a client posts a request body with a character encoding that is
different from the Web server's native encoding, is the body passed to
CGI or Servlet processing as-is?  That is, as an unaltered byte stream,
or does the Web server convert to its own native encoding?
4. If passed as-is, is the standard HTTP header that specifies the
encoding of the request's body made available to the processing CGI or
Servlet by the Web server?
5. If so, what CGI variable will it be in?
6. Do HTTP servers send response body content in their native encoding
and specify the character encoding in one of the response headers?  If
so, what is the standard header and its format?  Or...
7. Do HTTP servers send response body content in a client (browser)
specified encoding?  If so, what is the request header that specifies
the possible response encodings?
8. If the HTTP server does conversion to send responses in the requested
encoding, what encoding is it expecting for CGI and Servlet produced
output?  Does it assume CGI and Servlet output is in the server's native
encoding and convert it on the fly?  Or...
9. Does the HTTP server expect the CGI or Servlet to read the request
headers themselves and produce a byte stream containing the correct
encoding?
10. If so, what CGI variable will the acceptable encodings be specified
in?
11. Is the CGI or Servlet that is generating the response responsible for
setting a character encoding header?
12. Are text content types the only ones that undergo conversion?  If
not, what are some other MIME types that may require additional
character encoding headers?
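
To make questions 4 and 5 concrete, here is a rough sketch of what I
would expect a CGI program or Servlet to do, assuming the encoding
arrives as a charset parameter on the Content-Type value (which CGI
exposes as the CONTENT_TYPE variable). Whether current browsers and
servers actually supply that parameter is precisely what I am asking.
The class and method names (CharsetParam, charsetOf) are hypothetical,
and the parsing is deliberately naive: it does not handle a semicolon
inside a quoted parameter value.

```java
// Hypothetical helper class, for illustration only.
public class CharsetParam {

    // Return the charset parameter of a Content-Type value,
    // or the fallback when the value or the parameter is missing.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.length() == 0 ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // In a real CGI program the value would come from the environment,
        // for example System.getenv("CONTENT_TYPE").
        System.out.println(charsetOf("text/plain; charset=Shift_JIS", "ISO-8859-1"));
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
    }
}
```

The fallback argument stands in for whatever default the gateway would
have to assume when no charset is given; the value used in main is only
an example.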

Any information or pointers to where I could get some of these questions
answered would be greatly appreciated.

Leila Schneberger
Received on Wednesday, 3 December 1997 11:06:59 UTC

