Re: What is the charset of the form data?

Hi Alek --

The answer is no: there is no way of knowing in all cases. If the form is
POSTed using multipart/form-data, then the Content-Type header of each part
can indicate the charset used for that part. I don't know of a browser that
does this, however (newer ones might; does anyone know?). But if the form
is application/x-www-form-urlencoded, there is no such mechanism: the
encoding is hidden inside the URL-encoded text.

The 'updated' URI specification (RFC 2396,
http://www.rfc-editor.org/rfc/rfc2396.txt) defines how that encoding
should take place for a given charset, and then _suggests_ that all URIs
should be encoded using the UTF-8 charset. However, there is no
requirement to do so. And, of course, older browsers (Navigator 3, IE 3,
...) don't know about this newer specification.
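Roughly, the procedure is: turn the characters into octets using some
charset, then write every octet that is not an "unreserved" character as
%HH. The Python sketch below assumes that reading (unreserved = letters,
digits and -_.!~*'() as in section 2.3 of RFC 2396) and escapes everything
else, as you would for a form value. escape_component is a hypothetical
name, and real form submission additionally turns spaces into '+'. The point
is that the same text yields different escapes depending on the charset.

```python
import string

# RFC 2396, section 2.3: unreserved = alphanum | mark
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_component(text, charset):
    """Hypothetical helper: encode `text` to octets in `charset`, then
    %HH-escape every octet that is not an unreserved ASCII character."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


print(escape_component("caf\u00e9", "utf-8"))       # caf%C3%A9
print(escape_component("caf\u00e9", "iso-8859-1"))  # caf%E9
```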

Currently, IE5 encodes URLs (I believe according to RFC 2396) using the
charset declared in the Content-Type header of the received HTML document
(or in a META element specifying the content type). I don't know about
IE4, Navigator 4.x, or Opera/iCab. Navigator 6 either encodes the URL
using ISO-8859-1 (substituting encoded question marks for non-Latin-1
characters) or using UTF-8 (if that charset is specified in a Content-Type
header or META element).
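A rough model of the browser behaviour described above, again as a Python
sketch. It is not taken from either browser's source; the function names are
hypothetical and the rules are only the ones I described: IE5 uses the
page's declared charset, while Navigator 6 uses UTF-8 if the page declares
it and ISO-8859-1 otherwise, with unencodable characters becoming '?' (sent
as %3F). It relies on the encoding and errors arguments of
urllib.parse.quote.

```python
from urllib.parse import quote


def ie5_style_escape(text, page_charset):
    # Model: encode with whatever charset the page declared.
    return quote(text, safe="", encoding=page_charset)


def nav6_style_escape(text, page_charset):
    # Model: UTF-8 only if the page declared it; otherwise ISO-8859-1,
    # with characters outside Latin-1 replaced by "?" and escaped as %3F.
    if page_charset.lower() == "utf-8":
        return quote(text, safe="", encoding="utf-8")
    return quote(text, safe="", encoding="iso-8859-1", errors="replace")


zhe = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE, not in Latin-1
print(ie5_style_escape(zhe, "windows-1251"))   # %C6
print(nav6_style_escape(zhe, "windows-1251"))  # %3F
print(nav6_style_escape(zhe, "utf-8"))         # %D0%96
```

Note that in the Navigator 6 model the original character is simply lost, so
no server-side guessing can recover it.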

Hope this helps --

Ian


On Fri, 7 Jul 2000, Aleksandar Susnjar wrote:

> I am almost sure that this is covered by something but I could not find it
> defined by any standards...
> 
> Is there any STANDARD way (i.e. no tricks) to determine the character
> set of the form data sent (POST method) by the web browser to the web
> server? Of course, this also applies to CGI/xxxAPI/Apache mod
> development...
> 
> I guess that the browser should also send a Content-Type with the charset
> specified, but it seems that something is stopping it on its way...
> 
> I know that, by default, the form data is sent in the encoding of the form
> itself, but there are cases where the source form is not known ... along
> with its encoding...
> 
> Regards,
> 
> Aleksandar Susnjar
> Planet Intra
> 

Received on Monday, 10 July 2000 14:14:30 UTC