- From: Ian Graham <igraham@smaug.java.utoronto.ca>
- Date: Mon, 10 Jul 2000 14:14:26 -0400
- To: Aleksandar Susnjar <shule@planetintra.com>
- cc: www-html@w3.org
Hi Alek --

The answer is no - there is no way of knowing in all cases.

If the form is POSTed using multipart/form-data, then the Content-Type header of each part can indicate the charset used for that part. I don't know of a browser that does this, however (newer ones might - does anyone know?).

But if the form is sent as application/x-www-form-urlencoded, then there is no such mechanism -- the encoding is hidden inside the URL-encoded text. The 'updated' URI specification (RFC 2396 - http://www.rfc-editor.org/rfc/rfc2396.txt) defines how that encoding should take place for a given charset, and then _suggests_ that all URIs should be encoded using the UTF-8 charset. However, there is no requirement to do so. And, of course, older browsers (Navigator 3, IE 3....) don't know about this new specification.

Currently, IE5 encodes URLs (I believe according to RFC 2396) using the charset defined in the content-type header of the received HTML document (or by a META element specifying the content-type). I don't know about IE4, nor Navigator 4.x, nor Opera/iCab...

Navigator 6 encodes the URL either using ISO-8859-1 (and puts encoded question marks in place of non-Latin-1 characters) or using UTF-8 (if this charset is specified in a content-type header or META element).

Hope this helps --

Ian

On Fri, 7 Jul 2000, Aleksandar Susnjar wrote:

> I am almost sure that this is covered by something but I could not find it
> defined by any standards...
>
> Is there any STANDARD way (i.e. no tricks) to determine what is the
> character set of the form data sent (POST method) by the web browser to the
> web server? Of course, this also applies to CGI/xxxAPI/Apache mod
> development...
>
> I guess that the browser should also send Content-Type with charset
> specified, but it seems that there is something stopping it on its way...
>
> I know that, by default, the form data is sent in the encoding of the form
> itself, but there are cases where the source form is not known ... along
> with its encoding...
>
> Regards,
>
> Aleksandar Susnjar
> Planet Intra
>
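
To illustrate the urlencoded case from the reply above: the same form value produces different percent-encoded bytes depending on the charset the browser used, and the encoded text itself does not say which one it was. The following is only a minimal sketch in Python. The function names encode_field and guess_decode are hypothetical, and the "try UTF-8 first, then fall back to ISO-8859-1" step is a common heuristic assumed here, not something defined by RFC 2396 or described in this thread.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_field(value, charset):
    """Percent-encode a form value the way a browser might for a given charset.

    Characters that the charset cannot represent are replaced by "?",
    which is then percent-encoded as %3F.
    """
    return quote(value, safe="", encoding=charset, errors="replace")


def guess_decode(raw):
    """Heuristic (an assumption, not a standard): try UTF-8, else ISO-8859-1.

    Returns a (text, charset_guess) pair. The guess can be wrong, because
    every valid UTF-8 byte sequence is also a valid ISO-8859-1 sequence.
    """
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    print(encode_field("é", "utf-8"))        # %C3%A9
    print(encode_field("é", "iso-8859-1"))   # %E9
    print(encode_field("ш", "iso-8859-1"))   # %3F
    print(guess_decode("%E9"))
    print(guess_decode("%C3%A9"))
```

Note that the guess is not reliable: the two-character ISO-8859-1 string "Ã©" encodes to exactly the same text (%C3%A9) as the UTF-8 string "é", so the server cannot tell the two apart without knowing the charset of the originating form.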
Received on Monday, 10 July 2000 14:14:30 UTC