UTF-8 and POST request

Hi,

I am in the process of converting my web site to be UTF-8 compliant,
so that it can better handle scripts used by various languages. To be
clear, I am using a Java application server, and by default it only
accepts ISO-8859-1 for POST requests. To have it treat the request
parameters as UTF-8, I have to call

   request.setCharacterEncoding("UTF-8")

each time I receive the parameters. I would have preferred an option
to tell my application server to default to treating all POST content
as UTF-8, but this was rejected based on RFC 2616, sections 3.7.1 and
3.4.1.
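
To illustrate what goes wrong when the server falls back to
ISO-8859-1, here is a minimal sketch using only the JDK. The class
and method names are hypothetical and it does not touch the servlet
API; it only assumes that the browser percent-encodes the form value
as UTF-8 bytes and that the receiving side picks a charset to decode
them with.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of any application server API.
final class FormDecodingDemo {

    // Percent-encode a value as UTF-8 bytes, as a browser submitting
    // a UTF-8 page is assumed to do.
    static String encodeAsUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String wire = encodeAsUtf8(original);
        System.out.println(wire); // %E6%97%A5%E6%9C%AC%E8%AA%9E

        // Wrong charset: every byte becomes its own Latin-1 character.
        System.out.println(decode(wire, "ISO-8859-1").equals(original)); // false

        // Matching charset: the original text is recovered.
        System.out.println(decode(wire, "UTF-8").equals(original)); // true
    }
}
```

The bytes on the wire are the same in both cases; only the charset
chosen by the receiver differs, which is why the call has to happen
before the parameters are first read.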

I decided to try to specify the charset as part of the form's enctype  
attribute:

    <form action="" method="post" enctype="application/x-www-form-urlencoded; charset=utf-8">

though having tested with Safari, Firefox and Opera, I found that only  
Opera included the "charset=utf-8" component in the content-type of  
the request. Additionally if I specify

   <form action="" method="post" enctype="application/x-www-form-urlencoded; charset=utf-8" accept-charset="utf-8">

I get other strange results with Firefox and Safari. With Opera I just
see question marks when I pass my Japanese character test case.
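
I have not confirmed where the question marks come from. One possible
explanation, and it is only an assumption, is that somewhere along the
way the text is encoded into a charset that cannot represent Japanese
characters. The sketch below (hypothetical class name, JDK only) shows
that this kind of lossy encoding produces exactly that symptom.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class LossyEncodingDemo {

    // Encode text as ISO-8859-1 and read the bytes back with the same
    // charset. Characters that ISO-8859-1 cannot represent are replaced
    // by the charset's replacement byte, which is '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters go in, three question marks come out.
        System.out.println(roundTripLatin1("\u65e5\u672c\u8a9e")); // prints "???"
    }
}
```

Unlike decoding with the wrong charset, this loss is not reversible:
once the characters have been replaced, the original text cannot be
recovered on the server.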

Now to the questions:

  - should web browsers be acknowledging the charset specified in the
form, and sending it to the HTTP server?
  - is it considered wrong to force my application to treat all
requests as UTF-8?

André

Received on Tuesday, 7 October 2008 03:29:36 UTC