- From: Yung-Fong Tang <ftang@netscape.com>
- Date: Tue, 19 Feb 2002 08:50:42 -0800
- To: www-international <www-international@w3.org>, Katsuhiko Momoi <momoi@netscape.com>, Bob Jung <bobj@netscape.com>
I wonder whether there is a W3C specification that addresses the following issue.

Background: Any HTML document can be encoded in some charset, labeled either by the HTTP header or by an HTML meta tag. When the browser submits form data to the server, for backward compatibility reasons it should send the data URL-escaped in the charset of the form. However, since it is possible to enter any Unicode data into a text field, what should the browser do when the data it needs to submit cannot be converted to the charset of the form's HTML?

I have observed or heard about the following behaviors:

1. Prohibit the input, copy, and paste of any characters that cannot be converted to the charset. Netscape 4.x did that, so there is no way to put Korean characters into an ISO-8859-1 form. In this case, what you see is what you submit.
2. Replace characters that cannot be submitted with '?' (N6.2 does that).
3. If ACCEPT_CHARSET is specified in the HTML form, try to convert to a different charset (HTML 4.x says something about this). However, it would be very bad if one value were in one charset and another value in a different one.
4. Try to convert to UTF-8 if that happens. Same issue as above: we don't want to see one value in one charset and another in a different one.
5. Convert to the form charset, and convert any character that cannot be converted to an NCR such as &#12345;, which is then %-escaped (IE6 on my WinXP does that).
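
To make behaviors 2, 4, and 5 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not what any of the browsers mentioned actually run: the function names are hypothetical, and Python's built-in "replace" and "xmlcharrefreplace" encoding error handlers stand in for the browsers' own fallback logic.

```python
from urllib.parse import quote_plus


def escape_with_question_mark(value: str, charset: str) -> str:
    """Behavior 2 (hypothetical helper): unconvertible characters become '?'."""
    # errors="replace" substitutes '?' for anything the charset cannot encode.
    return quote_plus(value, encoding=charset, errors="replace")


def escape_with_ncr(value: str, charset: str) -> str:
    """Behavior 5 (hypothetical helper): unconvertible characters become &#N;."""
    # errors="xmlcharrefreplace" emits a decimal numeric character reference,
    # whose '&', '#' and ';' are then %-escaped like any other form data.
    return quote_plus(value, encoding=charset, errors="xmlcharrefreplace")


def choose_submit_charset(values: list, form_charset: str) -> str:
    """Behavior 4, decided once per form so that all values share one charset."""
    try:
        for v in values:
            v.encode(form_charset)  # strict: raises on the first bad character
    except UnicodeEncodeError:
        return "utf-8"  # assumption: UTF-8 is the fallback for the whole form
    return form_charset


field = "caf\u00e9 \u3039"  # U+3039 is code point 12345, outside ISO-8859-1
print(escape_with_question_mark(field, "iso-8859-1"))
print(escape_with_ncr(field, "iso-8859-1"))
print(choose_submit_charset(["abc", field], "iso-8859-1"))
```

Note that with the NCR approach the server cannot tell an escaped U+3039 apart from a user who literally typed the eight characters "&#12345;", since both arrive as %26%2312345%3B.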
Received on Tuesday, 19 February 2002 11:51:30 UTC