- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 20 Feb 2002 08:47:14 +0900
- To: ftang@netscape.com (Yung-Fong Tang), www-international <www-international@w3.org>, Katsuhiko Momoi <momoi@netscape.com>, Bob Jung <bobj@netscape.com>
Hello Frank,

At 08:50 02/02/19 -0800, Yung-Fong Tang wrote:
>I wonder whether there is a W3C specification that addresses the following issue:

In summary, no, but XForms should provide it. Please review the XForms WD at http://www.w3.org/TR/xforms/, which is in last call until the end of this week.

>Background:
>All HTML can be encoded with a charset, labeled either by the HTTP header or by an HTML meta tag. When the browser submits form data to the server, for backward-compatibility reasons, we should send the data URL-escaped in the form's charset.

Yes, this is what the spec says, and what (reasonably recent) browsers do.

>However, since it is possible to put any Unicode data into a text field, what should the browser do when the data it needs to submit cannot be converted to the charset of the form's HTML page?
>
>I observed/heard about the following behaviors:
>1. Prohibit input, copy and paste of any characters that cannot be converted to the charset. Netscape 4.x did that, so there is no way to put Korean characters into an ISO-8859-1 form. In this case, what you see is what you submit.

This is the most straightforward. Presumably the CGI (or whatever) is working in iso-8859-1, and sending it anything else will confuse it. This (plus maybe 3) is probably what I would do.

>2. Replace characters that cannot be submitted with '?' (N6.2 does that).

Not such a good idea; the user thinks she submitted actual characters, but they didn't get across. Imagine ordering something, typing in your address, and being billed, but never getting anything because the post office cannot route a package to ?????.

>3. If there is an ACCEPT_CHARSET specified in the HTML form, try to convert to a different charset (HTML 4.x says something about this). However, it will be very bad if one value is in one charset and another is in a different one.

The original assumption for ACCEPT_CHARSET was that all browsers would use it, so the server would only ever see a single encoding.
However, uptake of ACCEPT_CHARSET was very slow; I don't actually know for certain of any browser in which it is implemented. The only way I would suggest it might be used now is:
- Only use it with a value of UTF-8.
- Only use it if you have server logic that can distinguish between UTF-8 and the encoding of the page.

On the browser side, implementing ACCEPT_CHARSET is not a bad idea, because if the form uses it, you can safely assume that the server can deal with what it is asking for.

>4. Try converting to UTF-8 when that happens. Same issue as above: we don't want to see one value in one charset and another in a different one.

Well, UTF-8 can be distinguished from other encodings quite easily, but you have no guarantee that the server will be able to deal with UTF-8, so it's better not to send UTF-8.

>5. Convert it to the form charset, convert any characters that cannot be converted to an NCR (&#12345;), and then %-escape the result (IE6 on my WinXP does that).

So that would be %26%2312345%3B, or something else? How do you know that the server is able to deal with this?

Regards,
Martin.
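The browser behaviors listed in the thread (options 1, 2 and 5) differ only in what happens to characters the form charset cannot represent. A minimal Python sketch, assuming an ISO-8859-1 form and using Python's codec error handlers as stand-ins for browser behavior; `encode_field` and its `policy` values are hypothetical names, not any browser's actual code, and `quote_plus` only approximates a browser's exact escaping rules:

```python
from urllib.parse import quote_plus


def encode_field(value: str, charset: str, policy: str) -> str:
    """Percent-encode one form value in the form's charset (hypothetical helper).

    policy selects what happens to characters the charset cannot represent:
      "reject"   - raise an error, like blocking the input (option 1)
      "question" - substitute '?' (option 2)
      "ncr"      - substitute a numeric character reference such as &#12345; (option 5)
    """
    handlers = {"reject": "strict", "question": "replace", "ncr": "xmlcharrefreplace"}
    if policy not in handlers:
        raise ValueError(f"unknown policy: {policy}")
    # Under "strict", encode() raises UnicodeEncodeError for unrepresentable characters.
    raw = value.encode(charset, errors=handlers[policy])
    # Escape every byte outside A-Z a-z 0-9 _.-~ as %XX and turn spaces into '+',
    # roughly as in application/x-www-form-urlencoded.
    return quote_plus(raw)


print(encode_field("\u3039", "iso-8859-1", "question"))  # %3F
print(encode_field("\u3039", "iso-8859-1", "ncr"))       # %26%2312345%3B
```

The NCR output matches the %26%2312345%3B string above: the server receives ordinary ISO-8859-1 bytes and must itself recognize and expand "&#12345;", which is exactly the open question of whether it can deal with this.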
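The advice on ACCEPT_CHARSET (the HTML 4 `accept-charset` form attribute) depends on server logic that can tell UTF-8 apart from the page's own encoding. One common heuristic, sketched here as an assumption rather than a guaranteed method, is to try a strict UTF-8 decode first and fall back to the page charset. `decode_field` is a hypothetical helper:

```python
from urllib.parse import unquote_to_bytes


def decode_field(escaped: str, page_charset: str) -> str:
    """Decode one submitted value that may be UTF-8 or the page charset (hypothetical helper)."""
    # Undo form encoding: '+' means space, then %XX escapes become raw bytes.
    raw = unquote_to_bytes(escaped.replace("+", " "))
    try:
        # Strict UTF-8 decoding rejects most byte sequences produced by legacy charsets.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the value was sent in the page's own charset.
        return raw.decode(page_charset)


print(decode_field("caf%C3%A9", "iso-8859-1"))  # café (sent as UTF-8)
print(decode_field("caf%E9", "iso-8859-1"))     # café (sent as ISO-8859-1)
```

This is only a heuristic: pure ASCII is identical in both encodings, and some legacy byte sequences (for example the ISO-8859-1 text "Ã©", bytes C3 A9) happen to be valid UTF-8 and would be misread.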
Received on Tuesday, 19 February 2002 19:02:22 UTC