- From: <Ed_Batutis/CAM/Lotus@crd.lotus.com>
- Date: Fri, 20 Dec 1996 14:40:57 -0400
- To: www-international@w3.org
In regard to character sets I'd like to see the following happen:

In the long run all browsers would be able to deal with a document tagged with the charset utf-8, along with a limited, well-known set of popular charsets (listed in a new RFC-yet-to-be-written). In this world the browser would not send Accept-Charset because it can literally accept anything and deal with it in some reasonable way.

In the shorter run, browsers that can deal with utf-8 would send Accept-Charset with two charsets listed: 1) utf-8, 2) a character set that the browser can deal with effectively. This should be interpreted as "I can deal with utf-8 appropriately; lacking that, I like charset 2) the best; but if all else fails, send me any encoding and I'll do my best." Of course all documents should be tagged with the actual charset.

In regard to POSTed data, there are good solutions, and browser and server vendors need to agree on one (or more) as soon as possible. Lacking any new mechanisms, servers should assume the form data is in the same encoding as the form document. Maybe this is the state of the art anyway. In any case, this is obviously inefficient: a server that serves documents in different encodings has to work out which encoding each form was served in, which places an undue burden on it. It would be better to tag the return data so that the server does not need to look at the original document. I've read about or imagined several ways this can be done:

1) Use the mechanism proposed by Larry Masinter in multipart/form-data.
2) Use "application/x-www-form-urlencoded ; charset=<one of the well-known encodings>"
3) Create a new Media Type that is identical to application/x-www-form-urlencoded but allows a charset parameter
4) Use the "charset field" approach (a charset is sent back as the value for a special field that is hidden from the user)
5) Send an Accept-Charset on the POST

All of these solutions pose difficulties in regard to compatibility. In the longer run, 1) seems like a very flexible solution.
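To make options 2) and 4) concrete from the sending side, here is a minimal Python sketch. It is only an illustration under assumptions: the function name encode_form and the hidden field name "_charset_" are hypothetical and not taken from any proposal mentioned here.

```python
from urllib.parse import urlencode

# Hypothetical name for the hidden field used by option 4).
CHARSET_FIELD = "_charset_"

def encode_form(fields, charset, style="param"):
    """Return (content_type, body) for a form submission tagged with its charset.

    style="param": option 2), the charset travels as a Content-Type parameter.
    style="field": option 4), the charset travels as an extra hidden field.
    """
    content_type = "application/x-www-form-urlencoded"
    to_send = dict(fields)
    if style == "param":
        content_type += "; charset=" + charset
    elif style == "field":
        to_send[CHARSET_FIELD] = charset
    else:
        raise ValueError("unknown style: " + style)
    # Each value is encoded to bytes in `charset`, then percent-escaped,
    # so the resulting body is pure ASCII.
    body = urlencode(to_send, encoding=charset).encode("ascii")
    return content_type, body
```

Either way the bytes of the form values are identical; only the place where the charset label travels differs, which is why the options are functionally equivalent.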
In the shorter run, my hope would be that 2) or 4) or 5) would work, in that servers would ignore the parameter/field if they didn't know how to deal with it. If the server did know about the parameter, it would also be able to deal with all the popular charsets in the RFC-yet-to-be-written. 3) seems like a bad idea; nobody needs a new Media Type.

Servers should be updated to be able to deal with all the encodings listed in the RFC-yet-to-be-written. (Until it is written, that means all the charsets in the HTTP 1.1 appendix.) I'm against the server sending Accept-Charset. Instead, servers should be updated to handle the encodings.

As a matter of style, I like 2) better than 4) or 5) because it seems more consistent, but they are all functionally identical. I would hope we could agree to use only one, but if quick agreement proves impractical it would be best to support 2), 4), and 5).

In any case, properly tagged utf-8 should be used if 8859-1 can't support the characters needed. I think this is the best long-term solution.
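For what it's worth, the server side of 2) and 4) could look roughly like the Python sketch below. It is a sketch under assumptions, not a description of any existing server: decode_form and the hidden field name "_charset_" are hypothetical, and "knowing how to deal with" a charset is approximated by whether Python has a codec for it. A charset parameter is tried first, then the hidden field, and if neither is present or recognized the server falls back to the encoding of the form document.

```python
import codecs
from urllib.parse import parse_qsl

def _known(charset):
    """True if this Python build has a codec for the given charset name."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False

def decode_form(body, content_type, form_charset, charset_field="_charset_"):
    """Decode an urlencoded POST body; return (charset_used, fields)."""
    text = body.decode("ascii")  # an urlencoded body is ASCII on the wire
    charset = None
    # Option 2): look for a charset parameter on the Content-Type.
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"')
    # Option 4): otherwise look for the hidden charset field.
    if charset is None:
        raw = dict(parse_qsl(text, encoding="latin-1"))
        charset = raw.get(charset_field)
    # Fallback: ignore a missing or unknown tag, assume the form's encoding.
    if charset is None or not _known(charset):
        charset = form_charset
    fields = dict(parse_qsl(text, keep_blank_values=True, encoding=charset))
    fields.pop(charset_field, None)
    return charset, fields
```

A server written this way stays compatible with untagged submissions, since it behaves exactly as before when no tag is present.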
Received on Friday, 20 December 1996 14:47:12 UTC