- From: BearHeart / Bill Weinman <bearheart@bearnet.com>
- Date: Thu, 25 Jan 1996 11:56:37 -0600
- To: Larry Masinter <masinter@parc.xerox.com>, glenn@stonehand.com
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
At 08:23 am 1/25/96 PST, Larry Masinter spake:
>> ... all of the following are asking for the same resource:
>
>>   GET /%1B$BF%7CK%5C%1B(B.HTM
>>   GET /%93%FA%96%7B.HTM
>>   GET /%C6%FC%CB%DC.HTM
>>   GET /e%E5g,%00.%00H%00T%00M
>>   GET /+ZeVnLA-.HTM
>
>> The problem is, unless the server knows that the characters encoded with
>> the URI octet escapement mechanism in these examples use ISO-2022-JP, SHIFT
>> JIS, EUC-J, UNICODE-1-1, and UNICODE-1-1-UTF7, respectively, then the
>> server has no reliable way of decoding the octets as characters.
>
>you cannot possibly mean that the *same* HTTP server will employ
>2022-jp, shift jis, euc-j, unicode-1-1 and unicode-1-1-utf7.

Methinks the problem is not that a server may be speaking all these
charsets, because the GET request is generated by the client.

So the problem is that one server may receive GET requests from
different clients that are using different charsets.

I can think of two possible solutions:

a) Require that all clients use a given charset for GET requests
   (e.g. unicode-1-1-utf7).

   Disadvantage: This seems a bit brute-force, and it may require
   some clients to speak more charsets than they would otherwise need.

b) Allow the client to optionally indicate the charset being used
   along with the GET request.

   Disadvantage: This would require extending the request line syntax.

I prefer (a), but I don't really like either of them. Can anybody think
of a more elegant solution?

+--------------------------------------------------------------------------+
| BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/
| Author of The CGI Book -- http://www.bearnet.com/cgibook/
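To make the ambiguity concrete, here is a minimal sketch in Python. It
assumes that Python's standard codecs stand in for the five charset labels
in the quoted message; in particular, treating UNICODE-1-1 as big-endian
UTF-16 is an assumption, and the helper name decode_path is hypothetical.
Each request decodes to the same file name only when the server is told
which charset the client used.

```python
from urllib.parse import unquote_to_bytes

# Request paths from the quoted message, paired with the Python codec
# assumed to match each charset label (the mapping is an assumption).
REQUESTS = [
    ("/%1B$BF%7CK%5C%1B(B.HTM", "iso2022_jp"),
    ("/%93%FA%96%7B.HTM", "shift_jis"),
    ("/%C6%FC%CB%DC.HTM", "euc_jp"),
    ("/e%E5g,%00.%00H%00T%00M", "utf-16-be"),
    ("/+ZeVnLA-.HTM", "utf-7"),
]


def decode_path(path, charset):
    """Hypothetical helper: undo the %XX escapes, then decode the octets."""
    # Drop the leading "/" so two-byte encodings stay aligned.
    octets = unquote_to_bytes(path.lstrip("/"))
    return octets.decode(charset)


# With the right charset, all five requests name the same resource.
names = [decode_path(path, charset) for path, charset in REQUESTS]

# Without it the server can only guess; a wrong guess gives another name.
wrong_guess = unquote_to_bytes("%93%FA%96%7B.HTM").decode("latin-1")

if __name__ == "__main__":
    for (path, charset), name in zip(REQUESTS, names):
        print(f"{charset:12} {path:28} -> {name}")
    print("latin-1 guess:", repr(wrong_guess))
```

Under solution (a) the charset argument would be a constant; under (b) it
would come from whatever the client sends along with the request.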
Received on Thursday, 25 January 1996 10:09:52 UTC