- From: BearHeart / Bill Weinman <bearheart@bearnet.com>
- Date: Thu, 25 Jan 1996 11:56:37 -0600
- To: Larry Masinter <masinter@parc.xerox.com>, glenn@stonehand.com
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
At 08:23 am 1/25/96 PST, Larry Masinter spake:
>> ... all of the following are asking for the same resource:
>
>> GET /%1B$BF%7CK%5C%1B(B.HTM
>> GET /%93%FA%96%7B.HTM
>> GET /%C6%FC%CB%DC.HTM
>> GET /e%E5g,%00.%00H%00T%00M
>> GET /+ZeVnLA-.HTM
>
>> The problem is, unless the server knows that the characters encoded with
>> the URI octet escapement mechanism in these examples use ISO-2022-JP, SHIFT
>> JIS, EUC-J, UNICODE-1-1, and UNICODE-1-1-UTF7, respectively, then the
>> server has no reliable way of decoding the octets as characters.
>
>you cannot possibly mean that the *same* HTTP server will employ
>2022-jp, shift jis, euc-j, unicode-1-1 and unicode-1-1-utf7.
Methinks the problem is not that one server speaks all these
charsets. The GET request is generated by the client, so the
problem is that one server may receive GET requests from
different clients, each using a different charset, with nothing
in the request to say which one.
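
To make the ambiguity concrete, here is a rough Python sketch (mine, not from the quoted message) that undoes the %XX escapes in the five quoted requests and decodes the octets. The mapping from the charset labels to Python codec names is my assumption, in particular that UNICODE-1-1 here means big-endian 16-bit code units with no byte-order mark. Each path decodes to the same name only when the matching charset is supplied.

```python
from urllib.parse import unquote_to_bytes

# The five request paths from the quoted message, keyed by the Python codec
# name assumed to correspond to each charset label.
REQUESTS = {
    "iso2022_jp": "/%1B$BF%7CK%5C%1B(B.HTM",   # ISO-2022-JP
    "shift_jis": "/%93%FA%96%7B.HTM",          # SHIFT JIS
    "euc_jp": "/%C6%FC%CB%DC.HTM",             # EUC-J
    "utf_16_be": "/e%E5g,%00.%00H%00T%00M",    # UNICODE-1-1 (assumed big-endian)
    "utf_7": "/+ZeVnLA-.HTM",                  # UNICODE-1-1-UTF7
}


def decode_path(path: str, charset: str, errors: str = "strict") -> str:
    """Undo the %XX escapes, then decode the octets after the leading '/'."""
    octets = unquote_to_bytes(path.lstrip("/"))
    return octets.decode(charset, errors)


if __name__ == "__main__":
    # ascii() keeps the output printable on any terminal encoding.
    for charset, path in REQUESTS.items():
        print(charset, "->", ascii(decode_path(path, charset)))
    # The same octets read with the wrong charset do not give the same name.
    wrong = decode_path(REQUESTS["shift_jis"], "euc_jp", errors="replace")
    print("shift_jis octets read as euc_jp ->", ascii(wrong))
```

The server sees only the octets, so without out-of-band knowledge it has no way to pick the right entry in a table like this one.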
I can think of two possible solutions:

a) Require that all clients use a given charset for GET
   requests (e.g. unicode-1-1-utf7).

   Disadvantage: This seems a bit brute-force, and it may
   require some clients to speak more charsets than they would
   otherwise need.

b) Allow the client to optionally indicate the charset being
   used along with the GET request.

   Disadvantage: This would require extending the request
   line syntax.

I prefer (a), but I don't really like either of them. Can anybody
think of a more elegant solution?
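
For the sake of discussion, the two options might look roughly like this in Python. Everything here is hypothetical: the function names are made up, the trailing `charset=` token on the request line is invented purely to illustrate (b) and is not part of any HTTP specification, and the ISO-8859-1 fallback for an unlabelled request is my assumption.

```python
from urllib.parse import quote, unquote_to_bytes

AGREED_CHARSET = "utf_7"        # option (a): the one charset every client uses
DEFAULT_CHARSET = "iso8859_1"   # option (b): assumed fallback when no label is sent


def build_path_a(name: str) -> str:
    """Option (a): always encode in the agreed charset, then %XX-escape."""
    return "/" + quote(name.encode(AGREED_CHARSET), safe="")


def parse_path_a(path: str) -> str:
    """Option (a), server side: the charset is known in advance."""
    return unquote_to_bytes(path[1:]).decode(AGREED_CHARSET)


def build_request_line_b(name: str, charset: str) -> str:
    """Option (b): hypothetical request line with a trailing charset label."""
    path = "/" + quote(name.encode(charset), safe="")
    return f"GET {path} HTTP/1.0 charset={charset}"


def parse_request_line_b(line: str) -> tuple:
    """Option (b), server side: return (decoded name, charset used)."""
    parts = line.split()
    path = parts[1]
    charset = DEFAULT_CHARSET
    # The label is optional, so scan whatever follows the path for it.
    for token in parts[2:]:
        if token.startswith("charset="):
            charset = token[len("charset="):]
    return unquote_to_bytes(path[1:]).decode(charset), charset
```

With (a) the server needs no extra information but every client must be able to produce the agreed charset. With (b) each client keeps its native charset, at the cost of a request line that existing servers would not expect.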
+--------------------------------------------------------------------------+
| BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/
| Author of The CGI Book -- http://www.bearnet.com/cgibook/
Received on Thursday, 25 January 1996 10:09:52 UTC