Re: Charsets revisited

At 08:23 am 1/25/96 PST, Larry Masinter spake:
>> ... all of the following are asking for the same resource:
>
>>  GET /%1B$BF%7CK%5C%1B(B.HTM
>>  GET /%93%FA%96%7B.HTM
>>  GET /%C6%FC%CB%DC.HTM
>>  GET /e%E5g,%00.%00H%00T%00M
>>  GET /+ZeVnLA-.HTM
>
>> The problem is, unless the server knows that the characters encoded with
>> the URI octet escapement mechanism in these examples use ISO-2022-JP, SHIFT
>> JIS, EUC-J, UNICODE-1-1, and UNICODE-1-1-UTF7, respectively, then the
>> server has no reliable way of decoding the octets as characters.
>
>you cannot possibly mean that the *same* HTTP server will employ
>2022-jp, shift jis, euc-j, unicode-1-1 and unicode-1-1-utf7.

   Methinks the problem is not that a server may be speaking 
all these charsets, because the GET request is generated by 
the client. So the problem is that one server may receive 
GET requests from different clients that are using different 
charsets. 
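
   To make the ambiguity concrete, here is a minimal Python sketch that
undoes the %XX escapes in the five quoted request paths and decodes the
resulting octets. The codec names are my assumptions about which Python
codecs correspond to the charset labels in the quoted message; in
particular, UNICODE-1-1 is treated as big-endian UTF-16. The helper
decode_path is hypothetical, not part of any server.

```python
from urllib.parse import unquote_to_bytes

# Request paths from the quoted message, keyed by the Python codec assumed
# to match each charset label (this mapping is an assumption).
REQUESTS = {
    "iso2022_jp": "/%1B$BF%7CK%5C%1B(B.HTM",   # ISO-2022-JP
    "shift_jis":  "/%93%FA%96%7B.HTM",         # SHIFT JIS
    "euc_jp":     "/%C6%FC%CB%DC.HTM",         # EUC-J
    "utf_16_be":  "/e%E5g,%00.%00H%00T%00M",   # UNICODE-1-1, assumed big-endian
    "utf_7":      "/+ZeVnLA-.HTM",             # UNICODE-1-1-UTF7
}

def decode_path(path, codec, errors="strict"):
    """Undo the %XX escapes, then decode the octets with the given codec."""
    # Drop the leading "/" first: it is a single ASCII octet and would
    # misalign the 16-bit units in the UTF-16 case.
    octets = unquote_to_bytes(path[1:])
    return octets.decode(codec, errors)

# With the right codec, every request names the same file.
names = {codec: decode_path(path, codec) for codec, path in REQUESTS.items()}

# With the wrong codec, the name is not recovered: here the Shift JIS
# octets are read as EUC-JP, replacing whatever cannot be decoded.
wrong = decode_path(REQUESTS["shift_jis"], "euc_jp", errors="replace")
```

   Each path decodes to the same file name only when the server is told
which codec to use; nothing in the request itself carries that
information.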

   I can think of two possible solutions: 

      a) Require that all clients use a given charset for GET
         requests (e.g. unicode-1-1-utf7).

         Disadvantage: This seems a bit brute-force, and it may
         require some clients to speak more charsets than they
         would otherwise need.

      b) Allow the client to optionally indicate the charset being
         used along with the GET request.

         Disadvantage: This would require extending the request
         line syntax.
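
   The two options can be sketched roughly as follows in Python. All
function names here are hypothetical, and FIXED_CODEC is only my
assumption of a Python codec standing in for unicode-1-1-utf7. For (b)
the charset is simply passed alongside the escaped name; how it would
actually travel in the request is left open, since no syntax has been
proposed.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption: Python's "utf_7" codec stands in for unicode-1-1-utf7.
FIXED_CODEC = "utf_7"

def escape_name(name, codec):
    """Encode a file name in the given codec and %XX-escape the octets."""
    return quote(name.encode(codec), safe="")

def unescape_name(escaped, codec):
    """Undo the %XX escapes and decode the octets with the given codec."""
    return unquote_to_bytes(escaped).decode(codec)

# Option (a): both sides agree on one charset ahead of time.
def client_request_a(name):
    return escape_name(name, FIXED_CODEC)

def server_resolve_a(escaped):
    return unescape_name(escaped, FIXED_CODEC)

# Option (b): the client picks a charset and sends its label along.
def client_request_b(name, codec):
    return escape_name(name, codec), codec

def server_resolve_b(escaped, codec):
    return unescape_name(escaped, codec)
```

   In (a) the server never has to guess, but every client must be able
to produce the agreed charset. In (b) the client may use whatever it
already speaks, at the cost of carrying one extra value per request.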

   I prefer (a), but I don't really like either of them. Can anybody 
think of a more elegant solution? 


+--------------------------------------------------------------------------+
| BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/ 
| Author of The CGI Book -- http://www.bearnet.com/cgibook/ 

Received on Thursday, 25 January 1996 10:09:52 UTC