- From: Glenn Adams <glenn@stonehand.com>
- Date: Thu, 25 Jan 96 10:11:21 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: frystyk@w3.org, nms@nns.ru, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
    From: Larry Masinter <masinter@parc.xerox.com>
    Date: Wed, 24 Jan 1996 15:35:36 PST

    In this particular case, the problem is with section 8.2.1 of RFC 1866 (HTML): This specification calls for the _characters_ of the form results ...

I think you are focusing too narrowly. The problem goes deeper. In particular, the fundamental problem is how to specify the information needed to decode escaped octets that represent non-ASCII character data in a URI, such as those found in an HTTP Simple Request. For example, all of the following are asking for the same resource:

    GET /%1B$BF%7CK%5C%1B(B.HTM
    GET /%93%FA%96%7B.HTM
    GET /%C6%FC%CB%DC.HTM
    GET /e%E5g,%00.%00H%00T%00M
    GET /+ZeVnLA-.HTM

The problem is that unless the server knows that the characters encoded with the URI octet escapement mechanism in these examples use ISO-2022-JP, SHIFT JIS, EUC-J, UNICODE-1-1, and UNICODE-1-1-UTF7, respectively, the server has no reliable way of decoding the octets as characters.

This problem is endemic to the specification of URIs as such and needs to be addressed at that level no matter to what use URIs are put.

Regards,
Glenn Adams
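To make the point about the five requests concrete, here is a minimal sketch in Python that unescapes each path and decodes the resulting octets with the charset named for it. The helper name decode_path and the mapping of the message's charset labels onto Python codec names are assumptions made for illustration; in particular, UNICODE-1-1 is treated here as big-endian 16-bit code units with no byte-order mark, and the leading "/" is stripped before decoding because it is a single ASCII octet and not part of the encoded name.

```python
from urllib.parse import unquote_to_bytes

# (request path, assumed Python codec for the charset named in the message)
EXAMPLES = [
    ("/%1B$BF%7CK%5C%1B(B.HTM", "iso2022_jp"),   # ISO-2022-JP
    ("/%93%FA%96%7B.HTM", "shift_jis"),          # SHIFT JIS
    ("/%C6%FC%CB%DC.HTM", "euc_jp"),             # EUC-J
    ("/e%E5g,%00.%00H%00T%00M", "utf_16_be"),    # UNICODE-1-1 (assumed big-endian)
    ("/+ZeVnLA-.HTM", "utf_7"),                  # UNICODE-1-1-UTF7
]


def decode_path(path, charset):
    """Undo the %XX escapes, then decode the raw octets with `charset`.

    Returns None when the octets are not valid in that charset.
    """
    octets = unquote_to_bytes(path.lstrip("/"))
    try:
        return octets.decode(charset)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # With the right charset, every path yields the same file name.
    for path, charset in EXAMPLES:
        print(f"{charset:12} {decode_path(path, charset)!r}")

    # With the wrong charset, the same octets fail to decode or yield other text.
    print(decode_path("/%93%FA%96%7B.HTM", "euc_jp"))
```

Given the correct charset, all five paths should decode to the same name; given the wrong one, the decode either fails or yields different characters. Nothing in the request line itself tells the server which case it is in.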
Received on Thursday, 25 January 1996 07:14:58 UTC