- From: Glenn Adams <glenn@stonehand.com>
- Date: Thu, 25 Jan 96 10:11:21 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: frystyk@w3.org, nms@nns.ru, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> From: Larry Masinter <masinter@parc.xerox.com>
> Date: Wed, 24 Jan 1996 15:35:36 PST
>
> In this particular case, the problem is with section 8.2.1 of RFC
> 1866 (HTML):
>
>    This specification calls for the _characters_ of the form results ...

I think you are focusing too narrowly. The problem goes deeper.
In particular, the fundamental problem is how to specify the information
needed to decode escaped octets representing non-ASCII character data
which appear in a URI, such as found in an HTTP Simple Request. For
example, all of the following are asking for the same resource:
GET /%1B$BF%7CK%5C%1B(B.HTM
GET /%93%FA%96%7B.HTM
GET /%C6%FC%CB%DC.HTM
GET /e%E5g,%00.%00H%00T%00M
GET /+ZeVnLA-.HTM
The problem is that unless the server knows that the octets escaped in
these examples are encoded in ISO-2022-JP, SHIFT JIS, EUC-J, UNICODE-1-1,
and UNICODE-1-1-UTF7, respectively, the server has no reliable way of
decoding the octets as characters.
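
To make the ambiguity concrete, here is a minimal Python sketch that
unescapes the five request paths above and decodes each one with the
charset named for it. The codec names (iso2022_jp, shift_jis, euc_jp,
utf_16_be, utf_7) are Python's and are only assumed to correspond to
the charset labels in this message; in particular, treating UNICODE-1-1
as big-endian 16-bit code units with no byte order mark is an
assumption. The helper decode_name is hypothetical, not part of any
server.

```python
from urllib.parse import unquote_to_bytes

# Request paths from the message, each paired with the Python codec
# assumed to match the charset it was encoded in.
REQUESTS = [
    ("/%1B$BF%7CK%5C%1B(B.HTM", "iso2022_jp"),
    ("/%93%FA%96%7B.HTM", "shift_jis"),
    ("/%C6%FC%CB%DC.HTM", "euc_jp"),
    ("/e%E5g,%00.%00H%00T%00M", "utf_16_be"),
    ("/+ZeVnLA-.HTM", "utf_7"),
]


def decode_name(path, codec):
    """Undo the %XX escapes, drop the leading '/', decode the octets."""
    raw = unquote_to_bytes(path)
    # The leading slash is a single octet in every example, so it is
    # removed before decoding; otherwise the 16-bit case is misaligned.
    return raw[1:].decode(codec)


decoded = [decode_name(path, codec) for path, codec in REQUESTS]

# With the right charset, all five requests name the same resource.
same_resource = len(set(decoded)) == 1

# With a wrong guess (here ISO 8859-1 applied to the EUC octets), the
# decode succeeds silently but yields a different name.
wrong_guess = decode_name("/%C6%FC%CB%DC.HTM", "latin-1")
```

Nothing in the request itself tells the server which codec to pass, so
the second argument to decode_name has to come from somewhere outside
the URI.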
This problem is endemic to the specification of URIs as such and needs to
be addressed at that level no matter to what use URIs are put.
Regards,
Glenn Adams
Received on Thursday, 25 January 1996 07:14:58 UTC