- From: Adam M. Costello <amc+e9hp4g@nicemice.net>
- Date: Mon, 22 Sep 2003 00:34:01 +0000
- To: www-html-editor@w3.org
I am concerned about the definition of application/x-www-form-urlencoded. HTML 2.0 and HTML 4.01 both say:

    space characters are replaced by `+', and then reserved characters are escaped as described in RFC 1738: non-alphanumeric characters are replaced by `%HH'...

Which is it, reserved characters or non-alphanumeric characters?

Either way, the specified process is not reversible, because it performs %HH escaping *after* changing spaces to plus-signs. For example, the values "foo+bar" and "foo bar" map to the same thing, either "foo+bar" (if plus-sign is not escaped), or "foo%2Bbar" (if plus-sign is escaped).

As far as I know, browsers always violate the spec and do something reversible instead: they do the %HH escaping *before* changing spaces to plus-signs, and they include plus-sign in the set of characters to be escaped. That way, the server can distinguish between "foo%2Bbar" (which means "foo+bar") and "foo+bar" (which means "foo bar").

Am I correctly understanding the spec, that the specified encoding is non-reversible? Is my observation about browsers accurate, that in practice they always use a reversible encoding? Should this discrepancy be addressed in some W3C note?

The XForms draft resolves the reserved/non-alphanumeric question, but retains the non-reversibility:

    space characters are replaced by +, and then non-ASCII and reserved characters (as defined by [RFC 2396] as amended by subsequent documents in the IETF track) are escaped by replacing the character with one or more octets of the UTF-8 representation of the character, with each octet in turn replaced by %HH...

AMC
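
The two orderings described above can be compared with a minimal Python sketch. The function names `spec_order` and `browser_order` are hypothetical, chosen only for illustration, and `urllib.parse.quote` stands in for the %HH escaping step; its escape set is not exactly the reserved set of RFC 1738, but it is close enough to show the collision.

```python
from urllib.parse import quote, unquote_plus

def spec_order(value: str, escape_plus: bool) -> str:
    """Literal reading of the spec: spaces become '+', THEN %HH escaping."""
    replaced = value.replace(" ", "+")
    # Whether '+' counts as a character to escape is the ambiguity in question.
    safe = "" if escape_plus else "+"
    return quote(replaced, safe=safe)

def browser_order(value: str) -> str:
    """Observed behaviour: %HH escaping first (including '+'), THEN spaces."""
    escaped = quote(value, safe="")      # ' ' -> '%20', '+' -> '%2B'
    return escaped.replace("%20", "+")   # a literal '%' is already '%25', so this is unambiguous

for v in ("foo+bar", "foo bar"):
    print(repr(v),
          spec_order(v, escape_plus=False),
          spec_order(v, escape_plus=True),
          browser_order(v))
```

Under the spec ordering both inputs collapse to the same output regardless of how the plus-sign is treated, while the browser ordering keeps them distinct, so a decoder such as `unquote_plus` can recover the original value.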
Received on Sunday, 21 September 2003 20:37:55 UTC