- From: Adam M. Costello <amc+e9hp4g@nicemice.net>
- Date: Mon, 22 Sep 2003 00:34:01 +0000
- To: www-html-editor@w3.org
I am concerned about the definition of
application/x-www-form-urlencoded. HTML 2.0 and HTML 4.01 both say:
space characters are replaced by `+', and then reserved characters
are escaped as described in RFC 1738: non-alphanumeric characters
are replaced by `%HH'...
Which is it, reserved characters or non-alphanumeric characters? Either
way, the specified process is not reversible, because it performs %HH
escaping *after* changing spaces to plus-signs. For example, the values
"foo+bar" and "foo bar" map to the same thing, either "foo+bar" (if
plus-sign is not escaped), or "foo%2Bbar" (if plus-sign is escaped).
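A minimal Python sketch of the order the spec literally describes (spaces to `+` first, then %HH escaping) makes the collision concrete. It models both readings of the ambiguous escape set. "Reserved" is assumed to mean the RFC 1738 section 2.2 list `; / ? : @ = &`, which does not include `+`. "Non-alphanumeric" is taken as everything except ASCII letters and digits. The function names are hypothetical, and the sketch assumes ASCII input so each character needs exactly one %HH escape.

```python
# Hypothetical sketch of the HTML 2.0/4.01 encoding order as written:
# step 1 replaces spaces with '+', step 2 applies %HH escaping.
# Assumes ASCII input (one %HH escape per character).

# RFC 1738, section 2.2: characters that may be reserved within a scheme.
RFC1738_RESERVED = set(";/?:@=&")


def is_reserved(ch: str) -> bool:
    """Reading 1: escape only RFC 1738 reserved characters ('+' is not one)."""
    return ch in RFC1738_RESERVED


def is_non_alphanumeric(ch: str) -> bool:
    """Reading 2: escape everything except ASCII letters and digits."""
    return not (ch.isascii() and ch.isalnum())


def spec_order_encode(value: str, should_escape) -> str:
    # Step 1: spaces become plus-signs.
    plussed = value.replace(" ", "+")
    # Step 2: %HH-escape the selected characters. By now a '+' that came
    # from a space cannot be told apart from a '+' in the original value.
    return "".join("%%%02X" % ord(ch) if should_escape(ch) else ch
                   for ch in plussed)


if __name__ == "__main__":
    for reading in (is_reserved, is_non_alphanumeric):
        a = spec_order_encode("foo+bar", reading)
        b = spec_order_encode("foo bar", reading)
        print(reading.__name__, a, b, "collision" if a == b else "distinct")
```

Under either reading the two inputs encode identically: both give `foo+bar` in the first case and `foo%2Bbar` in the second.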
As far as I know, browsers always violate the spec and do something
reversible instead: they do the %HH escaping *before* changing spaces
to plus-signs, and they include plus-sign in the set of characters to
be escaped. That way, the server can distinguish between "foo%2Bbar"
(which means "foo+bar") and "foo+bar" (which means "foo bar").
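The reversible order described above can be sketched the same way, together with the matching decoder. This is a hypothetical illustration, not any particular browser's code. Which punctuation real browsers leave unescaped varies, so the sketch simply escapes every non-alphanumeric character except space, and it assumes ASCII input.

```python
# Hypothetical sketch of the reversible order attributed to browsers:
# escape first (including '+'), then turn spaces into '+'.
# Assumes ASCII input; which punctuation real browsers leave unescaped
# varies, so this sketch escapes every non-alphanumeric character.


def browser_order_encode(value: str) -> str:
    # Step 1: %HH-escape everything except ASCII letters, digits and space.
    # A literal '+' therefore becomes %2B here.
    escaped = "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch == " " else "%%%02X" % ord(ch)
        for ch in value
    )
    # Step 2: only now do spaces become '+', so every '+' in the output
    # stands for a space.
    return escaped.replace(" ", "+")


def decode(encoded: str) -> str:
    # Undo step 2 first: any '+' here must have been a space.
    s = encoded.replace("+", " ")
    # Then undo step 1 by expanding each %HH escape.
    out = []
    i = 0
    while i < len(s):
        if s[i] == "%":
            out.append(chr(int(s[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for value in ("foo+bar", "foo bar"):
        encoded = browser_order_encode(value)
        print(repr(value), "->", encoded, "->", repr(decode(encoded)))
```

The decoder has to convert `+` to space before expanding %HH escapes; doing it the other way round would turn a decoded `%2B` back into a space. For comparison, Python's standard `urllib.parse.quote_plus` uses the same escape-then-plus order, so `quote_plus("foo+bar")` returns `"foo%2Bbar"`.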
Am I correctly understanding the spec, that the specified encoding is
non-reversible? Is my observation about browsers accurate, that in
practice they always use a reversible encoding? Should this discrepancy
be addressed in some W3C note?
The XForms draft resolves the reserved/non-alphanumeric question, but
retains the non-reversibility:
space characters are replaced by +, and then non-ASCII and reserved
characters (as defined by [RFC 2396] as amended by subsequent
documents in the IETF track) are escaped by replacing the character
with one or more octets of the UTF-8 representation of the
character, with each octet in turn replaced by %HH...
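Read literally, the XForms wording can be sketched in the same style. Here the reserved set is assumed to be the RFC 2396 section 2.2 list `; / ? : @ & = + $ ,`, and later amendments are ignored. The function name is hypothetical. Because `+` is itself in that set, the plus-signs produced from spaces are escaped again, so the collision remains.

```python
# Hypothetical sketch of the XForms draft wording as quoted: spaces -> '+',
# then non-ASCII and RFC 2396 reserved characters become UTF-8 %HH escapes.
# Uses the RFC 2396 section 2.2 reserved set only; later amendments are ignored.

RFC2396_RESERVED = set(";/?:@&=+$,")


def xforms_draft_encode(value: str) -> str:
    # Step 1: spaces become plus-signs.
    plussed = value.replace(" ", "+")
    out = []
    for ch in plussed:
        if ord(ch) > 127 or ch in RFC2396_RESERVED:
            # Step 2: escape each UTF-8 octet of the character as %HH.
            out.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(xforms_draft_encode("foo+bar"))  # foo%2Bbar
    print(xforms_draft_encode("foo bar"))  # foo%2Bbar
    print(xforms_draft_encode("café"))     # caf%C3%A9
```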
AMC
Received on Sunday, 21 September 2003 20:37:55 UTC