URLencoding.

I am trying to determine exactly which special characters must be escaped as
hex (%XX) and which may be left unescaped during urlencoding. The HTML 4.01
Specification is very unclear on this, and RFC1738 does not help at all. The
mailing list archive turns up only a partial thread, which only partly helps to
clarify the situation.

A quick Web search indicates that others are also unclear about urlencoding.
The prevailing practice seems to be to escape everything except alphanumerics,
with space becoming +. For example, see the Java URLEncoder class at:

http://www.javasoft.com/products/jdk/1.0.2/api/java.net.URLEncoder.html
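
As a rough illustration of that practice, here is a minimal Python sketch. It
is not the Java class itself: the function name overencode is made up for this
example, and it assumes the text is turned into bytes as UTF-8, which is my
assumption and not something either spec mandates.

```python
import string

# Characters left untouched under the "escape everything else" practice.
SAFE = set(string.ascii_letters + string.digits)

def overencode(text: str) -> str:
    """Hypothetical helper: keep ASCII alphanumerics, turn space into '+',
    and escape every other byte as %XX (UTF-8 bytes assumed)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)          # alphanumerics pass through unchanged
        elif ch == " ":
            out.append("+")         # space becomes a plus sign
        else:
            out.append("%{:02X}".format(byte))  # everything else is hex-escaped
    return "".join(out)

print(overencode("a b&c=d/e"))  # a+b%26c%3Dd%2Fe
```

Note that even characters that are probably safe to leave alone, such as "-"
or "_", get escaped here, which is exactly the overencoding in question.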

Fortunately, RFC1738 is permissive, so the practice of overencoding will not
harm anything.
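
To illustrate why overencoding should be harmless, here is a small Python
sketch using the standard library's urllib.parse.unquote_plus. The sample
strings are invented for this example; the point is only that a decoder maps
the escaped and unescaped forms of the same character to the same result.

```python
from urllib.parse import unquote_plus

# Same value, encoded two ways (both strings are made-up samples).
minimal = "Dave-1.0_x+y"            # '-', '.', '_' left unescaped
overencoded = "Dave%2D1%2E0%5Fx+y"  # the same characters escaped as %XX

decoded_minimal = unquote_plus(minimal)
decoded_over = unquote_plus(overencoded)

# Both forms decode to the identical string, with '+' read back as a space.
print(decoded_minimal == decoded_over)
```

The reverse is not true: leaving a reserved character such as "&" or "="
unescaped inside a value would change how the data is split, so the open
question is only how far the set of unescaped characters can safely grow.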

Can anyone give me a definitive answer as to which characters need not be
escaped?

Perhaps Section 17.13.4 of the HTML Spec should be clarified.

TIA
--Dave

Received on Thursday, 6 April 2000 23:23:32 UTC