CGI URLs in HREFs (was Re: An smtp URL scheme)

Walter Ian Kaye (walter@natural-innovations.com)
Sun, 13 Jul 1997 00:07:11 -0700


Message-Id: <v0310285eafee2a020172@[205.149.180.135]>
In-Reply-To: <3.0.2.32.19970711132056.0090ac30@pop.ma.ultranet.com>
Date: Sun, 13 Jul 1997 00:07:11 -0700
To: www-html@w3.org
From: Walter Ian Kaye <walter@natural-innovations.com>
Subject: CGI URLs in HREFs (was Re: An smtp URL scheme)

At 9:41a -0400 07/10/97, Greg Marr wrote:
 > At 10:04 PM 7/9/97 -0700, Walter Ian Kaye wrote:
 > >At 9:51a -0400 07/09/97, Greg Marr wrote:
 > > > However, the preferred way is
 > > > smtp://host/user?subject=subject+body=yes+X-header=header
 > >
 > >This makes no sense. CGI's expect ampersands as delimiters.
 >
 > Except for the ones that expect +'s.

At 1:20p -0400 07/11/97, Greg Marr amended:
 >
 > The separator [...] was ; not + [...]


OK, I found it... in RFC 1866... :-)


 > 8.2.1. The form-urlencoded Media Type
 >
 >    The default encoding for all forms is 'application/x-www-form-
 >    urlencoded'. A form data set is represented in this media type as
 >    follows:
 >
 >         1. The form field names and values are escaped: space
 >         characters are replaced by '+', and then reserved characters
 >         are escaped as per [URL]; that is, non-alphanumeric
 >         characters are replaced by '%HH', a percent sign and two
 >         hexadecimal digits representing the ASCII code of the
 >         character. Line breaks, as in multi-line text field values,
 >         are represented as CR LF pairs, i.e. '%0D%0A'.
 >
 >         2. The fields are listed in the order they appear in the
 >         document with the name separated from the value by '=' and
 >         the pairs separated from each other by '&'. Fields with null
 >         values may be omitted. In particular, unselected radio
 >         buttons and checkboxes should not appear in the encoded
 >         data, but hidden fields with VALUE attributes present
 >         should.
 >
 >             NOTE - The URI from a query form submission can be
 >             used in a normal anchor style hyperlink.
 >             Unfortunately, the use of the '&' character to
 >             separate form fields interacts with its use in SGML
 >             attribute values as an entity reference delimiter.
 >             For example, the URI 'http://host/?x=1&y=2' must be
 >             written '<a href="http://host/?x=1&#38;y=2"' or '<a
 >             href="http://host/?x=1&amp;y=2">'.
 >
 >             HTTP server implementors, and in particular, CGI
 >             implementors are encouraged to support the use of
 >             ';' in place of '&' to save users the trouble of
 >             escaping '&' characters this way.
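To make the quoted rules concrete, here is a minimal Python sketch of the encoding in 8.2.1 and of the href escaping the NOTE describes. The helper names `encode_component`, `encode_form` and `href_attr` are hypothetical. Encoding non-ASCII text as UTF-8 before %HH-escaping is an assumption, since RFC 1866 only speaks of ASCII. `html.escape` is the Python standard-library function that turns '&' into '&amp;'.

```python
import html

def encode_component(text: str) -> str:
    """Encode one field name or value as RFC 1866 section 8.2.1 describes."""
    # Represent every line break as a CR LF pair (it ends up as %0D%0A).
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")                  # space becomes '+'
        elif ch.isascii() and ch.isalnum():
            out.append(ch)                   # ASCII letters and digits pass through
        else:
            # Everything else becomes %HH. Assumption: non-ASCII text is
            # encoded as UTF-8 first; RFC 1866 itself only speaks of ASCII.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

def encode_form(fields, sep="&"):
    """Join (name, value) pairs as name=value, separated by sep ('&' or ';')."""
    return sep.join(f"{encode_component(n)}={encode_component(v)}" for n, v in fields)

def href_attr(url: str) -> str:
    """Escape a URL for use inside an HTML attribute value ('&' -> '&amp;')."""
    return html.escape(url, quote=True)

print(href_attr("http://host/?" + encode_form([("x", "1"), ("y", "2")])))
# http://host/?x=1&amp;y=2
print(href_attr("http://host/?" + encode_form([("x", "1"), ("y", "2")], sep=";")))
# http://host/?x=1;y=2
```

The second call shows the point of the NOTE: with ';' as the separator, the URL can be pasted into an HREF unchanged, because there is no '&' left for SGML to mistake for an entity reference.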


So, I guess I'll have to update my 'wiklib.pl' library to check for
semicolons instead of just assuming ampersands. Now to come up with
a good algorithm. How about this:

  1. Count the number of '='s in the query string.
  2. If there is more than one, determine whether '&' or ';' is used as the
     field separator:

     ??? Extract the substring from the first '=' to the second '=', then
         count the '&'s and ';'s in it and compare the counts. ???

  3. Split accordingly.
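The three steps above might look roughly like this. It is a Python sketch, not the actual Perl in wiklib.pl. The names `detect_separator` and `split_query` are hypothetical, and falling back to '&' when the counts tie is an assumption. `urllib.parse.unquote_plus` is the standard-library call that turns '+' back into a space and decodes %HH escapes.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus

def detect_separator(query: str) -> str:
    """Guess whether a query string uses '&' or ';' between fields."""
    # Step 1: count the '=' signs; with fewer than two there is at most one
    # field, so the separator does not matter (assume '&').
    if query.count("=") < 2:
        return "&"
    # Step 2: look only at the text between the first and second '='. In a
    # properly encoded query it holds the first value, one separator and the
    # second name; literal '&' and ';' inside names/values are %-escaped.
    first = query.index("=")
    second = query.index("=", first + 1)
    between = query[first + 1:second]
    # Compare the counts; on a tie (e.g. neither appears) fall back to '&'.
    return ";" if between.count(";") > between.count("&") else "&"

def split_query(query: str) -> List[Tuple[str, str]]:
    """Step 3: split on the detected separator and decode each name=value pair."""
    sep = detect_separator(query)
    pairs = []
    for field in query.split(sep):
        if not field:
            continue  # tolerate empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

print(split_query("subject=hi+there;body=yes"))  # [('subject', 'hi there'), ('body', 'yes')]
print(split_query("msg=a%3Bb&n=2"))              # [('msg', 'a;b'), ('n', '2')]
```

Because a properly encoded query never contains a literal '&' or ';' inside a name or value, a simpler alternative is to treat both characters as separators. The counting step only matters if you expect malformed input.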

That oughta do it, eh?

__________________________________________________________________________
  Walter Ian Kaye <boo_at_best*com>    Programmer - Excel, AppleScript,
          Mountain View, CA                         ProTERM, FoxPro, HTML
 http://www.natural-innovations.com/     Musician - Guitarist, Songwriter