Re: HTTP 1.0&1.1 URL safe characters conflict with HTML?

On Fri, 9 Feb 1996, Roy T. Fielding wrote:

> > Reviewing not quite current HTTP 1.0 and 1.1 drafts I noticed that the 
> > plus sign (+) character is included as a safe character.
> > 
> > This would seem to be in conflict with current practice and the HTML
> > RFC where the + is part of the url encoding scheme to represent blanks.
> 
> It is safe, and "+" does not represent blanks.  The "+" character is
> used to separate keywords in a URL generated by an ISINDEX query,
> but those separators are not equivalent to blanks.  Since the query part
> is generated, there is no need for "+" to be reserved in the URL syntax,
> which is why it is not reserved in RFC 1738.

On this you are wrong. See RFC 1866, section 8.2.1, where + is clearly
used to encode spaces:
        1. The form field names and values are escaped: space
        characters are replaced by `+', and then reserved characters
        are escaped as per [URL]; that is, non-alphanumeric
        characters are replaced by `%HH', a percent sign and two
        hexadecimal digits representing the ASCII code of the
        character. Line breaks, as in multi-line text field values,
        are represented as CR LF pairs, i.e. `%0D%0A'.           
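
As a rough illustration of the rule quoted above (a sketch only, not taken
from any browser's source), the escaping could be written in Python as
follows. The name encode_field is made up for this example, and the code
takes the quoted text literally: a space becomes +, and every other
non-alphanumeric character becomes %HH.

```python
def encode_field(text: str) -> str:
    """Escape one form field name or value as the quoted RFC 1866 text describes.

    Hypothetical helper for illustration. Assumes ASCII input, since the
    quoted rule speaks of the ASCII code of each character.
    """
    out = []
    for byte in text.encode("ascii"):
        ch = chr(byte)
        if ch == " ":
            # Space characters are replaced by '+'.
            out.append("+")
        elif ch.isalnum():
            # Letters and digits pass through unchanged.
            out.append(ch)
        else:
            # Everything else, including a literal '+', becomes %HH.
            out.append("%{:02X}".format(byte))
    return "".join(out)


if __name__ == "__main__":
    print(encode_field("+cats dogs"))
    print(encode_field("line one\r\nline two"))
```

Under these assumptions a literal + typed by the user comes out as %2B, so
it cannot be confused with the + that stands for a space.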

Furthermore, current practice clearly follows the description in
RFC 1866; Netscape 1.1 is one example. At least two search services
(AltaVista and Infoseek) use + as a must-include flag. If a literal +
typed by the user were not %xx encoded, it could not be told apart
from an encoded space, and it would be impossible to decode the
user's input.
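
To make the decoding problem concrete, here is a small sketch using
Python's standard urllib.parse.unquote_plus, which treats + as an encoded
space. The query strings are invented for the example; they are not taken
from AltaVista or Infoseek.

```python
from urllib.parse import unquote_plus

# Hypothetical user input: "+cats dogs", where the leading + is a
# must-include flag typed by the user.

# Sent with the literal + left unescaped, spaces encoded as +:
raw_query = "+cats+dogs"

# Sent with the literal + escaped as %2B, spaces encoded as +:
escaped_query = "%2Bcats+dogs"

# The decoder cannot know that the first + in raw_query was literal,
# so it turns it into a space and the flag is lost.
decoded_raw = unquote_plus(raw_query)

# With %2B the flag survives and only the real space is restored.
decoded_escaped = unquote_plus(escaped_query)

print(repr(decoded_raw))
print(repr(decoded_escaped))
```

Only the second form gives back what the user typed, which is the reason a
literal + has to be %xx encoded whenever + is also used for spaces.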

Unsafe or reserved, I don't care which, but + isn't safe. If the
choice is 'reserved', then I believe % should also be 'reserved',
since it plays exactly the same role as +: both are used to encode
other characters. I see no justification for + to be reserved and
% unsafe.

Dave Morris

Received on Sunday, 11 February 1996 00:34:23 UTC