
Re: regular expressions for CGI input/env

From: Aleksandar Susnjar <shule@planetintra.com>
Date: Sat, 30 Dec 2000 00:23:53 -0500
Message-ID: <001201c07220$b3adb6c0$3c1ee218@wido1.on.home.com>
To: "Gustavo Vieira Goncalves Coelho Rios" <gustavo@ifour.com.br>, <www-html@w3.org>

Key characters are:

'?' - starts the form data portion in the URL-Encoded (HTTP GET method) form
'&' - separates name-value pairs
'=' - separates names and values
'%' - begins a two-digit hex literal
'+' - used instead of a space (standard, although Microsoft often replaces
it with %20)
' ' (space) - must be replaced with either '+' or '%20'
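
As a rough illustration of the generating side, here is a minimal Python
sketch that applies the list above. The names RESERVED, encode_component and
encode_pairs are made up for this example, and the handling of control
characters (codes below 32) is an assumption that simply percent-encodes
them. Characters above 127 are passed through unchanged here; library
encoders such as Python's urllib.parse.quote_plus percent-encode them
instead, so treat this as a sketch and not as a reference implementation.

```python
# Characters that carry a special meaning in URL-encoded form data.
RESERVED = set("?&=%+")


def encode_component(text: str) -> str:
    """Encode a single name or value (hypothetical helper for illustration)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            # A space is written as '+' ('%20' would also be accepted).
            out.append("+")
        elif ch in RESERVED or code < 32:
            # Special and control characters use the two-digit hex form %xx.
            out.append("%{:02X}".format(code))
        else:
            # Everything else is copied unchanged (assumption of this sketch).
            out.append(ch)
    return "".join(out)


def encode_pairs(pairs) -> str:
    """Join (name, value) pairs with '=' and '&'."""
    return "&".join(
        encode_component(name) + "=" + encode_component(value)
        for name, value in pairs
    )


# encode_pairs([("q", "a b"), ("x", "1&2")])  ->  "q=a+b&x=1%262"
```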

If they need to appear as data, the characters mentioned above and the
non-printable characters (0 <= code < 32) must be written in their
alternative form (e.g. the hex form %xx). All other characters (including
character codes greater than 127) should be fine. However, if you are only
interested in parsing the received input and not generating it, you only
need to know the following:

'?' - starts the form data portion in the URL-Encoded (HTTP GET method) form
'&' - starts a new name-(value) pair
'=' - separates names and values; NOTE: sometimes only a name is specified
and there is no value (e.g. checkboxes); in this case no '=' is used
'%' - begins a two-digit hex literal: the next two characters are to be
treated as the hex code of the character. If the hex digit characters are
invalid, assume that the client got it wrong and that the '%' did not start
a hex code but is just what it already is, a '%'. Continue parsing with the
character immediately after the '%', not after the 'digits', and do not copy
the digits straight into the buffer/output.
'+' - means (should be replaced with a) space
' ' (space) - If you get it and you know that it IS a part of the data, then
treat it as a space, although a correctly encoded request should not
contain one.
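
The parsing rules above can be sketched in Python roughly as follows. The
function names decode_component and parse_pairs are hypothetical, and the
sketch assumes each %xx stands for a single character code (as described
above), not for one byte of a multi-byte encoding. A name that arrives
without '=' is reported with a value of None.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_component(raw: str) -> str:
    """Decode one name or value (hypothetical helper for illustration)."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "+":
            # '+' stands for a space.
            out.append(" ")
            i += 1
        elif ch == "%":
            digits = raw[i + 1:i + 3]
            if len(digits) == 2 and all(d in HEX_DIGITS for d in digits):
                # Valid two-digit hex literal: emit the character it names.
                out.append(chr(int(digits, 16)))
                i += 3
            else:
                # Invalid literal: keep the '%' itself and resume with the
                # very next character, so the would-be digits are parsed
                # normally instead of being copied blindly.
                out.append("%")
                i += 1
        else:
            # Includes a literal space, which is kept as a space.
            out.append(ch)
            i += 1
    return "".join(out)


def parse_pairs(query: str):
    """Split form data (the part after '?') into (name, value) pairs."""
    pairs = []
    for field in query.split("&"):
        if not field:
            continue  # skip empty fields such as those produced by '&&'
        name, sep, value = field.partition("=")
        # No '=' at all means the name was sent without a value.
        pairs.append(
            (decode_component(name), decode_component(value) if sep else None)
        )
    return pairs


# parse_pairs("name=John+Doe&subscribe&note=50%25")
#   -> [("name", "John Doe"), ("subscribe", None), ("note", "50%")]
```

For a full GET URL, something like url.partition("?")[2] yields the form
data portion to pass in; for a POST the request body is passed as is.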

That's all there is to it... Same for both GET and POST... (not for
multipart POST, though)

Aleksandar Susnjar

----- Original Message -----
From: "Gustavo Vieira Goncalves Coelho Rios" <gustavo@ifour.com.br>
To: <www-html@w3.org>
Sent: Saturday, December 30, 2000 12:07 AM
Subject: regular expressions for CGI input/env

> Dear gentlemen,
> I respectfully request your help in obtaining initial information
> about the acceptable input for a CGI script. I am writing code for CGI
> input manipulation, so I need information about what is valid data
> to pass and how the process works.
> 1. Could someone send me a regular expression for the character stream
> that is valid as input to a CGI program?
> 2. I know some characters, like '#' and '@', are encoded as their
> hexadecimal numbers. Where can I obtain a list of all the encoded
> characters? Better yet, which RFC(s) specify these standards?
> Thanks a lot for your time and cooperation.
> PS: I don't really know if this is the right place to request such help,
> so please forgive me for any inconvenience.
Received on Saturday, 30 December 2000 00:24:20 UTC
