Re: Bad header syntax -- is this par for the course?

Robert Olofsson wrote:
> Sort of, many servers are broken and output:
> Content-Type: text/html\n\n<resource data>
> 
> Instead of the correct one:
> Content-Type: text/html\r\n\r\n<resource data>
> 
> There are _many_ perl scripts (and other) that do this.
> So being strict for CRLF parsing is a bad idea.

That shouldn't be possible if the Perl script is run through a decent
CGI-supporting HTTP server -- as the HTTP server should canonicalise
the line endings.  However, obviously many scripts are not run through
such a server, not to mention the Perl scripts which are servers in and
of themselves.

Fortunately, the HTTP standards warn of this and suggest just looking
for \n and skipping a preceding \r.
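
A minimal sketch of that tolerant approach, in Python.  The function
name split_header_lines is made up for illustration, and the sketch
assumes the whole header block is already in memory; it ignores
continuation lines entirely.

```python
def split_header_lines(data):
    """Split raw bytes into (header_lines, body).

    A line ends at LF; one CR immediately before that LF is skipped.
    The first empty line ends the header block.
    """
    lines = []
    pos = 0
    while True:
        nl = data.find(b"\n", pos)
        if nl == -1:
            raise ValueError("no blank line terminating the headers")
        line = data[pos:nl]
        if line.endswith(b"\r"):
            line = line[:-1]  # skip a single preceding CR
        pos = nl + 1
        if line == b"":
            return lines, data[pos:]
        lines.append(line)
```

With this, "Content-Type: text/html\n\n<resource data>" and the
\r\n\r\n form both yield the same single header line and the same body.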

That's what I'd hope for: a document which contains all of the
well-known suggestions for contemporary interoperation.

For example, I have read that Mozilla, IE and Netscape browsers will
all accept \r\n\r\r\n as the "blank line" which ends headers -- and that
they must do so.  I'm not sure if this is true, I have simply read it.

It's not hard to imagine what kind of code generates \r\n\r\r\n.
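
If that claim is right, one way a parser could end up accepting
\r\n\r\r\n is by treating any line made up solely of \r characters as
the blank line.  This is only a sketch of that idea, not what any of
those browsers actually do; find_header_end is a hypothetical name.

```python
def find_header_end(data):
    """Return the offset of the first body byte, or -1 if no blank line yet.

    Assumption: a line counts as "blank" if nothing but CRs precede its LF,
    so "\\r\\n", "\\n" and "\\r\\r\\n" all terminate the headers.
    """
    pos = 0
    while True:
        nl = data.find(b"\n", pos)
        if nl == -1:
            return -1
        if data[pos:nl].strip(b"\r") == b"":
            return nl + 1
        pos = nl + 1
```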

I read the header parsing code of Mozilla, and it accepts header lines
with no colon in them -- effectively it ignores them (though it
actually stores them verbatim).  I presume that's because of buggy IIS
servers which, according to the code in Squid, occasionally send
200 Blah Blah lines in the middle of the headers.
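
Roughly, that behaviour looks like the sketch below.  It is not
Mozilla's code; parse_fields is a hypothetical helper, and stripping
spaces around the name and value is my own simplifying assumption.

```python
def parse_fields(lines):
    """Split header lines into (name, value) pairs.

    Lines without a colon are not treated as errors: they are kept
    verbatim in a separate list and otherwise ignored.
    """
    fields = []
    ignored = []
    for line in lines:
        name, sep, value = line.partition(b":")
        if not sep:
            ignored.append(line)  # e.g. a stray "200 Blah Blah" line
            continue
        fields.append((name.strip(), value.strip()))
    return fields, ignored
```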

See how much exciting historical information is hidden in these code bases!

I worry about little things.  Some clients/servers/proxies accept
lines with embedded \r's (not followed by \n); some treat those as
line endings, others do not, and yet others reject such lines.
Some skip spaces before the header name, others do not, yet others
correctly treat such lines as continuation lines.  Some treat \r as a
space for this, others do not.  Some skip spaces after the header name,
others do not.  Some that do skip spaces after the header name treat
\r as a non-space in that place but treat \r as a space in other parts
of the same line; some others treat \r as a space everywhere.  Some
treat a first header line which begins with a space as an error, others
as a header beginning with a space, even though they concatenate later
such lines as continuations.  Some treat a line containing only spaces
as the "blank line" separator, others do not.

Think of the delightful security possibilities, knowing that some
proxies see some headers and other proxies/clients/servers see other
headers in the same text!
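
To make that concrete, here is a toy illustration of two parsers
disagreeing about a bare \r.  Both parsers and the header names are
invented for the example; neither is modelled on a real product, and
blank-line handling is left out to keep it short.

```python
def lines_lf_only(raw):
    """Parser A: only LF ends a line; one CR before the LF is dropped."""
    out = []
    for line in raw.split(b"\n"):
        if line.endswith(b"\r"):
            line = line[:-1]
        if line:
            out.append(line)
    return out


def lines_bare_cr(raw):
    """Parser B: CRLF, LF and a bare CR all end a line."""
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return [line for line in normalized.split(b"\n") if line]


def header_names(lines):
    """Lower-cased names of the lines that contain a colon."""
    return [line.split(b":", 1)[0].strip().lower() for line in lines if b":" in line]


# Same bytes, with a bare CR hidden inside the first line.
raw = b"X-Note: hello\rContent-Length: 0\r\nHost: example.invalid\r\n"

print(header_names(lines_lf_only(raw)))  # [b'x-note', b'host']
print(header_names(lines_bare_cr(raw)))  # [b'x-note', b'content-length', b'host']
```

Parser A never sees a Content-Length header at all, while parser B
does, which is exactly the sort of disagreement I mean.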

And that's just basic isolating of headers.  We haven't even touched
on the _values_ of headers.

-- Jamie

Received on Wednesday, 23 June 2004 14:49:36 UTC