HTTP request validation guidelines for implementers from João Penteado on 2021-07-08 (ietf-http-wg@w3.org from July to September 2021)

From: João Penteado <joao@penteado.me>
Date: Thu, 08 Jul 2021 18:18:23 +0000
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <FG0pbGeOrYyTy9Qq7QgDZtZWHwKpohi9eXVW-dkD1lFwkFM3Sqx2T1Wjv8zDZmPusf1EZ0XtKE5XVaF>
Greetings everyone,

This is my first time posting on an IETF mailing list, so I apologize in advance
if this isn't the proper channel for this discussion.

Input validation is a critical security component of most information systems,
and this is especially true for public services available over an untrusted
network such as the Internet.

In the context of HTTP requests, there are several opportunities for early
request validation, and many systems employ a dedicated initial validation layer
either internally, as "request middleware", or externally in the form of layer 7
firewalls, AKA Web Application Firewalls (WAFs), or reverse proxies.

It is usually in both the server's and the client's best interest that an
invalid request be rejected as soon as possible and that meaningful error
information be sent back to the client. For this very purpose, RFC 7231 defines a
whole class of HTTP error status codes (4xx).

However, when implementing an HTTP server, there is no standard or
consolidated guideline that I could find that addresses these two points:

1. Which components of the request MUST and SHOULD be validated.
2. In which order SHOULD these components be validated.

There are various flowcharts[1][2] available online that propose different
approaches to which components of the request should be validated first and
which HTTP status should have a "higher priority" when a request has multiple
errors. For instance, if both the body and the URI exceed the maximum size
allowed by the server and the client also exceeds its agreed rate limit, should
the response code be 413 (Payload Too Large), 414 (URI Too Long) or 429
(Too Many Requests)?

As I mentioned earlier, since it is in the server's best interest to reject an
invalid request as soon as possible, the "higher priority" status code is
currently, in most cases, simply the one whose error is found earliest in the
implementation's validation chain.
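
To illustrate the point, here is a minimal sketch in Python of a validation
chain that stops at the first failing check. Everything in it is an assumption
made for the example: the Request type, the three check functions and the limits
are hypothetical and do not come from any particular server. It only shows that
the same invalid request yields a different status code depending on how the
chain happens to be ordered.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical limits, chosen only for illustration.
MAX_URI_LENGTH = 2048
MAX_BODY_LENGTH = 1_000_000
MAX_REQUESTS_PER_WINDOW = 100


@dataclass
class Request:
    uri: str
    body: bytes
    requests_in_window: int  # requests already counted for this client


# Each check returns an HTTP status code on failure, or None on success.
Check = Callable[[Request], Optional[int]]


def check_rate_limit(req: Request) -> Optional[int]:
    return 429 if req.requests_in_window >= MAX_REQUESTS_PER_WINDOW else None


def check_uri_length(req: Request) -> Optional[int]:
    return 414 if len(req.uri) > MAX_URI_LENGTH else None


def check_body_length(req: Request) -> Optional[int]:
    return 413 if len(req.body) > MAX_BODY_LENGTH else None


def validate(req: Request, chain: List[Check]) -> Optional[int]:
    """Run the checks in order and return the first failure, if any."""
    for check in chain:
        status = check(req)
        if status is not None:
            return status
    return None


# A request that violates all three limits at once.
bad = Request(uri="/" + "a" * 5000, body=b"x" * 2_000_000, requests_in_window=500)

chain_a = [check_rate_limit, check_uri_length, check_body_length]
chain_b = [check_body_length, check_uri_length, check_rate_limit]

print(validate(bad, chain_a))  # 429
print(validate(bad, chain_b))  # 413
```

Two implementations of the same service that differ only in chain order would
therefore answer this request with 429 and 413 respectively.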

Should a standard or best practice recommendation for the points above be
adopted, clients that receive a client error response code will be able to tell
which checks were successful and servers will have a clear recommendation to
validate certain aspects of the requests they receive, which is still
unfortunately overlooked by many in the wild. It would also reduce inconsistent
behavior among different implementations of the same service that may return
different error codes for the same given request.

As someone who is currently implementing such a validator, I would like to start
this discussion by proposing that the order of such an evaluation chain take into
consideration the following factors, in priority order:

1. The computational complexity of the validation check.
2. The specificity of the returned status code (i.e. checks that, when failed,
return a generic "400 Bad Request" should come later in the validation chain).
3. The order in which the (sub)component appears in the request (start-line,
headers and body). Null, size, length and boundary range checks should probably
come first.
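
As a rough sketch of how the three factors above could be turned into an
ordering, the Python snippet below sorts a set of checks by a tuple key:
relative cost first, then whether the failure status is the generic 400, then
the position of the component in the request. The CheckSpec type, the cost
ranks and the list of checks are all hypothetical and only meant to make the
proposal concrete; a real validator would need its own cost estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckSpec:
    name: str
    cost: int      # hypothetical relative cost rank, lower is cheaper
    status: int    # status code returned when the check fails
    position: int  # 0 = start-line, 1 = headers, 2 = body


def order_checks(checks: List[CheckSpec]) -> List[CheckSpec]:
    """Order checks by cost, then specificity, then position in the request."""
    return sorted(
        checks,
        # False sorts before True, so specific statuses precede generic 400.
        key=lambda c: (c.cost, c.status == 400, c.position),
    )


# Illustrative checks only; names and cost ranks are assumptions.
checks = [
    CheckSpec("body JSON syntax", cost=2, status=400, position=2),
    CheckSpec("body length", cost=1, status=413, position=2),
    CheckSpec("header syntax", cost=2, status=400, position=1),
    CheckSpec("URI length", cost=1, status=414, position=0),
    CheckSpec("media type supported", cost=2, status=415, position=1),
]

for spec in order_checks(checks):
    print(spec.status, spec.name)
# 414 URI length
# 413 body length
# 415 media type supported
# 400 header syntax
# 400 body JSON syntax
```

With these assumed ranks, the cheap length checks run first, and the checks
that can only report a generic 400 end up at the tail of the chain.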

The same factors should also apply when evaluating subcomponents of the request,
such as individual headers or body parts.

Finally, additional security concerns, like the classic 404 vs 403 unintentional
information disclosure dilemma, should also probably be addressed in such a
document.

I'd love to get the input of this working group on the relevance of the topics I
raised above.

Best regards,

João Penteado


External links

[1] https://www.loggly.com/blog/http-status-code-diagram/
[2] https://www.codetinkerer.com/2015/12/04/choosing-an-http-status-code.html
Received on Friday, 9 July 2021 01:00:07 UTC