Re: Appropriate use of HTTP status codes for application health checks from Amos Jeffries on 2017-02-27 (ietf-http-wg@w3.org from January to March 2017)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Mon, 27 Feb 2017 17:38:49 +1300
To: Willy Tarreau <w@1wt.eu>
Cc: ietf-http-wg@w3.org
Message-ID: <d2b11486-267e-230f-bf3d-821ee9036f56@treenet.co.nz>

On 23/02/2017 11:24 p.m., Willy Tarreau wrote:
> Hi Amos,
> 
> On Thu, Feb 23, 2017 at 10:53:07PM +1300, Amos Jeffries wrote:
>> IMHO a more efficient way for a polling system is to use 204 as "All
>> okay", and 200 as "some problem(s)". No bandwidth wasted with payload on
>> the common Up status, and ability to deliver details about the outage on
>> the Down status.
> 
> In fact it's common to see health check applications return 5xx for a
> very simple reason, the front equipment performing the check (often a
> load balancer) has to deal with these situations anyway, and most use
> cases just want to return "completely up" or "completely dead". But I
> agree that when you want to support the gray area in between, it's much
> better to support intermediary codes. FWIW haproxy also supports a
> special case of 404 to mean "closing soon, no more requests please" so
> that admins can simply touch/rm a file in a docroot. That's just to say
> that there are many valid use cases and that common sense adapted to what
> components *reliably* support is often the best here.
> 
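
As a rough illustration of the checker side Willy describes, here is a minimal Python sketch of how a front-end load balancer might map a single health-check response to a backend state. The names BackendState and classify_check are hypothetical. The 404 "drain" branch is only a model of the haproxy special case, not haproxy's actual code, and treating every code other than 2xx and 404 as "dead" is an assumption chosen for simplicity.

```python
from enum import Enum


class BackendState(Enum):
    UP = "up"        # keep sending traffic
    DRAIN = "drain"  # finish current work, send no new requests
    DOWN = "down"    # remove from rotation


def classify_check(status: int) -> BackendState:
    """Map one health-check HTTP status code to a backend state (hypothetical policy)."""
    if 200 <= status < 300:
        return BackendState.UP
    if status == 404:
        # Assumed opt-in "closing soon, no more requests please" signal,
        # modelled on the haproxy 404 special case.
        return BackendState.DRAIN
    # Assumption: every other code, including all 5xx, means "completely dead".
    return BackendState.DOWN
```

The gray area lives entirely in the 404 branch. A checker that only needs "completely up" or "completely dead" would simply drop it.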

For an individual health-check you are right. But that is not the
use-case Matt has.

The use-case in question is for the response coming from some aggregator
process, which uses health-checks as its input/data. One status code
summarizing the situation of N endpoints.  No 4xx or 5xx is going to be
adequate for that, simply because of what the 400 and 500 defaults mean
to the general HTTP ecosystem.
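
A minimal Python sketch of the aggregator scheme proposed earlier in the thread: 204 with no payload when every endpoint is up, and 200 with details when something is wrong. The function name aggregate, the JSON body layout, and the rule that any 2xx from an individual endpoint counts as healthy are all assumptions for illustration.

```python
import json
from typing import Dict, Tuple


def aggregate(results: Dict[str, int]) -> Tuple[int, bytes]:
    """Summarise per-endpoint health-check codes into one (status, body) reply.

    204 with an empty body means every endpoint is up; 200 with a JSON body
    lists the endpoints that are not. Treating any 2xx as healthy is an assumption.
    """
    unhealthy = {name: code for name, code in sorted(results.items())
                 if not 200 <= code < 300}
    if not unhealthy:
        return 204, b""  # common "all okay" case: no payload on the wire
    return 200, json.dumps({"unhealthy": unhealthy}).encode("utf-8")
```

The poller only pays for a payload when there is something to report, and neither reply is a 4xx/5xx that intermediaries would treat as an error.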

The individual endpoints being health-tested, sure a 4xx/5xx is usually
best. Squid uses 503 to respond to *all* queries received during the
final seconds of shutdown so retries can be done for them. Your 404 as a
final status is of most use for active health-check probes to an
individual endpoint.
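
A sketch of the endpoint side under stated assumptions. It models a hypothetical server that answers 503 to every request once it has entered its final shutdown window (the Squid behaviour described above). It also answers 404 on a hypothetical /health path while an admin-controlled maintenance file exists (the touch/rm approach from Willy's message). The path, the file location and the function name endpoint_status are illustrative only.

```python
import os


def endpoint_status(path: str, shutting_down: bool, maint_file: str) -> int:
    """Pick the status code a hypothetical endpoint returns for one request."""
    if shutting_down:
        # Final seconds of shutdown: refuse every request with 503 so the
        # client or load balancer can retry it elsewhere.
        return 503
    if path == "/health" and os.path.exists(maint_file):
        # Admin has touched the maintenance file: tell the active probe
        # "closing soon, no more requests please". rm the file to undo.
        return 404
    return 200  # stands in for normal request handling
```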

Amos

Received on Monday, 27 February 2017 04:39:33 UTC