Re: Appropriate use of HTTP status codes for application health checks

On 23/02/2017 11:24 p.m., Willy Tarreau wrote:
> Hi Amos,
> 
> On Thu, Feb 23, 2017 at 10:53:07PM +1300, Amos Jeffries wrote:
>> IMHO a more efficient way for a polling system is to use 204 as "All
>> okay", and 200 as "some problem(s)". No bandwidth wasted with payload on
>> the common Up status, and ability to deliver details about the outage on
>> the Down status.
> 
> In fact it's common to see health check applications return 5xx for a
> very simple reason, the front equipment performing the check (often a
> load balancer) has to deal with these situations anyway, and most use
> cases just want to return "completely up" or "completely dead". But I
> agree that when you want to support the gray area in between, it's much
> better to support intermediary codes. FWIW haproxy also supports a
> special case of 404 to mean "closing soon, no more requests please" so
> that admins can simply touch/rm a file in a docroot. That's just to say
> that there are many valid use cases and that common sense adapted to what
> components *reliably* support is often the best here.
> 

For an individual health-check you are right. But that is not the
use-case Matt has.

The use-case in question is for the response coming from some aggregator
process, which uses health-checks as its input/data. One status code
summarizing the situation of N endpoints. No 4xx or 5xx is going to be
adequate for that, simply because of what the 400 and 500 defaults mean
to the general HTTP ecosystem.
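
To make that concrete, here is a minimal sketch in Python of such an
aggregator reply, using the 204/200 split suggested earlier in this
thread. The function name, the input format, and the JSON body layout are
made up for illustration and not taken from any real tool.

```python
import json


def summarize(results):
    """Collapse per-endpoint health-check results into one HTTP reply.

    `results` maps an endpoint name to the status code its own
    health-check returned (hypothetical input format). Returns a
    (status, body) pair for the aggregator's response.
    """
    # Assumption: an individual endpoint counts as up when its own
    # check came back with a 2xx.
    down = {name: code for name, code in results.items()
            if not 200 <= code < 300}
    if not down:
        # Everything is up: 204 No Content, no payload to transfer.
        return 204, b""
    # Some endpoints are down, but the aggregator itself answered
    # successfully, so it replies 200 and carries the outage details.
    body = json.dumps({"down": down}, sort_keys=True).encode("utf-8")
    return 200, body
```

The poller then only has to read a body when the status is 200; the
common all-up case costs headers only.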

For the individual endpoints being health-tested, sure, a 4xx/5xx is
usually best. Squid uses 503 to respond to *all* queries received during
the final seconds of shutdown so retries can be done for them. Your 404 as a
final status is of most use for active health-check probes to an
individual endpoint.
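
For the per-endpoint side, a rough Python sketch of the "503 while
shutting down" behaviour. This is an illustration of the idea only, not
Squid's actual code; the class and method names are hypothetical.

```python
import threading


class Endpoint:
    """Toy endpoint that answers 503 to everything once shutdown begins."""

    def __init__(self):
        # Set once shutdown has started; safe to read from any thread.
        self._shutting_down = threading.Event()

    def begin_shutdown(self):
        self._shutting_down.set()

    def handle(self, path):
        """Return a (status, body) pair for a request to `path`."""
        if self._shutting_down.is_set():
            # Every query gets 503 during shutdown, regardless of path,
            # so the client or load balancer can retry it elsewhere.
            return 503, b"shutting down\n"
        return 200, b"ok\n"
```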

Amos

Received on Monday, 27 February 2017 04:39:33 UTC