Re: New Version Notification for draft-divilly-status-555-00.txt

Thanks for your feedback. See my comments inline below:

> On 25 Mar 2020, at 12:42, Amos Jeffries <squid3@treenet.co.nz> wrote:
> 
> On 25/03/20 12:02 pm, Colm Divilly wrote:
>> Hi All,
>>  for a product I help develop: Oracle REST Data Services (ORDS) [1], we
>> provide a hosted environment where third party users can dynamically at
>> runtime define their own REST APIs using SQL. Naturally there will be
>> coding and runtime errors in these REST APIs.
> 
> 
>> This creates a challenge
>> for users and operators. When such a resource raises an error the only
>> appropriate HTTP status code to use is 500 Internal Server Error. This
>> causes confusion as it looks like there is something wrong with ORDS,
>> where in fact there is only something wrong with the user supplied SQL.
> 
> 
> I see no guidance details about when this status is expected to be used,
> and the description given is lacking some important details:
> 
> 
> * Is this status used to reply to requests containing bad SQL?
No, this is not the model. At some earlier point a user defines a REST endpoint backed by a program that they supply (in our case, SQL statements).
At a later point the same or a different user accesses that REST endpoint, and the endpoint fails because of an error in the program.
> 
> If yes, then use of any 5xx is incorrect. The broken thing being part
> of the client message means it should be a 4xx status of some type.
> 
> 
> If no;
> 
> * Is the origin pre-configured with some SQL it failed to validate /
> wrongly accepted, which is now breaking responses?
> 
> If yes, then 500 is indeed correct status. The "Internal Error" just
> happens to have been created by the SQL acceptance.
> 

Yes, understood, that is the current state of the art. The problem is that 500 Internal Server Error is too vague. It confuses users, making them think there is a problem with the server engine rather than with the specific user-defined resource; they cannot tell that the problem lies with that particular REST endpoint.

It also confuses automated monitoring tools, which cannot distinguish between serious 500 Internal Server Error responses that require remediation and those of this kind, where a user-created REST endpoint raising errors is typically a benign state. It is expected that users will create endpoints that have error conditions. Those error conditions do not affect the overall availability of the server, only that particular user-defined endpoint. They need to be clearly marked as such, and the resource author needs tools to review the logs associated with the error condition so that they can take action to address it.
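
As a rough illustration of the monitoring point, here is a minimal Python sketch of how a tool might triage responses if a dedicated status existed. The function name and severity labels are hypothetical, and 555 is only the placeholder number from the draft, not an assigned code.

```python
# Hypothetical sketch: triage of HTTP status codes by a monitoring tool.
# 555 is the placeholder number used in the draft; it is NOT an assigned code.
USER_DEFINED_RESOURCE_ERROR = 555


def triage(status: int) -> str:
    """Return who (if anyone) should act on a response with this status."""
    if status == USER_DEFINED_RESOURCE_ERROR:
        # The failure is confined to a single user-defined endpoint.
        return "resource-author"
    if 500 <= status <= 599:
        # Any other 5xx suggests the server itself may need remediation.
        return "operator"
    # Non-5xx responses need no server-side follow-up in this sketch.
    return "none"
```

With only 500 available, both kinds of failure fall into the "operator" branch, which is the ambiguity described above.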

> Otherwise implies no SQL is known to the origin. So 404 would be
> appropriate. No such resource exists.
> 
> Or, ...
> 
> * Is the origin pre-configured with some SQL it has already accepted,
> but which does not produce a resource instance?
> 
> If yes, then the status should be 404 or 406. The origin is not broken,
> it simply does not have any representation that meets the request.
> 
> 
> 
> At the very least it would be good to describe what alternatives (other
> status, and other mechanisms) have been considered and why they are
> unworkable.
> 
We have done the following:

1. Added a proprietary header named Error-Reason. You can think of it as a sub-status code; it indicates that the root cause of the internal server error lies in the user-defined resource.
2. Added explanatory text to the response body for the 500 Internal Server Error status, stating that the root cause is an error in the user-defined resource.

The problem with 1 is that the header is not recorded in things like the access logs of intermediaries sitting in front of our product. The operator of an intermediary sees a 500 error in their access log, concludes that our software is broken, files a support ticket, etc.
The problem with 2 is that end users generally don't read the text carefully. They focus on the status code and the words 'Internal Server Error', search for that online, conclude that our software is broken, file a support ticket, etc.
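
To make the problem with 1 concrete, here is a small Python sketch. It assumes an intermediary whose access log records only the request line and the status code; the Response class, the log format, the path, and the Error-Reason value are all made up for illustration and are not what ORDS actually emits.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def access_log_entry(method: str, path: str, resp: Response) -> str:
    # Assumed log format: request line plus status, no response headers.
    return f'"{method} {path} HTTP/1.1" {resp.status}'


# Placeholder header value; the real value is not specified here.
user_fault = Response(500, {"Error-Reason": "user-defined-resource"})
engine_fault = Response(500)

line_user = access_log_entry("GET", "/example/endpoint", user_fault)
line_engine = access_log_entry("GET", "/example/endpoint", engine_fault)

# The header never reaches the log, so the two failures look identical.
indistinguishable = line_user == line_engine
```

Under these assumptions the two log lines are identical, whereas a distinct status code would show up in the log without any change to the intermediary.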

> 
> PS. IIRC the number itself is supposed to be assigned by IANA *if* this
> document reaches acceptance. Not chosen by draft authors.
> 
Understood. We are not wedded to any particular status code number; we just chose one that was memorable and unallocated. It can change, no problem.
> Amos
> 

Received on Wednesday, 25 March 2020 13:09:06 UTC