rewritten section on message body and length

As part of the changes for draft 11, I merged the misnamed section on
message length into the message body section and then rewrote the
steps for determining the message body length to remove the
ambiguities noted previously in tickets #28, #90, and #95.

The primary additions are requirements on how to handle messages
with multiple or invalid Content-Length values, or with both
Transfer-Encoding and Content-Length.  Also, multipart/byteranges
has been removed as a means of determining the length.

It is probably easier to read in plain text than as a diff, so here
it is for your review:

==========

3.3.  Message Body

   The message-body (if any) of an HTTP message is used to carry the
   payload body associated with the request or response.

     message-body = *OCTET

   The message-body differs from the payload body only when a transfer-
   coding has been applied, as indicated by the Transfer-Encoding header
   field (Section 9.7).  When one or more transfer-codings are applied
   to a payload in order to form the message-body, the Transfer-Encoding
   header field MUST contain the list of transfer-codings applied.
   Transfer-Encoding is a property of the message, not of the payload,
   and thus MAY be added or removed by any implementation along the
   request/response chain under the constraints found in Section 6.2.

   The rules for when a message-body is allowed in a message differ for
   requests and responses.

   The presence of a message-body in a request is signaled by the
   inclusion of a Content-Length or Transfer-Encoding header field in
   the request's header fields, even if the request method does not
   define any use for a message-body.  This allows the request message
   framing algorithm to be independent of method semantics.

   For response messages, whether or not a message-body is included with
   a message is dependent on both the request method and the response
   status code (Section 5.1.1).  Responses to the HEAD request method
   never include a message-body because the associated response header
   fields (e.g., Transfer-Encoding and Content-Length) only indicate
   what their values would have been if the method had been GET.  All
   1xx (Informational), 204 (No Content), and 304 (Not Modified)
   responses MUST NOT include a message-body.  All other responses do
   include a message-body, although the body MAY be of zero length.
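
   A minimal sketch of the response-side rule, assuming the recipient
   has kept the method of the request it is matching the response to.
   The function and parameter names are hypothetical.

```python
def response_has_body(request_method, status_code):
    """Return True if a response includes a message-body (possibly empty)."""
    if request_method == "HEAD":
        return False  # header fields describe what GET would have sent
    if 100 <= status_code <= 199 or status_code in (204, 304):
        return False  # 1xx, 204 and 304 never carry a body
    return True       # every other response has a body, maybe zero-length
```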

   The length of the message-body is determined by one of the following
   (in order of precedence):

   1.  Any response to a HEAD request and any response with a status
       code of 100-199, 204, or 304 is always terminated by the first
       empty line after the header fields, regardless of the header
       fields present in the message, and thus cannot contain a message-
       body.

   2.  If a Transfer-Encoding header field (Section 9.7) is present and
       the "chunked" transfer-coding (Section 6.2) is the final
       encoding, the message-body length is determined by reading and
       decoding the chunked data until the transfer-coding indicates the
       data is complete.

       If a Transfer-Encoding header field is present in a response and
       the "chunked" transfer-coding is not the final encoding, the
       message-body length is determined by reading the connection until
       it is closed by the server.  If a Transfer-Encoding header field
       is present in a request and the "chunked" transfer-coding is not
       the final encoding, the message-body length cannot be determined
       reliably; the server MUST respond with the 400 (Bad Request)
       status code and then close the connection.

       If a message is received with both a Transfer-Encoding header
       field and a Content-Length header field, the Transfer-Encoding
       overrides the Content-Length.  Such a message might indicate an
       attempt to perform request or response smuggling (bypass of
       security-related checks on message routing or content) and thus
       should be handled as an error.  The provided Content-Length MUST
       either be removed prior to forwarding the message downstream or
       be replaced with the real message-body length after the transfer-
       coding is decoded.

   3.  If a message is received without Transfer-Encoding and with
       either multiple Content-Length header fields or a single Content-
       Length header field with an invalid value, then the message
       framing is invalid and MUST be treated as an error to prevent
       request or response smuggling.  If this is a request message, the
       server MUST respond with a 400 (Bad Request) status code and then
       close the connection.  If this is a response message received by
       a proxy or gateway, the proxy or gateway MUST discard the
       received response, send a 502 (Bad Gateway) status code as its
       downstream response, and then close the connection.  If this is a
       response message received by a user-agent, the message-body
       length is determined by reading the connection until it is
       closed; an error SHOULD be indicated to the user.

   4.  If a valid Content-Length header field (Section 9.2) is present
       without Transfer-Encoding, its decimal value defines the message-
       body length in octets.  If the actual number of octets sent in
       the message is less than the indicated Content-Length, the
       recipient MUST consider the message to be incomplete and treat
       the connection as no longer usable.  If the actual number of
       octets sent in the message is more than the indicated Content-
       Length, the recipient MUST only process the message-body up to
       the field value's number of octets; the remainder of the message
       MUST either be discarded or treated as the next message in a
       pipeline.  For the sake of robustness, a user-agent MAY attempt
       to detect and correct such an error in message framing if it is
       parsing the response to the last request on a connection and
       the connection has been closed by the server.

   5.  If this is a request message and none of the above are true, then
       the message-body length is zero (no message-body is present).

   6.  Otherwise, this is a response message without a declared message-
       body length, so the message-body length is determined by the
       number of octets received prior to the server closing the
       connection.
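
   The six steps above can be sketched as a single function that walks
   them in order of precedence.  This is an illustration under stated
   assumptions, not a reference implementation: the function name, the
   list-of-pairs header representation, and the returned tags ("none",
   "chunked", "close", "length", "invalid") are all invented for the
   example.  The "invalid" tag leaves the role-specific reaction (400,
   502, or read-until-close with a user-visible error) to the caller.

```python
import re

def body_framing(is_request, request_method, status_code, headers):
    """Classify how the message-body is delimited.

    `headers` is a hypothetical list of (name, value) pairs.  For a
    response, `request_method` is the method of the matching request.
    Returns a (kind, length) tuple; length is None unless known.
    """
    def values(field):
        return [v for (n, v) in headers if n.lower() == field]

    # Step 1: responses that can never contain a body.
    if not is_request and (
        request_method == "HEAD"
        or 100 <= status_code <= 199
        or status_code in (204, 304)
    ):
        return ("none", 0)

    # Step 2: Transfer-Encoding overrides any Content-Length.
    te = values("transfer-encoding")
    if te:
        codings = [c.split(";", 1)[0].strip().lower()
                   for c in ",".join(te).split(",")]
        codings = [c for c in codings if c]
        if codings and codings[-1] == "chunked":
            return ("chunked", None)
        # Not chunked last: a request is an error, a response runs to close.
        return ("invalid", None) if is_request else ("close", None)

    # Step 3: multiple or malformed Content-Length fields.
    cl = values("content-length")
    if len(cl) > 1 or (cl and not re.fullmatch(r"[0-9]+", cl[0].strip())):
        return ("invalid", None)

    # Step 4: one valid Content-Length.
    if cl:
        return ("length", int(cl[0].strip()))

    # Steps 5 and 6: nothing declared.
    return ("none", 0) if is_request else ("close", None)
```

   A caller would still need to remove or replace a Content-Length
   that arrived alongside Transfer-Encoding before forwarding the
   message, as required in step 2; the sketch only classifies.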

   Since there is no way to distinguish a successfully completed, close-
   delimited message from a partially-received message interrupted by
   network failure, implementations SHOULD use encoding-delimited or
   length-delimited messages whenever possible.  The close-delimiting
   feature exists primarily for backwards compatibility with HTTP/1.0.
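
   To illustrate why an encoding-delimited body is preferable, here is
   a minimal sketch of a chunked decoder that can tell a truncated
   body from a complete one.  It assumes the usual chunked layout
   (hexadecimal size line, data, CRLF, a final zero-size chunk, then
   optional trailers), ignores chunk extensions, and discards trailer
   fields.  The function name and the use of a buffered binary stream
   are assumptions made for the example.

```python
import io
import re

def read_chunked(stream):
    """Decode a chunked message-body from a buffered binary stream.

    Raises ValueError if the stream ends before the body is complete.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("incomplete chunk-size line")
        size_text = size_line.split(b";", 1)[0].strip()  # drop extensions
        if not re.fullmatch(rb"[0-9A-Fa-f]+", size_text):
            raise ValueError("invalid chunk size")
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            break  # last-chunk reached
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("incomplete chunk")
        body += data
    # Skip trailer fields up to the terminating empty line.
    while True:
        line = stream.readline()
        if not line.endswith(b"\r\n"):
            raise ValueError("incomplete trailer")
        if line == b"\r\n":
            return bytes(body)

complete = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
decoded = read_chunked(complete)
```

   A close-delimited reader given the same truncated input would have
   no comparable signal; it would simply return whatever arrived.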

   A server MAY reject a request that contains a message-body but not a
   Content-Length by responding with 411 (Length Required).

   Unless a transfer-coding other than "chunked" has been applied, a
   client that sends a request containing a message-body SHOULD use a
   valid Content-Length header field if the message-body length is known
   in advance, rather than the "chunked" encoding, since some existing
   services respond to "chunked" with a 411 (Length Required) status
   code even though they understand the chunked encoding.  This is
   typically because such services are implemented via a gateway that
   requires a content-length in advance of being called and the server
   is unable or unwilling to buffer the entire request before
   processing.

   A client that sends a request containing a message-body MUST include
   a valid Content-Length header field if it does not know the server
   will handle HTTP/1.1 (or later) requests; such knowledge can be in
   the form of specific user configuration or by remembering the version
   of a prior received response.
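
   The two client requirements above can be sketched as a small
   decision function.  It is a simplification: it ignores the case
   where a transfer-coding other than "chunked" has been applied, and
   the function and parameter names are hypothetical.

```python
def choose_request_framing(known_length, server_known_http11):
    """Pick the framing header field for a request that has a body.

    `known_length` is the body length in octets, or None if it is not
    known in advance.  `server_known_http11` reflects configuration or
    the version seen in a prior response.
    """
    if known_length is not None:
        # Preferred whenever the length is known: some services answer
        # chunked requests with 411 (Length Required).
        return {"Content-Length": str(known_length)}
    if server_known_http11:
        return {"Transfer-Encoding": "chunked"}
    # Content-Length is mandatory here, so the caller has to buffer the
    # body to learn its length before sending.
    raise ValueError("length unknown and server not known to be HTTP/1.1")
```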

   Request messages that are prematurely terminated, possibly due to a
   cancelled connection or a server-imposed time-out exception, MUST
   result in closure of the connection; sending an HTTP/1.1 error
   response prior to closing the connection is OPTIONAL.  Response
   messages that are prematurely terminated, usually by closure of the
   connection prior to receiving the expected number of octets or by
   failure to decode a transfer-encoded message-body, MUST be recorded
   as incomplete.  A user agent MUST NOT render an incomplete response
   message-body as if it were complete (i.e., some indication must be
   given to the user that an error occurred).  Cache requirements for
   incomplete responses are defined in Section 2.1.1 of [Part6].
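
   For a length-delimited body, recording a response as incomplete
   amounts to comparing the octets received with the declared length.
   A minimal sketch, assuming a blocking binary stream; the function
   name and the (data, complete) return shape are invented for the
   example.

```python
import io

def read_length_delimited(stream, length):
    """Read exactly `length` octets, or as many as arrive before EOF.

    Returns (data, complete); `complete` is False when the connection
    ended before the declared number of octets was received.
    """
    data = bytearray()
    while len(data) < length:
        piece = stream.read(length - len(data))
        if not piece:
            break  # connection closed early
        data += piece
    return bytes(data), len(data) == length

stream = io.BytesIO(b"helloEXTRA")
result = read_length_delimited(stream, 5)   # octets past 5 stay unread
leftover = stream.read()
```

   Octets beyond the declared length are left on the stream, where
   they are either discarded or parsed as the next pipelined message.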

   A server MUST read the entire request message-body or close the
   connection after sending its response, since otherwise the remaining
   data on a persistent connection would be misinterpreted as the next
   request.  Likewise, a client MUST read the entire response message-
   body if it intends to reuse the same connection for a subsequent
   request.  Pipelining multiple requests on a connection is described
   in Section 7.1.2.2.
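
   One way a server might honor this requirement after responding
   early is to drain a small unread remainder and otherwise close the
   connection.  This is a sketch only: the function name, the
   `max_drain` threshold, and its default value are assumptions of the
   example and are not suggested by the draft.

```python
import io

def drain_or_close(stream, remaining, max_drain=65536):
    """Return True if the connection can be reused, False if it must close.

    `remaining` is the number of unread request-body octets.
    """
    if remaining > max_drain:
        return False  # cheaper to close than to read it all
    while remaining > 0:
        piece = stream.read(min(remaining, 8192))
        if not piece:
            return False  # request ended prematurely
        remaining -= len(piece)
    return True

conn = io.BytesIO(b"abcdeGET / HTTP/1.1\r\n")
reusable = drain_or_close(conn, 5)
next_bytes = conn.read()
```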

==========

....Roy

Received on Wednesday, 28 July 2010 03:53:13 UTC