Re: what constitutes an "invalid" content-length from Alex Rousskov on 2016-07-12 (ietf-http-wg@w3.org from July to September 2016)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Tue, 12 Jul 2016 10:20:29 -0600
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Cc: Adrien de Croy <adrien@qbik.com>
Message-ID: <578518CD.8070305@measurement-factory.com>

On 07/12/2016 07:31 AM, Adrien de Croy wrote:

> just dealing with a site that sends more payload data than is indicated
> in the Content-Length header.

From the standards point of view, that is _not_ what you are dealing
with. You are dealing with a site that sends two responses: the first
response is proper HTTP; the second response is garbage.

> RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.

From the standards point of view, it is impossible for the
Content-Length to specify fewer bytes than the message has. Cases
irrelevant to this discussion aside, the message end is defined by the
Content-Length header value. One cannot have more than what was promised
because one stops assembling the message [body] after the promised
number of bytes has been added. Any "leftovers" are another message or
garbage, depending on Connection:close, pipelining, and similar factors.
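
To illustrate the framing rule above, here is a minimal Python sketch. The
function name split_body and its arguments are hypothetical, and it
assumes the header section has already been parsed and that
Content-Length is the only framing mechanism in play (no
Transfer-Encoding). It is not taken from any particular implementation.

```python
from typing import Tuple


def split_body(buffered: bytes, content_length: int) -> Tuple[bytes, bytes]:
    """Split bytes that follow the header section into (body, leftovers).

    The body is at most content_length bytes long. Anything after that
    is not part of this message: it is either the start of another
    message or garbage, depending on the connection state.
    """
    if content_length < 0:
        raise ValueError("Content-Length must be non-negative")
    body = buffered[:content_length]
    leftovers = buffered[content_length:]
    return body, leftovers


# Example: the server promised 5 bytes but wrote 12.
body, leftovers = split_body(b"hello, world", 5)
```

In this example the message body is b"hello" and the remaining
b", world" never becomes part of it, no matter what the sender intended.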

> I guess one could argue that a wrong C-L value is "invalid", but it's
> not clear that invalid in this context simply means it doesn't parse, or
> is otherwise non-compliant with the ABNF.

It is valid from the protocol point of view. You know it is "wrong" only
because you can (or you think you can) distinguish garbage from the end
of the content.

> So, it's not clear what the browser and/or proxy response should be.

There is no single right answer to that. A compliant client (including
proxies) ought to treat leftovers as post-message garbage or another
message. A real-world client may identify specific cases where leftovers
are likely to be the end of the message content and ignore
Content-Length in those cases. The cases where such behavior would be a
good idea would vary from agent to agent, from one deployment to another.

> I would expect it's in everyone's best interest if sites that have
> broken framing are forced to be fixed.  This won't happen if browsers
> "just work" for the site.

The ever-popular "force sites to be fixed" approach rarely fixes enough
real-world sites to remove special treatment code. See Patrick's response
for a good illustration.

> Is there a special behaviour we should agree on for such cases?

We could agree to violate the standard in one or two special cases, but
any formal agreement would probably result in a few more broken sites
because more folks will tolerate them, decreasing the probability that
they will be fixed.

I can think of one special case where it is more-or-less safe to ignore
response Content-Length:

* the HTTP/1 connection is not persistent,
* there are no additional outstanding pipelined requests on that connection,
* the unique Content-Length header field is syntactically valid, and
* more bytes were read during the last network read than C-L promises.

The combination of these conditions can trigger [optional] "robustness"
code that reads until connection closure and re-sends leftovers/garbage
to the next hop (or displays it to the user), opening a message
smuggling attack vector.
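
The four conditions above could be checked roughly as follows. This is
only a sketch: the function name and all four parameters are
hypothetical, and it assumes the caller already tracks connection
persistence, pipelining state, the raw Content-Length field values, and
the number of body bytes available once the last network read is
counted.

```python
import re
from typing import List

# Content-Length = 1*DIGIT (RFC 7230, Section 3.3.2); ASCII digits only.
_CL_SYNTAX = re.compile(r"[0-9]+")


def may_ignore_content_length(
    persistent: bool,
    outstanding_pipelined: int,
    content_length_values: List[str],
    body_bytes_received: int,
) -> bool:
    """Return True only if all four conditions from the list hold."""
    if persistent:
        return False  # the connection must not be persistent
    if outstanding_pipelined > 0:
        return False  # leftovers could be another pipelined response
    if len(content_length_values) != 1:
        return False  # the Content-Length field must be unique
    value = content_length_values[0].strip(" \t")  # drop optional whitespace
    if not _CL_SYNTAX.fullmatch(value):
        return False  # the field must be syntactically valid
    # More body bytes have arrived than Content-Length promised.
    return body_bytes_received > int(value)
```

A True result only means the optional "robustness" path may be
considered; it does not make reading until connection closure safe.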

Needless to say, there are benign leftover cases that the above
conditions do not cover.

Cheers,

Alex.

Received on Tuesday, 12 July 2016 16:21:27 UTC