- From: Alex Rousskov <rousskov@measurement-factory.com>
- Date: Tue, 12 Jul 2016 10:20:29 -0600
- To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Cc: Adrien de Croy <adrien@qbik.com>
On 07/12/2016 07:31 AM, Adrien de Croy wrote:

> just dealing with a site that sends more payload data than is indicated
> in the Content-Length header.

From the standards point of view, that is _not_ what you are dealing
with. You are dealing with a site that sends two responses: the first
response is proper HTTP; the second response is garbage.

> RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.

From the standards point of view, it is impossible for the
Content-Length to specify fewer bytes than the message has. Setting
aside cases that are irrelevant to this discussion, the message end is
defined by the Content-Length header value. One cannot have more than
what was promised because one stops assembling the message [body] after
the promised number of bytes has been added. Any "leftovers" are another
message or garbage, depending on Connection: close, pipelining, and
similar factors.

> I guess one could argue that a wrong C-L value is "invalid", but it's
> not clear that invalid in this context simply means it doesn't parse, or
> is otherwise non-compliant with the ABNF.

It is valid from the protocol point of view. You know it is "wrong" only
because you can (or you think you can) distinguish garbage from the end
of the content.

> So, it's not clear what the browser and/or proxy response should be.

There is no single right answer to that. A compliant client (including
proxies) ought to treat leftovers as post-message garbage or another
message. A real-world client may identify specific cases where leftovers
are likely to be the end of the message content and ignore
Content-Length in those cases. The cases where such behavior would be a
good idea vary from agent to agent and from one deployment to another.

> I would expect it's in everyone's best interest if sites that have
> broken framing are forced to be fixed. This won't happen if browsers
> "just work" for the site.

The ever-popular "force sites to be fixed" approach rarely fixes enough
real-world sites to remove special treatment code. See Patrick's
response for a good illustration.

> Is there a special behaviour we should agree on for such cases?

We could agree to violate the standard in one or two special cases, but
any formal agreement would probably result in a few more broken sites
because more folks will tolerate them, decreasing the probability that
they will be fixed.

I can think of one special case where it is more-or-less safe to ignore
response Content-Length:

* the HTTP/1 connection is not persistent,
* no additional outstanding pipelined requests on that connection,
* the unique Content-Length header field is syntactically valid, and
* more bytes were read during the last network read than C-L promises.

The combination of these conditions can trigger [optional] "robustness"
code that reads until connection closure and re-sends leftovers/garbage
to the next hop (or displays it to the user), opening a message
smuggling attack vector.

Needless to say, there are benign leftover cases that the above
conditions do not cover.

Cheers,

Alex.
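
The framing rule and the four conditions in the message above can be
sketched in Python roughly as follows. This is a minimal illustration
under stated assumptions, not code from any particular client or proxy:
the names split_by_content_length, ResponseState, and
may_ignore_content_length are hypothetical, and the fourth condition is
modeled simply as "more body bytes have been read than Content-Length
promised".

```python
from dataclasses import dataclass
from typing import List, Tuple


def split_by_content_length(received: bytes, content_length: int) -> Tuple[bytes, bytes]:
    """Apply standard framing: the body ends after content_length bytes.

    Anything after that point is "leftovers" (another message or garbage),
    not part of this message.
    """
    if len(received) < content_length:
        raise ValueError("incomplete message: fewer bytes than Content-Length")
    return received[:content_length], received[content_length:]


@dataclass
class ResponseState:
    # All fields are hypothetical bookkeeping a client might keep.
    persistent: bool                  # is the HTTP/1 connection persistent?
    outstanding_pipelined: int        # other pipelined requests still pending
    content_length_values: List[str]  # raw values of every Content-Length field
    body_bytes_read: int              # body bytes read so far, incl. last read


def may_ignore_content_length(state: ResponseState) -> bool:
    """Return True only when all four conditions from the message hold."""
    if state.persistent:
        return False
    if state.outstanding_pipelined != 0:
        return False
    # "Unique" is taken to mean exactly one Content-Length header field.
    if len(state.content_length_values) != 1:
        return False
    value = state.content_length_values[0].strip()
    # Syntactically valid: one or more ASCII digits (1*DIGIT).
    if not (value.isascii() and value.isdigit()):
        return False
    # Assumption: "more bytes were read than C-L promises" is a byte count check.
    return state.body_bytes_read > int(value)


if __name__ == "__main__":
    body, leftovers = split_by_content_length(b"helloEXTRA", 5)
    print(body, leftovers)
    state = ResponseState(False, 0, ["5"], 10)
    print(may_ignore_content_length(state))
```

A client that keeps strict framing would use only the first function and
treat the leftovers as garbage or the next message; the predicate would
gate the optional "read until connection closure" path, with the
smuggling risk noted in the message.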
Received on Tuesday, 12 July 2016 16:21:27 UTC