- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 12 Jul 2016 14:53:48 -0700
- To: "Adrien W. de Croy" <adrien@qbik.com>
- Cc: ietf-http-wg@w3.org
- Message-ID: <CAHBU6isPH3vu7Cq6dTOV-f0kb2g_Oc+iXi4kX1m7JRceWxiRVw@mail.gmail.com>
I've written two large-scale web crawlers, processing billions of links, and since my principle was to err on the side of inclusiveness, I totally ignored Content-length and wrote the necessary code to deal with whatever I got, including the occasional GET returning an infinite stream of smiley-emoji or null bytes or whatever.

On Jul 12, 2016 6:36 AM, "Adrien de Croy" <adrien@qbik.com> wrote:

> Hi all
>
> just dealing with a site that sends more payload data than is indicated in
> the Content-Length header.
>
> If the browser connects directly, the page loads fine, if via the proxy,
> the proxy is truncating the length to that advertised and the client isn't
> displaying a page (of course this is the .css file).
>
> RFC7230 sections 3.3.2 (Content-Length), 3.3.3 (Message body length),
> and 3.3.4 (Handling incomplete messages) only contemplate issues around
> Content-Length specifying more bytes than are received, not fewer.
>
> I guess one could argue that a wrong C-L value is "invalid", but it's not
> clear that invalid in this context simply means it doesn't parse, or is
> otherwise non-compliant with the ABNF.
>
> So, it's not clear what the browser and/or proxy response should be. If
> we deem a wrong value to be "invalid" (s3.3.3 para 4), a client is supposed
> to discard the response. This isn't happening.
>
> For the proxy, it only sees that the content length is wrong once it
> receives too many bytes. By this stage, the horse has bolted so it cannot
> really comply either.
>
> I would expect it's in everyone's best interest if sites that have broken
> framing are forced to be fixed. This won't happen if browsers "just work"
> for the site.
>
> Is there a special behaviour we should agree on for such cases?
>
> Regards
>
> Adrien de Croy
>
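
For illustration only, here is a rough Python sketch of the "ignore Content-Length and deal with whatever arrives" approach described in the reply above. It is not the code from either crawler. The names `read_body`, `max_bytes`, and the verdict strings are hypothetical, and the sketch assumes the connection is closed after the response, so that reading to end-of-stream is a safe way to find the end of the body. The hard cap is what protects against the infinite-stream case.

```python
import io
from typing import BinaryIO, Optional, Tuple


def read_body(
    stream: BinaryIO,
    declared_length: Optional[int],
    max_bytes: int = 1_000_000,
    chunk_size: int = 65_536,
) -> Tuple[bytes, str]:
    """Read until end-of-stream or max_bytes, whichever comes first.

    declared_length (the Content-Length value, if any) is never used for
    framing; it is only compared with what actually arrived.
    """
    body = bytearray()
    # Ask for at most one byte beyond the cap, so we can tell "exactly
    # max_bytes long" apart from "still had more to send".
    while len(body) <= max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(body)))
        if not chunk:
            break  # end of stream
        body.extend(chunk)

    if len(body) > max_bytes:
        del body[max_bytes:]  # drop the probe byte
        verdict = "capped"
    elif declared_length is None:
        verdict = "no-length"
    elif len(body) == declared_length:
        verdict = "match"
    elif len(body) > declared_length:
        verdict = "longer-than-declared"
    else:
        verdict = "shorter-than-declared"
    return bytes(body), verdict


# The case from the original report: more payload than Content-Length says.
body, verdict = read_body(io.BytesIO(b"hello world"), declared_length=5)
print(len(body), verdict)
```

A proxy cannot use this as-is on a persistent connection, since bytes past the declared length may belong to the next response; the sketch only fits a crawler-style client that opens a connection, reads one response, and is content to log the verdict.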
Received on Tuesday, 12 July 2016 21:54:17 UTC