- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Thu, 4 Feb 2021 10:30:40 +0100
- To: Willy Tarreau <w@1wt.eu>
- Cc: Ryan Sleevi <ryan-ietf@sleevi.com>, Martin Thomson <mt@lowentropy.net>, Poul-Henning Kamp <phk@phk.freebsd.dk>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
So far I like the proposal from Willy the most, especially as it has defined semantics in the presence of unaware, merging intermediaries.

As to the "real life" use of trailers: in Apache httpd I only ever saw issues raised with gRPC.

FTR: adding optional bodies to 1xx responses would break all sorts of things. Not an option.

Cheers, Stefan

> On 03.02.2021 at 20:50, Willy Tarreau <w@1wt.eu> wrote:
>
> On Wed, Feb 03, 2021 at 02:00:23PM -0500, Ryan Sleevi wrote:
>>>> I do have to agree with PHK here: this sort of merging is a state machine
>>>> security nightmare, especially when thinking about the interaction with
>>>> resources, caching, and the overall semantics of HTTP.
>>>
>>> For me, the main enemy of trailers is the fact that they were considered
>>> as part of the same namespace as headers, which is what causes this
>>> merging nightmare. But just like some fields are forbidden in trailers, we
>>> could state that they ought to be ignored in headers (in case of merging
>>> by an intermediary), and that's why I really do not want to see both use
>>> the same name. Once you stick to this, there's no merging nightmare nor
>>> security issue anymore: if the trailer is present, you know what to cache.
>>> If it's absent (and being in the header is counted as absent), you don't
>>> cache, period. It's only suboptimal. But suboptimality is what encourages
>>> improvements in products. Breakage encourages fragmentation.
>>>
>>
>> I'm not sure this is entirely correct, in the example you describe below
>> (and more response there). That is, I think you've described a scenario
>> where they're still part of the same logical namespace, in practice, even
>> if they're not meant to be.
>
> They are by definition from the spec (see RFC 7230 Section 4.1.3, which even
> explains how to merge them). What I mean is that by declaring the field's
> name and its behavior in both parts we avoid any type of confusion for those
> who don't know it and those who receive it mangled.
>
>>> Then there are those which choke on trailers because they stop after
>>> 0 CRLF, they wouldn't cache either. What I like with this approach is
>>> that a degraded message cannot be restored later in the chain to become
>>> correct again. This wouldn't be the case if using the same field name in
>>> both parts.
>>>
>>
>> From an implementation state machine complexity standpoint, this ends up
>> still being messy. For example, in Chrome at least, we use a multi-process
>> architecture such that there is a network process, a browser process, and
>> a renderer process. The renderer process requests a resource from the
>> network process, and that result is then sent back via IPC through a
>> fixed-size circular buffer.
>
> OK.
>
>> Using your example, the header says "don't cache", the trailer says "do
>> cache". Currently, Chrome's implementation uses the header-defined value to
>> determine whether it needs to "tee" (in the *nix sense) the response to
>> both the disk cache and the renderer process, or whether to send it
>> straight through to the renderer process. In order to ensure backpressure
>> is properly handled, if the disk IO of the cache activity slows, it
>> naturally slows the transmission rate to the renderer process. In effect,
>> we only have a fixed amount of memory in use by a resource at a time.
>
> This makes sense.
>
>> Using your example, the need to cache after the fact in the trailer
>> wouldn't be possible, not without buffering the entire response either in
>> memory or through some temporary file, in order to know what to do with the
>> response.
>
> But why? I guess the vast majority of responses received by the browser
> are currently cacheable, so how would this one make any difference? The
> condition to decide to tee would just differ, you'd tee if cache-control
> says either "public", "private" or "trailers" for example, and never for
> other responses like today.
> And when doing "trailers", you'd finally
> decide to either commit the object at the end of the transfer or destroy
> it.
>
> Note that I'm not at all trying to infer how this works in detail, and
> I know we all have our technical limitations for historic reasons or
> anything, but I don't see how an object fetched with a conditional cache
> statement would require more storage or disk bandwidth than one fetched
> with an explicit or implicit cache statement. You could see it as a final
> message saying "finally I'd prefer if you dropped what you've just fetched
> from your cache" after a usual response.
>
>> The same challenge applies in the inverse (where the header says
>> cache and the trailer says don't cache),
>
> We must not do this, because that would mean that non-compliant
> intermediaries will be fooled.
>
>> These aren't just "correctness" issues, but we view them as security
>> relevant, because they can allow for things like denial of service (via
>> resource exhaustion),
>
> Here I don't see how, as you don't cache more than what you normally
> receive. You just know at the end of the transfer if it's worth keeping
> what you just stored in your cache.
>
>> From a server security standpoint, a number of services
>> draw security boundaries between "headers" (which should be
>> trusted/controlled by server admin) and "bodies" (which can be controlled
>> by hosted code/untrusted parties), and the introduction of "trailers" and
>> any semantics would have to try to preserve/respect those assumptions.
>
> This remains true. Everything is in the header, only the final verdict
> may happen in the trailer. Don't see it as an order for doing something,
> rather as an opportunity. The Cache-Control header would say "I consider
> it's worth caching this object, but if I encounter any issue, I'll tell
> you at the end so that you can freely drop it".
> The trailer then says
> "Object assembly went well enough, according to what I announced in
> cache-control, I consider it worth keeping it in your cache if you want".
>
> If you fiddle with cache-control in the middle, that's enough to void the
> trailer's effect, so headers clearly are the ones defining everything
> related to the semantics here.
>
>> That
>> is, trailers that redefine the semantics of headers, even if separately
>> named, would and could be abused, which equally makes trailer bits
>> unexciting.
>
> This is also why I don't want to use similar header names on the two sides.
>
>> I agree it's better for intermediaries to use the semantics you described.
>> That feels similar to the set of mitigations adopted by HSTS/HPKP - namely,
>> that they only accepted the 'first' occurrence of a header and relied on
>> the headers not being mergeable, in order to try to preserve the separation
>> I mentioned above. But it doesn't feel terribly exciting from an
>> implementation perspective, and feels like it could easily lead to new
>> classes of security bugs if generally available.
>
> I hardly see which ones if the header defines the rules and the trailer
> only provides an operational status to let the client act as it prefers.
>
> Regards,
> Willy
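
The conditional-caching rule Willy describes in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual implementation: the "trailers" Cache-Control directive is the one floated in this thread and is not standardized; the trailer field name "cache-commit" is hypothetical, since the thread does not name one; header and trailer names are assumed to be lowercased already; and other directives such as "no-store" are deliberately ignored to keep the sketch short.

```python
# Hypothetical trailer field name; the thread does not name one.
COMMIT_TRAILER = "cache-commit"


def parse_directives(cache_control):
    """Return the set of lowercase directive names in a Cache-Control value."""
    return {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }


def should_tee(headers):
    """Decide from the header section alone whether to stream the body
    into the cache while forwarding it (simplified: only three directives)."""
    directives = parse_directives(headers.get("cache-control", ""))
    return bool(directives & {"public", "private", "trailers"})


def should_commit(headers, trailers):
    """Decide at end of transfer whether to keep what was stored.

    For the proposed "trailers" directive, only a field found in the
    trailer section counts. The same name appearing in the header section
    (for example after an intermediary merged it) is treated as absent.
    """
    directives = parse_directives(headers.get("cache-control", ""))
    if "trailers" not in directives:
        # Ordinary response: the header section already decided.
        return should_tee(headers)
    return COMMIT_TRAILER in trailers
```

With this shape the amount of data written while streaming is the same as for any cacheable response; the only new step is the commit-or-destroy decision once the trailer section has been read.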
Received on Thursday, 4 February 2021 09:30:58 UTC