Re: Cache control in trailers?

On Wed, Feb 3, 2021 at 2:50 PM Willy Tarreau <w@1wt.eu> wrote:

> On Wed, Feb 03, 2021 at 02:00:23PM -0500, Ryan Sleevi wrote:
> <snip>
> > Using your example, the need to cache after the fact in the trailer
> > wouldn't be possible, not without buffering the entire response either in
> > memory or through some temporary file, in order to know what to do with
> the
> > response.
>
> But why? I guess the vast majority of responses received by the browser
> are currently cacheable, so how would this one make any difference? The
> condition to decide to tee would just differ, you'd tee if cache-control
> says either "public", "private" or "trailers" for example, and never for
> other responses like today. And when doing "trailers", you'd finally
> decide to either commit the object at the end of the transfer or destroy
> it.
>
> Note that I'm not at all trying to infer how this works in detail, and
> I know we all have our technical limitations for historic reasons or
> anything, but I don't see how an object fetched with a conditional cache
> statement would require more storage or disk bandwidth than one fetched
> with an explicit or implicit cache statement. You could see it as a final
> message saying "finally I'd prefer if you dropped what you've just fetched
> from your cache" after a usual response.
>

The "commit the object at the end or destroy it" is precisely the issue,
however. It's a new edge in the state machine, in which an object in the
cache could be not-yet-committed, which then makes a number of assumptions
potentially problematic. I agree that there are many ways to potentially
tackle this, but I highlight it as an example of why it's not very
exciting, and thus unlikely to be implemented (just as trailers are not
supported in any of our layers beyond basic H/2 message framing, at which
point, they get dropped).
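
To make that edge concrete, here's a very rough sketch (in Go, with names
invented purely for illustration; no real browser cache is structured like
this) of the extra lifecycle state a trailer-driven decision introduces:

    package main

    import "fmt"

    // EntryState is a hypothetical cache-entry lifecycle, named purely for
    // illustration.
    type EntryState int

    const (
        // Today, cacheability is known from the response headers, so an
        // entry that is still being written is already known to be worth
        // keeping.
        Writing   EntryState = iota // body still streaming in
        Committed                   // transfer finished, entry is usable
        Doomed                      // will be removed once readers detach

        // A trailer-based Cache-Control adds a state in which the body is
        // fully written but the keep-or-drop decision is still unknown.
        PendingTrailerDecision
    )

    func main() {
        // Headers only:  Writing -> Committed (or Doomed on error).
        // With trailers: Writing -> PendingTrailerDecision -> Committed or Doomed.
        fmt.Println(Writing, Committed, Doomed, PendingTrailerDecision)
    }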

For example, consider a situation where there are multiple requests for
the same resource. Request 1 comes in, opens the resource, and begins
reading the response while streaming it both into the cache file and to
the caller. Request 2 for that resource comes in, sees that it's
available in the cache (even though not fully complete), and begins
reading that response and streaming it to the other caller. In the
semantics described, the need for an explicit commit stage prohibits
this, inasmuch as it's not clear that Request 2 can reuse that resource,
because Request 1 might change its mind at the end of the stream.
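
Sketched the same way (again with invented names, nothing resembling a
real cache API), the question is whether a second reader may attach to an
entry whose keep/drop decision is still pending:

    package main

    import "fmt"

    // Entry is a hypothetical in-progress cache entry; the field and
    // function names are invented for this sketch.
    type Entry struct {
        keepDecisionKnown bool // today: true as soon as the headers are parsed
        readers           int
    }

    // attachSecondReader models Request 2 finding the entry while Request 1
    // is still streaming the body into it.
    func attachSecondReader(e *Entry) bool {
        if !e.keepDecisionKnown {
            // With header-driven caching this branch doesn't exist: a
            // partial entry is already known-cacheable, so Request 2 can
            // safely stream from it. With a trailer-driven decision,
            // Request 1 may still say "don't cache" at the end, so reusing
            // the partial entry here is no longer obviously safe.
            return false
        }
        e.readers++
        return true
    }

    func main() {
        e := &Entry{keepDecisionKnown: false}
        fmt.Println("second reader attached:", attachSecondReader(e))
    }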

This is akin to the problem others have raised, with respect to policy
controls applied early-on/up-front.

Equally, consider a situation where there is a pre-existing resource in
the cache, and during opportunistic revalidation, a newer version is
available. If that request decides at the end not to cache, we've now
doomed an otherwise usable resource, and need to commit to removing it
from the cache.


> > These aren't just "correctness" issues, but we view them as security
> > relevant, because they can allow for things like denial of service (via
> > resource exhaustion),
>
> Here I don't see how, as you don't cache more than what you normally
> receive. You just know at the end of the transfer if it's worth keeping
> what you just stored in your cache.
>

Sorry, the resource exhaustion remark was with respect to internally
buffering the resource to determine whether or not to tee. While the
analogy isn't perfect, the situation is that you can't tee > /dev/null
and then expect to copy that data back out to a file. And if you tee to
a file, and then have to move it to /dev/null at the end, you've equally
consumed resources in the intermediate steps (such as a set-aside
pre-commit file). Important policy-related checks (such as total storage
for an origin, and invalidating entries to ensure sufficient space based
on Content-Length) now run at the end, after the resources have been
committed, rather than before.
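
To make the ordering point concrete, here's a rough Go sketch of the two
shapes. It uses the standard net/http types, but reserveQuota and the two
functions are invented names, and the Cache-Control handling is
deliberately oversimplified, so treat it as an illustration of where the
decisions sit rather than any real implementation:

    package main

    import (
        "io"
        "net/http"
        "os"
    )

    // reserveQuota is a hypothetical stand-in for per-origin storage
    // accounting; it is not a real API.
    func reserveQuota(origin string, size int64) bool { return true }

    // headerDriven is roughly today's shape: the keep/drop and quota checks
    // happen before any body bytes are written to the cache.
    func headerDriven(origin string, resp *http.Response, cacheFile *os.File, caller io.Writer) error {
        cacheable := resp.Header.Get("Cache-Control") != "no-store" // oversimplified
        if cacheable && reserveQuota(origin, resp.ContentLength) {
            _, err := io.Copy(io.MultiWriter(cacheFile, caller), resp.Body)
            return err
        }
        _, err := io.Copy(caller, resp.Body) // stream to the caller only
        return err
    }

    // trailerDriven is the shape a trailer-based Cache-Control forces: the
    // bytes are written first, and the keep/drop decision (plus any quota or
    // eviction consequences) only resolves after the transfer completes.
    func trailerDriven(origin string, resp *http.Response, cacheFile *os.File, caller io.Writer) error {
        if _, err := io.Copy(io.MultiWriter(cacheFile, caller), resp.Body); err != nil {
            return err
        }
        // resp.Trailer is only populated once the body has been read to EOF.
        if resp.Trailer.Get("Cache-Control") == "no-store" {
            return os.Remove(cacheFile.Name()) // the "move it to /dev/null" step
        }
        _ = origin // quota/eviction for this origin can only run here, after the fact
        return nil
    }

    func main() {}

The difference is simply where the keep/drop and quota checks sit relative
to the copy: before any bytes are written in the first shape, and only
after the full body has hit disk in the second.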

Again, I don't doubt that there can be, and are, possibilities for using
this in interesting ways. But the complexity trade-off is not
particularly compelling or exciting, both in terms of the practical
implementation and in terms of reasoning about protocol state machines,
which is why we haven't implemented/exposed trailers beyond
process-frame-and-throwaway. I know for certain it's made a number of
folks sad, including the fine folks at Fastly, because it does limit
some cases.

And this is, of course, ignoring that for HTTP/1.1 we more or less
consider this doomed on the client, because of how many broken
intermediaries clients have to deal with :) H/2+ over encrypted
connections has at least bought us a little more flexibility here, and
even though there are still bad implementations of those, they're at
least newer implementations, so the mistakes aren't yet as firmly
ossified or unnoticed :)

Received on Wednesday, 3 February 2021 22:26:47 UTC