Re: Cache control in trailers?

In addition to the state-machine issues Ryan points out, I don't think the
original premise of focusing on just the cache's behavior holds.

Suppose a server encounters an unrecoverable error while generating a
resource. The cache must know for future requests, but the current request
must know too. Truncated and complete bodies are different things. A
truncated download may be surfaced to the user as a failure. Truncated
script and style resources are not evaluated, because executing a partial
script has security consequences.
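
To make the distinction concrete, here's a minimal sketch in Go. The URL
is illustrative, and I'm leaning on the fact that Go's net/http surfaces
a short body as a read error (io.ErrUnexpectedEOF on an early close, a
stream error on an H/2 reset):

    package fetch

    import (
        "fmt"
        "io"
        "net/http"
    )

    // fetchComplete returns the body only if it arrived in full. A
    // truncated transfer surfaces as a read error, and the partial
    // bytes must be treated as a failure: never evaluated as script
    // or style, never stored as a complete resource.
    func fetchComplete(url string) ([]byte, error) {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, fmt.Errorf("truncated response: %w", err)
        }
        return body, nil // only now is the resource complete
    }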

Resetting the stream sends *exactly* the signal you need. Doing anything
else in the HTTP frontend is a bug, likely with security consequences.
Likewise, an HTTP cache failing to notice the reset is a bug, likely with
security consequences. If some HTTP cache is broken here, it'll need a code
change for Cache-Control trailers anyway. That code change is better spent
on fixing this bug.
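
For concreteness, the cache-side invariant is small. A sketch, assuming a
hypothetical pending-entry API (illustrative names, not any real cache's
interface):

    package cache

    import (
        "io"
        "net/http"
    )

    // Hypothetical pending-entry API.
    type PendingEntry interface {
        io.Writer
        Commit() error // make the entry visible to future requests
        Doom()         // discard the partial entry
    }

    // fill commits an entry only on a clean end-of-body. Any read
    // error -- including a stream reset -- dooms the entry, which is
    // exactly the "notice the reset" behavior a correct cache needs.
    func fill(entry PendingEntry, resp *http.Response) error {
        if _, err := io.Copy(entry, resp.Body); err != nil {
            entry.Doom()
            return err
        }
        return entry.Commit()
    }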

On Wed, Feb 3, 2021 at 5:29 PM Ryan Sleevi <ryan-ietf@sleevi.com> wrote:

>
>
> On Wed, Feb 3, 2021 at 2:50 PM Willy Tarreau <w@1wt.eu> wrote:
>
>> On Wed, Feb 03, 2021 at 02:00:23PM -0500, Ryan Sleevi wrote:
>> <snip>
>> > Using your example, the need to cache after the fact in the trailer
>> > wouldn't be possible, not without buffering the entire response either
>> in
>> > memory or through some temporary file, in order to know what to do with
>> the
>> > response.
>>
>> But why? I guess the vast majority of responses received by the browser
>> are currently cacheable, so how would this one make any difference? The
>> condition to decide to tee would just differ: you'd tee if cache-control
>> says either "public", "private" or "trailers", for example, and never for
>> other responses, like today. And when doing "trailers", you'd finally
>> decide to either commit the object at the end of the transfer or destroy
>> it.
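>>
>> Something like this rough sketch, say (a hypothetical pending-entry
>> API with Commit and Doom, and a Cache-Control trailer is of course
>> the proposed extension, not anything standard today):
>>
>>     // Tee to the caller and to a pending cache object at once,
>>     // then commit or destroy once the trailers have been read.
>>     func handle(resp *http.Response, entry PendingEntry, out io.Writer) error {
>>         if _, err := io.Copy(out, io.TeeReader(resp.Body, entry)); err != nil {
>>             entry.Doom()
>>             return err
>>         }
>>         // resp.Trailer is only populated after end-of-body.
>>         if resp.Trailer.Get("Cache-Control") == "no-store" {
>>             entry.Doom() // "please drop what you've just fetched"
>>             return nil
>>         }
>>         return entry.Commit()
>>     }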
>>
>> Note that I'm not at all trying to figure out how this works in detail,
>> and I know we all have our technical limitations for historic reasons or
>> anything, but I don't see how an object fetched with a conditional cache
>> statement would require more storage or disk bandwidth than one fetched
>> with an explicit or implicit cache statement. You could see it as a final
>> message saying "finally I'd prefer if you dropped what you've just fetched
>> from your cache" after a usual response.
>>
>
> The "commit the object at the end or destroy it" is precisely the issue,
> however. It's a new edge in the state machine, in which an object in the
> cache could be not-yet-committed, which then makes a number of assumptions
> potentially problematic. I agree that there are many ways to potentially
> tackle this, but I highlight it as an example of why it's not very
> exciting, and thus unlikely to be implemented (just as trailers are not
> supported in any of our layers beyond basic H/2 message framing, at which
> point they get dropped).
>
> For example, consider a situation where there are multiple requests for
> the same resource. Request 1 comes in, opens up the resource, begins
> reading the resource while streaming the response into the file and to the
> caller. Request 2 for that resource comes in, sees that it's available in
> the cache (even though not fully completed), begins reading that response
> and streaming to the other caller. In the semantics described, the need for
> an explicit commit stage prohibits this, inasmuch as it's not clear that
> Request 2 can reuse that resource, because Request 1 might change its mind
> at the end of the stream.
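>
> To make that edge concrete (illustrative state names, not any real
> cache's API):
>
>     // Today a Pending entry is safe to stream to a second reader,
>     // because the headers already settled cacheability. With
>     // trailer-driven cacheability, Request 1 may still doom the
>     // entry at end-of-body, so a concurrent reader is unsafe.
>     type EntryState int
>
>     const (
>         Absent    EntryState = iota
>         Pending              // body still streaming in
>         Committed            // complete and final
>     )
>
>     func safeForSecondReader(s EntryState, trailerCacheability bool) bool {
>         switch s {
>         case Committed:
>             return true
>         case Pending:
>             return !trailerCacheability
>         }
>         return false
>     }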
>
> This is akin to the problem others have raised, with respect to policy
> controls applied early-on/up-front.
>
> Equally, consider a situation where there is a pre-existing resource in
> the cache, and during opportunistic revalidation, a newer version is
> available. If that request decides at the end not to cache, we've now
> doomed an otherwise usable resource, and need to commit to removing it from
> the cache.
>
>
>> > These aren't just "correctness" issues, but we view them as security
>> > relevant, because they can allow for things like denial of service (via
>> > resource exhaustion),
>>
>> Here I don't see how, as you don't cache more than what you normally
>> receive. You just know at the end of the transfer if it's worth keeping
>> what you just stored in your cache.
>>
>
> Sorry, the resource exhaustion remark was with respect to internally
> buffering the resource to determine whether or not to tee. While the
> analogy isn't perfect, the situation is you can't tee > /dev/null and then
> expect to copy that data back out to a file. And if you tee to a file, and
> then have to move it to /dev/null at the end, you've equally consumed
> resources in the intermediate steps (such as a set-aside pre-commit file).
> Important policy-related checks (such as total storage for an origin, and
> invalidating entries to ensure sufficient space based on content-length)
> now run at the end, after the resources have been committed, rather than
> before.
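>
> A tiny sketch of that timing difference (hypothetical budget type;
> nothing here is a real cache's interface):
>
>     type originBudget struct{ used, quota int64 }
>
>     // Header-based caching: the admission check runs before any
>     // bytes are written, so an over-budget response never hits disk.
>     func (b *originBudget) admitUpFront(contentLength int64) bool {
>         return b.used+contentLength <= b.quota
>     }
>
>     // Trailer-based caching: the bytes land in a set-aside
>     // pre-commit file first; we only learn whether to keep them at
>     // the end, and the disk and I/O were consumed either way.
>     func (b *originBudget) settleAtCommit(written int64, keep bool) {
>         if keep {
>             b.used += written
>         }
>     }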
>
> Again, I don't doubt that there can and are possibilities of using this in
> interesting ways. But the complexity trade-off is not particularly
> compelling or exciting, both in terms of practical implementation and in
> reasoning about protocol state machines, which is why we haven't
> implemented/exposed trailers beyond process-frame-and-throwaway. I know for
> certain it's made a number of folks sad, including the fine folks at
> Fastly, because it does limit some cases.
>
> And this is, of course, ignoring that for HTTP/1.1 we more or less
> consider this doomed on the client because of how many broken
> intermediaries they have to deal with :) H/2+ over encrypted connections
> has at least bought us a little more flexibility here, and even though
> there are still bad implementations of those, they're at least newer
> implementations, so the mistakes aren't yet as firmly ossified or unnoticed
> :)
>

Received on Wednesday, 3 February 2021 22:59:25 UTC