Re: Cache control in trailers?

On Wed, Feb 03, 2021 at 12:38:22PM -0500, Ryan Sleevi wrote:
> On Wed, Feb 3, 2021 at 10:13 AM Willy Tarreau <w@1wt.eu> wrote:
> 
> > On Wed, Feb 03, 2021 at 02:40:27PM +0000, Poul-Henning Kamp wrote:
> > > Except it doesn't work, because nobody implements trailers - because they
> > > don't want to deal with the complexity of merging headers and trailers.
> >
> > But why do you say "nobody" after I gave an example (gRPC). You don't
> > necessarily *need* to merge them, just like some headers are forbidden
> > in trailers, we can very well specify that some fields are only relevant
> > in trailers. FWIW apparently gRPC does this as I seem to have read
> > somewhere that a grpc-status trailer is mandatory and that as a corollary,
> > you may never see an H2 data frame carrying the ES flag.
> 
> 
> Do you have other examples?

Examples of what, trailers in real use? I rarely see them in user reports.
The first time I encountered them was about 10 years ago, when haproxy was
not able to forward them correctly and caused trouble with a home-grown
enterprise application. I've seen one application emit transfer signatures,
and another one, maybe two, send debugging information (timing, request ID
or call trace). I also see that Fastly uses them to pass timing information:

  https://www.fastly.com/blog/supercharging-server-timing-http-trailers

based on this draft:

  https://www.w3.org/TR/server-timing/

But recently, every time I've heard about them, it was about gRPC.

> To say gRPC holds HTTP "weird" is a bit of an understatement, in terms of
> resources and semantics. That is, while it overlays on the wire protocol,
> the semantics themselves of gRPC's implementation don't really align nicely
> with what folks might reasonably think about HTTP with respect to resources,
> methods, headers, etc.

To be honest, I don't know much about gRPC. I looked at its requirements
a while ago when implementing the server-side H2 part in haproxy, to make
sure it was compatible (since it was essentially the main requested use
case), and that was all. Right now users appear fine with it, except for a
few random issues like "the client complains about a missing trailer on an
error response produced by the proxy" (obviously, since it's H2, we don't
have to emit random trailers after an error), and cancellation, which is
not easy to deal with.

> This is particularly important when considering that there really isn't
> anywhere near the level of intermediation for gRPC as there is for
> "HTTP"/"the Web". The intermediaries that do exist aren't agnostic; they
> understand the semantics of gRPC and are often gRPC implementations
> themselves.

That's not quite accurate: haproxy is an HTTP proxy, not a gRPC
implementation, with H1 or H2 on each side, and gRPC appears to pass
through it correctly.

> I do have to agree with PHK here: this sort of merging is a state machine
> security nightmare, especially when thinking about the interaction with
> resources, caching, and the overall semantics of HTTP.

For me, the main enemy of trailers is the fact that they were considered
part of the same namespace as headers, which is what causes this merging
nightmare. But just like some fields are forbidden in trailers, we could
state that some fields ought to be ignored in headers (in case an
intermediary merges the two sections), and that's why I really do not want
to see both use the same name. Once you stick to this, there's neither a
merging nightmare nor a security issue anymore: if the trailer is present,
you know what to cache. If it's absent (and being in the header counts as
absent), you don't cache, period. It's only suboptimal. But suboptimality
is what encourages improvements in products; breakage encourages
fragmentation.

> It's exactly this
> risk that has discouraged Chrome from exposing trailers, either in our
> internal APIs for dealing with resources or in web-developer facing APIs.
> Yes, we support it at the wireframe level in our H/2 implementation, but
> that's intentionally not surfaced beyond that.

Then this sounds like a clean approach to me.

> This does mean, for example,
> that developers working on Google Chrome are (intentionally) not able to
> use gRPC within Chrome: the complexity in implementation and security is
> presently not justified.

I agree.

> Admittedly, (DAY-JOB) is primarily dealing in edge cases and weird states
> in state machines, so I'm probably over indexed on those concerns,

We probably have comparable jobs then, and I agree that such work quickly
develops some instinctive reflexes against certain patterns.

> but I
> wouldn't be terribly excited for trailers precisely because it necessitates
> careful re-review of every state machine, end to end, to make sure new
> issues and surprises aren't introduced by such semantics. So even if, in
> the abstract, it's good and useful, that sort of complexity may preclude
> implementation.

What class of issue would you envision with a field which only has semantics
in trailers and which must be ignored in headers? I mean, say the server
emits this:

    HTTP/1.1 200 OK
    Transfer-encoding: chunked
    Cache-control: no-cache; trailers
    
    a
    0123456789
    0
    Cache-Post-Body-Status: public; max-age=86400

It could be relayed as-is by compliant intermediaries. It could also be
relayed like this by intermediaries that are compliant as well but merge
trailers into headers:

    HTTP/1.1 200 OK
    Transfer-encoding: chunked
    Cache-control: no-cache; trailers
    Cache-Post-Body-Status: public; max-age=86400
    
    a
    0123456789
    0

In this case the response is not cached, since the field is ignored in
headers. Transfer-Encoding could even be translated to Content-Length along
the way; the principle remains the same. The trailer could also be silently
dropped on the path:

    HTTP/1.1 200 OK
    Transfer-encoding: chunked
    Cache-control: no-cache; trailers
    
    a
    0123456789
    0

It wouldn't be cached either.

Then there are those that choke on trailers because they stop after the
"0" CRLF last chunk; they wouldn't cache either. What I like about this
approach is that a degraded message cannot be restored later in the chain
to become correct again. That wouldn't be the case if the same field name
were used in both parts.
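
At the wire level, the difference between a component that reads the
trailer section and one that stops after the last chunk can be sketched
with a toy HTTP/1.1 chunked parser in Python. This is a minimal sketch
assuming well-formed input (no error handling, no obs-fold); the function
names and the keep_trailers flag are hypothetical, and the message is the
example above with the chunk size written as "a" (10 bytes).

```python
from typing import Dict, List, Tuple

Fields = Dict[str, str]


def _parse_fields(lines: List[bytes]) -> Fields:
    """Parse "Name: value" lines into a dict keyed by lower-cased name."""
    fields: Fields = {}
    for line in lines:
        name, _, value = line.partition(b":")
        fields[name.strip().decode().lower()] = value.strip().decode()
    return fields


def parse_chunked(raw: bytes, keep_trailers: bool = True) -> Tuple[Fields, bytes, Fields]:
    """Split a chunked HTTP/1.1 response into (headers, body, trailers).

    keep_trailers=False mimics a component that stops reading after the
    last chunk ("0" CRLF) and never looks at the trailer section.
    """
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = _parse_fields(head.split(b"\r\n")[1:])  # skip the status line
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            break
        body += rest[:size]
        rest = rest[size + 2:]  # skip the chunk data and its trailing CRLF
    trailer_lines: List[bytes] = []
    for line in rest.split(b"\r\n"):
        if not line:  # the empty line ends the trailer section
            break
        trailer_lines.append(line)
    trailers = _parse_fields(trailer_lines) if keep_trailers else {}
    return headers, body, trailers


RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Transfer-Encoding: chunked\r\n"
       b"Cache-Control: no-cache; trailers\r\n"
       b"\r\n"
       b"a\r\n"
       b"0123456789\r\n"
       b"0\r\n"
       b"Cache-Post-Body-Status: public; max-age=86400\r\n"
       b"\r\n")

if __name__ == "__main__":
    for keep in (True, False):
        _, body, trailers = parse_chunked(RAW, keep_trailers=keep)
        # Only the trailer section is consulted for the hypothetical field.
        print(keep, body, trailers.get("cache-post-body-status"))
```

With keep_trailers=False the body is identical but the policy comes back as
None, so that component simply doesn't cache, which is the intended failure
mode.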

By contrast, adding an optional body after a 1xx response would make cache
pollution attacks trivial through plenty of components: just upload an
image containing a dummy HTTP response to a server, download it again, and
watch intermediaries stop after the 1xx and parse the body as the final
HTTP response, taking it for the valid contents.
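
The desynchronization can be reproduced with a toy Python response reader.
This is a minimal sketch under loudly stated assumptions: the wire layout of
a "1xx with body" is hypothetical (here a 103 carrying a Content-Length
followed by the uploaded bytes, then the real final response), and the
function names and the knows_1xx_bodies flag are illustrative only.

```python
from typing import Dict, Tuple


def _parse_head(head: bytes) -> Tuple[int, Dict[bytes, bytes]]:
    """Return (status code, fields) for one response header block."""
    lines = head.split(b"\r\n")
    code = int(lines[0].split(b" ")[1])
    fields: Dict[bytes, bytes] = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        fields[name.strip().lower()] = value.strip()
    return code, fields


def final_response(stream: bytes, knows_1xx_bodies: bool) -> Tuple[int, bytes]:
    """Return (status, body) of the first final (>= 200) response."""
    while True:
        head, _, stream = stream.partition(b"\r\n\r\n")
        code, fields = _parse_head(head)
        length = int(fields.get(b"content-length", b"0"))
        if code >= 200:
            return code, stream[:length]
        if knows_1xx_bodies:
            # Hypothetical extension: the 1xx carries a body, so skip it.
            stream = stream[length:]
        # A legacy reader assumes a 1xx never has a body and parses
        # whatever follows as the next response.


# Attacker-controlled bytes, e.g. hidden in an uploaded image.
UPLOADED = b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nevil"

STREAM = (b"HTTP/1.1 103 Early Hints\r\n"
          b"Content-Length: " + str(len(UPLOADED)).encode() + b"\r\n\r\n"
          + UPLOADED
          + b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\ngood")

if __name__ == "__main__":
    print(final_response(STREAM, knows_1xx_bodies=True))   # (200, b'good')
    print(final_response(STREAM, knows_1xx_bodies=False))  # (200, b'evil')
```

The legacy reader returns the attacker's embedded response as the final one,
which is exactly what a cache would then store.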

I think the approach is elegant to a certain extent, because it reuses some
protocol elements that have not changed in two decades. That doesn't mean
they're well implemented everywhere, just that their failure modes are more
or less known by now.

Willy

Received on Wednesday, 3 February 2021 18:17:57 UTC