Re: Why should caches and intermediaries ignore If-Match?

Here is the use case:

We have a content-optimization (compression) proxy sitting between the
browser and origin server. Among other things, the proxy can compress
videos. When the browser starts playing a video, it makes an initial HTTP
request to fetch (part of) the video, then builds an in-memory
representation of the video and uses additional HTTP range requests as
needed to fetch the rest of the video. For example, range requests are used
to implement seeking.

The challenge is that we now have multiple representations of every video:
the original representation (from the origin server) and one or more
compressed representations served by the proxy. When the browser makes an
initial request for a video, it gets one of these representations. When it
makes a subsequent range request, we want to ensure that it receives the
*same* representation that it received on the initial request. Otherwise
the browser cannot combine the second response with the first response and
video playback will fail.
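
To make the constraint concrete, here is a rough sketch of the check a
client would need before splicing a range response into what it already
holds. This is not our actual browser code; `Part` and `can_combine` are
hypothetical names, and it assumes the ETag header value is kept verbatim.

```python
from dataclasses import dataclass


@dataclass
class Part:
    etag: str    # ETag header value from the response, e.g. '"v1-compressed"'
    start: int   # offset of the first byte covered by this part
    data: bytes  # payload bytes of this part


def can_combine(first: Part, later: Part) -> bool:
    """Return True only if both parts come from the same representation."""
    # Weak validators (W/"...") do not guarantee byte-for-byte equality,
    # so they are not good enough for stitching byte ranges together.
    if first.etag.startswith("W/") or later.etag.startswith("W/"):
        return False
    return first.etag == later.etag
```

If `can_combine` returns False, the bytes of the second response belong to
a different encoding of the video and cannot be appended to the first.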

An additional challenge is that the browser and proxy both have a cache. In
theory, we control the entire connection and could add custom code to the
browser, proxy, and caches to implement any protocol that we invent. In
practice, both caches are intended to be HTTP-compliant caches and we'd
rather not add custom hacks for use cases like this if we can avoid it.

The browser needs to label each range request with the ETag it expects to
receive. If-Match originally seemed like the perfect solution: The browser
adds `If-Match: <ETag>` to every range request, where the ETag is the one
it received on the initial response. If a cache has a copy of the video
with a *different* ETag, the cache forwards the request to the next server
in the chain rather than returning its cached copy (as would happen if we
used If-Range instead of If-Match). Similarly, the proxy knows whether the
browser is requesting a compressed video or the original video, so it can
respond accordingly. However, as discussed previously in this thread,
If-Match doesn't work like this.

Note that I agree it doesn't make sense for a cache to return 412 and we
don't need that behavior. The semantics I'm looking for are: "Send me this
representation if you have it, otherwise forward to the next server. A 4xx
means that this representation is not current in the origin or in any
intermediate cache or proxy."
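
Roughly, the behavior I have in mind for a cache or proxy looks like the
sketch below. It is only an illustration of the semantics described above,
not existing behavior of If-Match or of any cache; the function, the
`forward` callback, and the dict-based cache are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# (status code, ETag, body); a deliberately simplified response.
Response = Tuple[int, str, bytes]
ByteRange = Tuple[int, int]  # inclusive first and last byte offsets


def handle_range_request(
    cache: Dict[str, Tuple[str, bytes]],  # url -> (etag, complete body)
    url: str,
    wanted_etag: str,
    byte_range: ByteRange,
    forward: Callable[[str, str, ByteRange], Response],
) -> Response:
    """Serve the range from cache only if the cached ETag is the wanted one."""
    entry = cache.get(url)
    if entry is not None and entry[0] == wanted_etag:
        etag, body = entry
        first, last = byte_range
        return 206, etag, body[first:last + 1]
    # No copy, or a copy of a different representation: neither answer
    # from the cache nor send 412 here; pass the request upstream.
    return forward(url, wanted_etag, byte_range)
```

Only the last server in the chain (the origin, or a proxy that knows the
representation no longer exists anywhere) would ever produce the 4xx.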

Hope that makes sense.

On Mon, Feb 27, 2017 at 5:03 PM, Roy T. Fielding <fielding@gbiv.com> wrote:

> > On Feb 26, 2017, at 3:49 PM, Mark Nottingham <mnot@mnot.net> wrote:
> >
> > I think the best way to characterise the situation currently is that
> HTTP doesn't define any requirements for If-Match on non-origin servers;
> the only requirements in 7232 Section 3.1 apply to origin servers.
> >
> > AFAIK current intermediaries ignore If-Match, so if you wanted to define
> some guidelines here, they'd need to be completely optional. E.g., "An
> intermediary MAY process If-Match based upon the contents of its cache,
> replying with 4xx when..." (note that that's just rapid hand-waving, not
> suggested spec text).
> >
> > If we did that, we'd have a header whose handling by origin servers was
> mandatory for some methods, and handling by intermediary servers was
> optional for other methods. Not sure how much that would confuse people,
> but properly spec'd, it'd probably be OK.
> >
> > We'd also have to have a discussion about whether 412 was the right
> status code.
> >
> > Roy, any thoughts?
> >
> > Tom, can you say any more about your use case?
>
> My thoughts would probably depend on the use case.
> Note that the HTTP spec is only defining rules for communication
> between independent components.  Although the internal architecture of
> a user agent might include something like an HTTP cache, HTTP's rules
> do not limit communication between the UA and its own internal cache.
> As far as HTTP is concerned, they are both part of the user agent.
>
> Thus, the sentence in 3.1:
>
>    It can also be used with safe
>    methods to abort a request if the selected representation does not
>    match one already stored (or partially stored) from a prior request.
>
> is referring to one already stored on the user agent from a prior
> request by that user agent.
>
> Originally, If-Match was defined to be answerable by intermediaries
> for GET/HEAD requests.  However, 412 was considered by the WG to be an
> undesirable response in those cases, so If-Range was created to
> replace that function. My guess is that's the use case here. OTOH,
> a 412 might be preferred for safe methods other than GET and HEAD.
>
> AFAICR, limiting If-Match requirements to origin servers in RFC7232
> was due to lack of implementation by clients (aside from the unsafe
> methods) and a desire for semantic consistency for the field.
>
> For unsafe methods, the client's field value is referring to the
> current selected representation on the origin server, which is something
> that can only be tested by the origin server. Having a special-case for
> safe methods meant that both the meaning of the field changed per
> method and the need to implement it changed per method, which is quite
> a bit of complexity for a feature that nobody ever used.
>
> BTW, Apache httpd implements If-Match in the default resource handler
> and anywhere that calls ap_meets_conditions().  That will result in a
> 412 response to an otherwise successful request if the etag given
> doesn't match the selected representation, regardless of the method.
> [I haven't tested it to see if that gets called by default when the
> server is installed as an intermediary.]
>
> Cheers,
>
> ....Roy
>
>

Received on Thursday, 2 March 2017 02:21:33 UTC