Re: Variant IDs from Jeffrey Mogul on 1996-02-05 (http-caching-historical@w3.org from February 1996)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Mon, 05 Feb 96 14:02:31 PST
To: http-caching@pa.dec.com
Message-Id: <9602052202.AA08889@acetes.pa.dec.com>

The concept of variant IDs deserves a full description as soon as
possible, and Paul Leach and I agreed to provide one.  It would be
best if we do not get into a detailed discussion of the whole concept
before Paul and I have a chance to write it down, but I thought it
would be a good idea to summarize the issue and the proposed solution.

Consider a single resource with multiple variants (e.g., a document that
has been translated into several dozen languages).  Consider a proxy which
already has several variants in its cache (for concreteness, let's
say it has three such variants: English, French, and German).  Let's
also assume that the server sent "Vary: Accept-Language" because it
didn't want to encode all of the possible variants in a URI: header.

So now the cache gets a request for this resource that says:

	Accept-Language: da, en;q=0.5, de;q=0.3

which (if I understand correctly) means that the user prefers Danish
but is willing to accept English or maybe German.
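
As an illustration of that reading (not part of the original message), here
is a minimal Python sketch that turns the header value into a ranked
preference list. The function name parse_accept_language is hypothetical,
and the sketch assumes that a language listed without a q parameter carries
the default quality of 1.

```python
def parse_accept_language(value):
    """Return (language, q) pairs ordered from most to least preferred."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0]
        q = 1.0  # assumption: a missing q parameter means q=1
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((lang, q))
    # sorted() is stable, so equal q values keep their order of appearance
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


prefs = parse_accept_language("da, en;q=0.5, de;q=0.3")
```

Under these assumptions Danish ranks first, followed by English and then
German, which matches the reading given above.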

Since the server has not told the cache whether or not it has a
Danish variant of the resource, the cache has to forward the request
to the origin server.  Here's the problem: when the origin server
receives this request, it has no idea which variants the cache
is currently holding.  This means that even if the cache currently
holds exactly the response that the server would provide, the server
has to retransmit it to the cache.

This is what variant IDs are supposed to solve.  Suppose that
the origin server tags each response with a Variant-ID: header.
E.g.,
	Variant-ID: xy
or
	Variant-ID: 97
The value of the Variant-ID: field is meant to be opaque and
relatively compact (i.e., it should not take a lot of bytes to
transmit it).

So suppose that our cache holds these entries for the resource R:
	R1: (Content-Language: en, Validator: zzzzzz, Variant-ID: 1)
	R2: (Content-Language: fr, Validator: qqqqqq, Variant-ID: 3)
	R3: (Content-Language: de, Validator: xxxxxx, Variant-ID: 97)

Now when it is time for the cache to forward the request to the origin
server, it tacks on this new header:

	Variant-set: id=1;zzzzzz, id=3;qqqqqq, id=97;xxxxxx

That is, the set of the variants it currently holds and their
associated validators.
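
The cache-side step can be sketched in Python as follows. This is only an
illustration of the example above: the dictionary layout and the name
build_variant_set are hypothetical, and the "id=<variant-id>;<validator>"
syntax is copied from the sample header rather than from any finished
specification.

```python
# The three cached entries for resource R from the example above.
cached_variants = [
    {"language": "en", "validator": "zzzzzz", "variant_id": "1"},
    {"language": "fr", "validator": "qqqqqq", "variant_id": "3"},
    {"language": "de", "validator": "xxxxxx", "variant_id": "97"},
]


def build_variant_set(entries):
    """Format the held variants as a Variant-set: field value."""
    return ", ".join(
        "id=%s;%s" % (entry["variant_id"], entry["validator"])
        for entry in entries
    )


# Header line the cache would add to the forwarded request.
header_line = "Variant-set: " + build_variant_set(cached_variants)
```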

The server goes through its normal content-negotiation algorithm
to decide which variant to return (i.e., for this purpose it
ignores the Variant-set: header).  Once it has made this decision,
it then checks to see if the variant it plans to return is in
the cache's variant set.

For this example, if the server DOES have a Danish variant, then it
would return a status code of "200 OK", headers including
	Content-Language: da
	Variant-ID: 192
and the full entity body for the Danish variant.  If, on the other
hand, it does not have a Danish variant, it would presumably
want to return the English variant that the cache already knows
about.  In this case, the Variant-set in the request indicates
that the cache-validator for variant-ID 1 (which is the English
variant) is zzzzzz, so the origin server does its normal validator
check to see if this cached copy is still valid.  If so, it returns
a status code of "304 Not Modified" plus these headers:
	Content-Language: en
	Variant-ID: 1
and otherwise it returns the same headers but sends "200 OK" and
the entire entity body.
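
The server-side decision described above might look roughly like this in
Python. It is a sketch under stated assumptions, not a definitive
implementation: parse_variant_set and respond are hypothetical names, the
variant passed in as "chosen" is assumed to have been selected already by
the normal content-negotiation algorithm, and the "normal validator check"
is modeled as a simple equality test.

```python
def parse_variant_set(value):
    """Map variant ID -> validator from a Variant-set: field value."""
    held = {}
    for item in value.split(","):
        ident, validator = item.strip().split(";", 1)
        held[ident[len("id="):]] = validator
    return held


def respond(chosen, variant_set_value=None):
    """Return (status, headers, body) for the already-negotiated variant."""
    held = parse_variant_set(variant_set_value) if variant_set_value else {}
    headers = {
        "Content-Language": chosen["language"],
        "Variant-ID": chosen["variant_id"],
    }
    # Assumption: the cached copy is valid when the validators are equal.
    if held.get(chosen["variant_id"]) == chosen["validator"]:
        return 304, headers, b""        # cache already holds this variant
    return 200, headers, chosen["body"]  # send the full entity body


variant_set = "id=1;zzzzzz, id=3;qqqqqq, id=97;xxxxxx"
danish = {"language": "da", "variant_id": "192",
          "validator": "dddddd", "body": b"Danish text"}
english = {"language": "en", "variant_id": "1",
           "validator": "zzzzzz", "body": b"English text"}

status_da, _, _ = respond(danish, variant_set)    # variant not in cache
status_en, _, _ = respond(english, variant_set)   # cached copy still valid
```

A request without a Variant-set: header falls through to the ordinary
"200 OK" path, which is consistent with the header being optional.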

Since this is just a performance optimization, it does not matter
if either the cache or the origin server doesn't implement it.  I.e.,
neither "Variant-ID:" nor "Variant-set:" is mandatory.

-Jeff
Received on Monday, 5 February 1996 22:17:32 UTC