Re: Variant-ID proposal

Since Jim wants me to nail down as much as possible of the caching
stuff TODAY, and I need to have something concrete about variant-IDs
to do this, I'm going to follow the plan described in this message.
This is not subject to debate today.  Once Jim issues his draft,
then anyone who wants to can reopen the discussion.

I'm happy to accept suggestions for minor corrections, but I don't
have time to get drawn into philosophical arguments.

First, Koen started this thread with the following statement:

    I have argued before that the requirement 
    
      Varying resources (even those that are transparently negotiated) MUST
      send responses which include variant-IDs.
    
    which is present in both Jeff's If-Valid/If-Invalid/Cval text and Roy's
    competing If-EID/Unless-EID/EID text must be dropped, because
    this requirement means unnecessary trouble for transparent content
    negotiation.

I cannot find this "requirement" in my drafts, and cannot recall having
made it.  I've always considered variant-IDs optional for the server,
since they are meant solely for improving performance.

Roy (or perhaps someone before him) defined two basic kinds of
negotiation:
	Preemptive negotiation, in which the client expresses
	preferences in a request on a resource, and the origin server
	chooses the most appropriate entity and returns it.

	Reactive negotiation, which is signalled somehow by the
	client [draft-ietf-http-v11-spec-01.txt says by an empty
	Accept header, but this may have changed].  The origin server
	in this case returns a description of the choices, and
	then the client chooses one and makes a specific request
	for that entity.

We can consider these as two different ways to select the right variant:
"origin server selects" and "end-user client selects".  Or, in other
terms, we could say that in preemptive negotiation, the server
is the "selecting participant" and the client is a "non-selecting
participant"; in reactive negotiation, the client is the
"selecting participant".
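A rough Python sketch of the two models follows. It is purely illustrative. The toy variant list and the preference dictionary that stands in for the client's preferences are hypothetical. So are the function names `best`, `preemptive` and `reactive`. None of this is defined by the HTTP/1.1 drafts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    uri: str         # URI bound to this specific variant
    media_type: str  # the only selection dimension in this toy example


# Hypothetical variants of one varying resource.
VARIANTS = [
    Variant("/paper.html", "text/html"),
    Variant("/paper.ps", "application/postscript"),
]


def best(choices: list[Variant], prefs: dict[str, float]) -> Variant:
    """Pick the variant with the highest client preference (0.0 if unlisted)."""
    return max(choices, key=lambda v: prefs.get(v.media_type, 0.0))


def preemptive(prefs: dict[str, float]) -> Variant:
    """Origin server selects: the preferences arrive in the request, and
    the server returns the chosen entity directly."""
    return best(VARIANTS, prefs)


def reactive_choices() -> list[Variant]:
    """Origin server returns only a description of the choices."""
    return list(VARIANTS)


def reactive(prefs: dict[str, float]) -> str:
    """End-user client selects from the description, then makes a specific
    request for the chosen variant's own URI (returned here)."""
    return best(reactive_choices(), prefs).uri


prefs = {"application/postscript": 1.0, "text/html": 0.5}
print(preemptive(prefs).uri)  # /paper.ps
print(reactive(prefs))        # /paper.ps
```

The selection logic is the same in both cases. What changes is which participant runs it.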

In a world without caches or proxies, that's it.

Now we introduce *non-caching* proxies.  I'll start by thinking of a model
with a single proxy (i.e., not local to the client or origin
server).  As far as I can tell, there are two ways that this
proxy can participate in the selection process:

	selection-transparent: the proxy does not participate
	in the selection process; it simply ships requests
	and responses back and forth between client and origin server.

	selecting-participant: the proxy uses reactive negotiation
	with the origin server, and preemptive negotiation with
	its client.  That is, the selection point is moved part-way
	from the origin server to the end-user client.

One might be tempted to apply some sort of symmetry argument that
says that a proxy could engage in preemptive negotiation with
the origin server and reactive negotiation with the client, but
I think this is a false symmetry: the proxy in this case would not
necessarily have enough information about the resource to allow
the end-user client to do reactive negotiation.

In other words, the point at which the selection is done (origin
server, proxy, or client) must have full information about the
available choices.

My belief is that Koen intends this information to be conveyed
by the Alternates header.  I'm basing this belief on the
draft-holtman-http-negotiation-00.txt document; I'm not sure
if any of his more recent messages have changed this.

Now we introduce caching.  Because caches store copies of
specific entity instances, not of resources, a cache needs
to use a specific entity identifier as a cache key; a URI
for a varying resource is not a sufficient cache key.  We
could construct an entity (variant) identifier in one of several
ways:
	(1) a URI that is bound to a specific entity (variant).
	(2) a URI bound to the varying resource, plus a set of
	selection criteria that is guaranteed to completely
	determine the variant.
	(3) a URI bound to the varying resource, plus an
	opaque variant ID.
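Just to make the three kinds of cache key concrete, here is a small Python sketch. The class names are hypothetical. Modelling the criteria of option (2) as header name/value pairs is my own assumption. Whatever form they take, option (2) only works if the criteria really do determine the variant completely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecificURIKey:
    """(1) A URI bound to one specific variant."""
    variant_uri: str


@dataclass(frozen=True)
class CriteriaKey:
    """(2) Resource URI plus selection criteria that must completely
    determine the variant (assumed here to be header name/value pairs)."""
    resource_uri: str
    criteria: frozenset[tuple[str, str]]


@dataclass(frozen=True)
class VariantIDKey:
    """(3) Resource URI plus an opaque variant-ID; the cache only compares
    the ID for equality and never interprets it."""
    resource_uri: str
    variant_id: str


# Frozen dataclasses are hashable, so any of these can key a cache dict.
cache: dict[object, bytes] = {}
cache[VariantIDKey("/doc", "v1")] = b"English body"
cache[VariantIDKey("/doc", "v2")] = b"French body"
print(len(cache))  # 2: same resource URI, two distinct entries
```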

#1 is conveyed by the Alternates header, apparently, so it is
available to (and usable by) the selecting participant.  However,
a client doing preemptive negotiation does not have the specific
URI, so a cache located at such a client has to use some other means
of identifying the variant (i.e., either #2 or #3).

#2 is, as far as I can tell, what Koen was trying to propose with
his "structured variant IDs."  There seems to be a generally
negative reaction to this; people don't want to insist that proxy
caches understand enough of the selection algorithm to make this
the main mechanism in HTTP/1.1.  (I think most people are willing
to believe that something like this could be made optional.)
There's also the problem that if the origin server uses a selection
criterion that cannot be expressed using the Alternates header
(e.g., "the user's birthdate is a prime number"), then this simply
doesn't work.

#3 seems to work pretty well.  In this approach, the origin server
MAY (not MUST) provide a variant-ID with any entity-instance that
it returns in a response.  If a cache receives a variant-ID, it
can do two things with it:

	(1) It can use it to replace an existing cache entry for
	the same variant.  That is, it forms a cache key using
	the URI of the request and the variant-ID of the response.
	If this key matches the key of an existing cache entry,
	it can replace the existing entry with the new response
	(subject to all of the other rules on caching).

	(2) It can use it, together with a cache validator, in a
	conditional request to inform the server that it already
	has the associated entity-instance in its cache.  The triple
	(URI, variant-ID, cache-validator) forms an identifier for
	a specific entity-instance (or set of instances, if the
	validator is weak).  This allows the server to return
	304 (Not Modified) knowing that the cache will understand
	which variant is being referred to.
	
	Note that this mechanism is entirely orthogonal to the
	selection process.  That is, the variant-selection process
	does not use the variant-ID information; it is only used
	after the selecting participant decides on the appropriate
	variant, and then needs to know whether the entity body should be
	transferred.
	
	Since the cache in this case may not know which variant
	is going to be selected, it should send all of its
	(variant-ID, validator) pairs for the resource, and let
	the selecting participant choose the right one.  This is
	what the variant-set mechanism is used for.  (Note that
	this adds some interesting flexibility to the variant
	selection algorithm; the selecting participant knows what
	is in the cache, and if there is no overriding selection
	criterion, it might "choose" a variant that is already
	in the requestor's cache, rather than an equally useful
	one that is not in that cache.)
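The two uses of a variant-ID can be sketched roughly as follows. `VariantCache`, `Entry` and `select` are hypothetical names. The in-memory dictionary layout is an assumption. Validators are compared for exact equality only, so weak-validator matching is not modelled. The `select` function shows the "prefer what the requestor already has" twist among equally useful variants.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    validator: str  # compared for exact equality (weak validators not modelled)
    body: bytes


@dataclass
class VariantCache:
    # resource URI -> {variant-ID -> entry}
    entries: dict[str, dict[str, Entry]] = field(default_factory=dict)

    def store(self, uri: str, variant_id: Optional[str],
              validator: str, body: bytes) -> bool:
        """Use (1): key = (request URI, response variant-ID). A response with
        the same key replaces the existing entry. This sketch only tracks
        responses that actually carry a variant-ID."""
        if variant_id is None:
            return False
        self.entries.setdefault(uri, {})[variant_id] = Entry(validator, body)
        return True

    def variant_set(self, uri: str) -> list[tuple[str, str]]:
        """Use (2): every (variant-ID, validator) pair held for the resource,
        because the cache cannot know which variant will be selected."""
        return [(vid, e.validator) for vid, e in self.entries.get(uri, {}).items()]


def select(current: dict[str, str], acceptable: list[str],
           variant_set: list[tuple[str, str]]) -> tuple[int, str]:
    """Selecting participant. `current` maps variant-ID -> current validator;
    `acceptable` lists equally useful variant-IDs (the first is sent if none
    is cached). Prefer one the requestor already holds, up to date."""
    held = dict(variant_set)
    for vid in acceptable:
        if held.get(vid) == current[vid]:
            return 304, vid      # Not Modified, naming the variant-ID
    return 200, acceptable[0]    # transfer the entity body instead


cache = VariantCache()
cache.store("/doc", "en", "e1", b"...")
cache.store("/doc", "fr", "f1", b"...")
cache.store("/doc", "en", "e2", b"...")  # replaces the "en" entry
vs = cache.variant_set("/doc")           # [("en", "e2"), ("fr", "f1")]
print(select({"en": "e2", "fr": "f1", "de": "d1"}, ["de", "fr"], vs))  # (304, 'fr')
```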
	
If the origin server does not want to provide variant-IDs, it
does not have to.  However, in this case it becomes extremely
hard (perhaps impossible) for a non-selecting participant to do
conditional retrievals, because it can't tell the selecting
participant the precise criteria that led to the creation
of the cache entry.

For the time being (i.e., in HTTP/1.1), only the origin server can
assign variant-IDs, because otherwise we have no way to prevent
two selecting-participant caches from assigning the same variant-ID
to two different variants of a resource.  (I think this was another
purpose of Koen's structured variant-IDs; if the caches have full
knowledge of the selection criteria, they could assign non-conflicting
variant-IDs derived from a canonical representation of the criteria used.)

However, Koen might like this compromise: we allow (but do not require)
the origin server to embed variant-identification information
in the opaque validator itself.  (We do NOT allow the cache
to look at this embedded information!)  Such a validator is
marked with the suffix "/S" (for "Selecting").  Then if a cache has
received a variant with an opaque validator but without a variant-ID,
it can still perform a conditional retrieval on the resource.
However, the origin server will only provide a 304 (Not Modified)
response if it is using this kind of opaque validator; otherwise,
it must treat the request as unconditional.  And an intermediate
cache can respond to this kind of conditional request (one without
a variant-ID) if the request's validator is of this "Selecting" kind
and exactly matches the validator of one of the cache's entries for
the resource.
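Here is a minimal Python sketch of the "/S" compromise. The "/S" suffix comes from the proposal above. The `variant:version` encoding inside the validator is a hypothetical private choice of the origin server, and so are the function names. The cache-side check only looks at the suffix and compares whole strings. It never parses the embedded information.

```python
SELECTING_SUFFIX = "/S"


def make_selecting_validator(variant_id: str, version: int) -> str:
    """Origin server only: embed variant information in the opaque validator.
    The 'variant:version' encoding is a hypothetical private choice."""
    return f"{variant_id}:{version}{SELECTING_SUFFIX}"


def origin_conditional(validator: str, selected: str, version: int) -> int:
    """Origin server handling a conditional request with no variant-ID,
    after it has selected `selected` (currently at `version`)."""
    if not validator.endswith(SELECTING_SUFFIX):
        return 200  # ordinary opaque validator: treat as unconditional
    if validator == make_selecting_validator(selected, version):
        return 304  # Not Modified
    return 200


def cache_may_answer(validator: str, entry_validators: list[str]) -> bool:
    """Intermediate cache: it may answer only for a Selecting validator that
    exactly matches one of its entries for the resource."""
    return validator.endswith(SELECTING_SUFFIX) and validator in entry_validators


v = make_selecting_validator("fr", 3)  # "fr:3/S"
print(origin_conditional(v, "fr", 3))  # 304
print(origin_conditional(v, "en", 3))  # 200: a different variant was selected
```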

Stay tuned for a draft from Jim (once he gets one from me ...)

-Jeff

Received on Tuesday, 16 April 1996 23:13:13 UTC