Another model of how caching of generic resources works

The current 03 draft has the following model of how a generic resource
`contains' its multiple representations:

     A generic resource (one subject to content negotiation) may be
     bound to more than one entity. Each of these entities is called a
     "variant" of the resource.

 (quoted from Section 16.5)

This model necessitated the introduction of the `resource entity'
concept:

 resource entity
  A specific representation, rendition, encoding, or presentation of a
  network data object or service, either a plain resource or a specific
  member of a generic resource.  A resource entity might be identified
  by a URI, or by the combination of a URI and a variant-ID, or by the
  combination of a URI and some other mechanism. An plain resource MUST
  be bound to a single resource entity at any instant in time.

We needed resource entities because _the resource entity is the unit
of caching, expiration, and revalidation_.  `Cache slots' are assigned
to resource entities.  The sequence of responses from a plain resource
has the same caching rules associated with it as do the sequences of
responses from the different variants of a generic resource.

  ---------

In Paris, the editorial group discussed possible ways to get rid
of this `resource entity' concept in order to simplify the draft.

Below, I will outline a way of getting rid of it that does not require
any changes to the mechanisms defined in the spec, only to the
language used to define these mechanisms.  This model, by the way,
closely resembles the model in the content negotiation draft.

Core of the model
-----------------

The basic underlying idea is that a resource, if it binds to entities,
can bind to only one entity at a time.  More specifically:

 - a plain resource, if it can generate 200 responses, binds to
   exactly one entity at every point in time.

 - a generic resource binds to no entities at all. Instead, it binds
   to multiple plain resources which in turn bind to entities.

The picture below is an example.  Here, each arrow represents a `binds
to' relation:

                         ----->  plain resource ---------> entity 1
                       /      
                      /
    generic resource   ------->  plain resource ---------> entity 2
    http://x.org/paper\
                       \
                         ----->  plain resource ---------> entity 3

Note that only the generic resource above is identified by a URI; the
plain resources in the picture above do not have their own URIs.  The 1.1
draft says:

resource
  A network data object or service that can be identified by a URI
                                        ^^^ 
  (section 7.2).  At any point in time, a resource may be either a
  plain resource, which corresponds to only one possible
  representation, or a generic resource.

so there can exist resources which are _not_ uniquely identified by a
URI.  This is the loophole which allows the model to work.

In this model, a generic resource is a `portal' through which variant
resources are accessed.
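
As a rough illustration of the binding relations, here is a minimal
Python sketch.  The class names and the labels "v1".."v3" are my own
inventions for this example and appear in no draft; the sketch only
shows that a plain resource holds exactly one entity, while a generic
resource holds nothing but plain resources.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    body: bytes


@dataclass
class PlainResource:
    # A plain resource binds to exactly one entity at any point in time.
    entity: Entity
    # Variant resources need not have a URI of their own.
    uri: Optional[str] = None


@dataclass
class GenericResource:
    # A generic resource has a URI but binds to no entity directly;
    # it binds only to plain (variant) resources.
    uri: str
    variants: Dict[str, PlainResource] = field(default_factory=dict)

    def bind(self, label: str, variant: PlainResource) -> None:
        if not isinstance(variant, PlainResource):
            raise TypeError("a generic resource may bind only to plain resources")
        self.variants[label] = variant


paper = GenericResource("http://x.org/paper")
for n in (1, 2, 3):
    # "v1".."v3" are made-up internal labels, not protocol elements.
    paper.bind(f"v{n}", PlainResource(Entity(f"entity {n}".encode())))
```

Binding a generic resource as a variant raises TypeError here, which
mirrors the rule that all variant resources must be plain.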

When a generic resource is accessed, the model of `what happens' is
as follows:
     1) a request on the generic resource is received
     2) using the request, the server chooses one of the (plain)
        variant resources bound to the generic resource
     3) the server internally redirects the request to the chosen
        variant resource, i.e. it generates a response message as if
        the request had been made directly on the variant resource
     4) the server _may_ add a variant-ID, which identifies the chosen
        variant resource, to the response message from step 3)
     5) the response message is sent to the client.
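
The five steps can be sketched in Python as below.  This is only an
illustration under assumptions: the selection rule (one language
header), the variant list, and all names are hypothetical, and the
`variant_id' field stands for however the variant-ID is carried in
the real response message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VariantResource:
    language: str
    body: str
    variant_id: Optional[str] = None  # a server may choose not to assign one


@dataclass
class Response:
    status: int
    body: str
    variant_id: Optional[str] = None


# Hypothetical variant resources bound to http://x.org/paper.
VARIANTS: List[VariantResource] = [
    VariantResource("en", "English text", "en"),
    VariantResource("fr", "Texte en francais", "fr"),
    VariantResource("de", "Deutscher Text"),  # no variant-ID assigned
]


def choose_variant(headers: Dict[str, str]) -> VariantResource:
    # Step 2: toy selection rule based on one request header.
    wanted = headers.get("Accept-Language", "en")
    for variant in VARIANTS:
        if variant.language == wanted:
            return variant
    return VARIANTS[0]


def respond_as_plain(variant: VariantResource) -> Response:
    # Step 3: the response the variant resource would give if asked directly.
    return Response(200, variant.body)


def handle_generic(headers: Dict[str, str]) -> Response:
    # Step 1: the request on the generic resource arrives as `headers`.
    variant = choose_variant(headers)           # step 2
    response = respond_as_plain(variant)        # step 3
    if variant.variant_id is not None:          # step 4 (optional)
        response.variant_id = variant.variant_id
    return response                             # step 5
```

For instance, handle_generic({"Accept-Language": "fr"}) yields the
French body tagged with variant-ID "fr", while a request selecting
the German variant yields a response with no variant-ID at all.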

If a cache receives a request on a generic resource, it will have to
either reproduce the five steps above, in particular step 2, or
forward the request (possibly with an If-NoMatch header) towards the
origin server.

The important thing to note here is that the variant resources are
plain resources, so in this model, _the plain resource is the unit of
caching, expiration, and revalidation_.  This eliminates the need to
talk about resource entities, which are the unit of caching,
expiration, and revalidation in the 03 draft.

So how do variant-IDs fit in?
-----------------------------

(Note that the semantics for variant-IDs described below are
_identical_ to those defined now in the 03 draft.  Only the words of
the description differ.)

Though variant resources are not identified uniquely by a URI, the
service author _may_ use variant-IDs to give each of the variant
resources a unique identifier, being the tuple
(request-URI,variant-ID):

                         ----->  plain resource ---------> entity 1
                       /        (http://x.org/paper,"en")
                      /
    generic resource   ------->  plain resource ---------> entity 2
    http://x.org/paper\         (http://x.org/paper,"fr")
                       \
                         ----->  plain resource ---------> entity 3
                                (http://x.org/paper,"ps.en")

This unique identification of variant resources has two advantages:

 - it allows the use of the If-NoMatch header by a cache to optimize
   access to the generic resource
 - it allows cache memory management to be more efficient

Note that variant-IDs are thus only an efficiency device; they are not
needed for correctness.  But caches themselves are also nothing more
than efficiency devices, so this is nothing new.
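
To make the first advantage concrete, here is a simplified Python
sketch.  It assumes, purely for illustration, that the cache passes
along the variant-IDs it already holds (standing in for If-NoMatch)
and that the origin answers with a bodyless `not modified' reply plus
the chosen variant-ID when one of them matches.  The status numbers,
names, and selection rule are assumptions, and freshness is ignored.

```python
from typing import Dict, Optional, Set, Tuple

URI = "http://x.org/paper"

# Hypothetical origin data: variant-ID -> body.
ORIGIN_BODIES = {"en": "English text", "fr": "Texte en francais"}


def origin(language: str, held_ids: Set[str]) -> Tuple[int, str, Optional[str]]:
    # Toy selection: the language doubles as the variant-ID.
    variant_id = language if language in ORIGIN_BODIES else "en"
    if variant_id in held_ids:
        return 304, variant_id, None  # cache already holds this variant
    return 200, variant_id, ORIGIN_BODIES[variant_id]


class Cache:
    def __init__(self) -> None:
        # One slot per (request-URI, variant-ID) tuple.
        self.slots: Dict[Tuple[str, str], str] = {}
        self.bodies_transferred = 0

    def get(self, language: str) -> str:
        held = {vid for (uri, vid) in self.slots if uri == URI}
        status, variant_id, body = origin(language, held)
        if status == 200:
            self.slots[(URI, variant_id)] = body
            self.bodies_transferred += 1
        return self.slots[(URI, variant_id)]
```

A repeated request for a variant the cache already holds costs a
round trip but no body transfer, which is the optimization meant
above.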

A lazy server may choose not to generate variant-IDs, in which case
there is only a many-to-one mapping from request header sequences to
variant resources:

                         ----->  plain resource ---------> entity 1
                       /        (http://x.org/paper,req-headers-xyz)
                      /         (http://x.org/paper,req-headers-pyz)
                     /
    generic resource   ------->  plain resource ---------> entity 2
    http://x.org/paper\         (http://x.org/paper,req-headers-pqr)
                       \
                         ----->  plain resource ---------> entity 3
                                (http://x.org/paper,req-headers-abz)
                                (http://x.org/paper,req-headers-pbz)
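
The effect on cache memory can be sketched as follows.  The mapping
tables are hypothetical stand-ins for what the origin server would
do; the point is only that, without variant-IDs, a cache has nothing
but the request headers to key its slots on, so the same entity may
end up stored more than once.

```python
from typing import Dict, Iterable, Tuple

URI = "http://x.org/paper"

# Hypothetical origin behaviour: which entity each request-header
# sequence selects (a many-to-one mapping).
SELECTS = {
    "req-headers-xyz": "entity 1",
    "req-headers-pyz": "entity 1",
    "req-headers-pqr": "entity 2",
}

# Hypothetical variant-IDs a less lazy server could attach.
VARIANT_ID = {"entity 1": "en", "entity 2": "fr"}


def fill_cache(requests: Iterable[str], use_variant_ids: bool) -> Dict[Tuple[str, str], str]:
    slots: Dict[Tuple[str, str], str] = {}
    for req_headers in requests:
        entity = SELECTS[req_headers]
        # With variant-IDs the slot key is (URI, variant-ID);
        # without them the cache can only key on the request headers.
        key_part = VARIANT_ID[entity] if use_variant_ids else req_headers
        slots[(URI, key_part)] = entity
    return slots
```

With these three requests, keying on request headers fills three
slots (entity 1 is held twice), while keying on variant-IDs fills
two.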

Finally, there could be variant-IDs for only _some_ of the variant
resources:

                         ----->  plain resource ---------> entity 1
                       /        (http://x.org/paper,"en")
                      /         
    generic resource   ------->  plain resource ---------> entity 2
    http://x.org/paper\         (http://x.org/paper,req-headers-pqr)
                       \
                         ----->  plain resource ---------> entity 3
                                (http://x.org/paper,req-headers-abz)
                                (http://x.org/paper,req-headers-pbz)

So how do Content-Location headers fit in?
------------------------------------------

If a response from a generic resource contains a Content-Location
header, this can be seen as a statement by the author of the generic
resource that the chosen variant resource has a URI that uniquely
identifies it.

For example, the response 

   HTTP/1.1 200 OK
   ETag: "3420";"en"
   Content-Location: paper.en.html
   Content-Language: en
   ....

evokes the following image:

                         ----->  plain resource ---------> entity 1
                       /        (http://x.org/paper,"en")
                      /         http://x.org/paper.en.html
    generic resource  
    http://x.org/paper

But note that, at least under plain 1.1, the cache is _not_ allowed to
just serve entity 1 (if still fresh) if a request on
http://x.org/paper.en.html is made, because this would allow spoofing.
The Content-Location header has to be treated as purely informational.
It is intended that the http-wg will discuss, after the 1.1 draft is
out, appropriate restrictions under which a cache _can_ serve entity 1
if a request on http://x.org/paper.en.html is made.
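
In cache terms, `purely informational' could look like the sketch
below.  The class and its methods are hypothetical; the only point is
that a stored response is indexed under the URI that was actually
requested, never under the URI named in Content-Location.

```python
from typing import Dict, Optional


class Cache:
    def __init__(self) -> None:
        # request-URI -> stored response (headers and body in one dict)
        self.slots: Dict[str, Dict[str, str]] = {}

    def store(self, request_uri: str, response: Dict[str, str]) -> None:
        # Content-Location is kept as part of the stored response,
        # but it is never used as a lookup key.
        self.slots[request_uri] = response

    def lookup(self, request_uri: str) -> Optional[Dict[str, str]]:
        return self.slots.get(request_uri)


cache = Cache()
cache.store(
    "http://x.org/paper",
    {"Content-Location": "paper.en.html", "Content-Language": "en", "Body": "..."},
)
```

A later request on http://x.org/paper.en.html misses in this cache
and has to be forwarded, even though entity 1 is stored and fresh.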

Some random remarks
-------------------

  - Whether a resource is generic or plain is a binary property. A
    resource may change from being generic to plain, and the other way
    around, at any point in time.  All variant resources bound to a
    generic resource must be plain.

  - Renaming `generic resources' to `negotiated resources' is
    considered to be a good idea by some.

  - Renaming `entity tags' to `entity identifiers' is considered to be
    a good idea by some.

If people like this model, I am willing to draft language for the 04
spec.

Koen.

Received on Thursday, 16 May 1996 16:24:28 UTC