Proposal: Vary on Cookies

I've been a lurker on the HTTP-WG list for about a year now, paying
particular attention to the HTTP State Management threads.  I know this
might be a bad time to propose an addition to the spec (considering the
turmoil it seems to be in now, and that it seems to have fled to a 
sub-group), but I believe this functionality might be worth adding.
If this functionality already exists, please let me (and the other
cache-conscious dynamic-content implementers out there) know.  If there
is a better forum for proposing this, let me know also.

Often, cookies are used to describe a characteristic of a visitor.
These characteristics might be:
  o Preferences the visitor has set earlier (e.g. Frames=no or
    LotsOfImages=yes or Bgcolor=blue)
  o Flags to mark the visitor (e.g. AnsweredQnr=yes; this person has
    already filled out our questionnaire so don't ask again)
  o The market segment of the visitor (and this will grow in
    importance as online commerce ramps up) (e.g. BuyACar=3; they're
    pretty likely to buy a car)

These types of cookies vary from visitor to visitor and content may be
negotiated based on these cookies (to give a page with a blue background
or an ad with a car in it), but there is a small, finite number of
values they may be given.

Because of this, it would be nice to be able to tell proxy-caches which
specific cookie the negotiated content is varying on.  The solution provided
by the HTTP/1.1 spec is to include:
    Vary: Cookie
in the response headers.  This is problematic because you could have
several other cookies (for instance, a user-id used in a cookie-based
authentication scheme) in addition to the one which actually "selected" the
content.  Use of different "path" parameters in the cookies can fix this
problem in some situations, but not all (not to mention the fact that it
can add significantly to the complexity of the application design).
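
To make the problem concrete, here is a rough Python sketch of how a cache
honoring Vary: Cookie might key a stored response.  The function name, the
URL and the UserId cookie are made up for illustration, and real caches may
normalize header values differently.

```python
def vary_key(url, request_headers, vary_fields):
    """Build a cache key from the URL plus the full value of each header named in Vary."""
    selected = tuple(
        (name.lower(), request_headers.get(name, "").strip())
        for name in vary_fields
    )
    return (url, selected)


# Two visitors with the same Frames preference but different (hypothetical) user ids.
alice = {"Cookie": "UserId=alice; Frames=no"}
bob = {"Cookie": "UserId=bob; Frames=no"}

key_alice = vary_key("/home", alice, ["Cookie"])
key_bob = vary_key("/home", bob, ["Cookie"])

# The whole Cookie: header is part of the key, so the keys differ even though
# both visitors would be served the same no-frames page.
print(key_alice == key_bob)  # False
```

Under this keying, every distinct user-id produces its own cache entry, which
defeats the point of caching a page with only a few variants.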

So, right now, the only solution is to pre-expire all pages, requiring
proxy-caches to revalidate the page each time and, worse, preventing them
from storing variants even if there are only a few.  It would be nice if
we could extend the Vary syntax to allow headers of the form:
    Vary: Cookie.Frames, User-Agent
This would give us a nice way to "select" the content based on a part of
the header instead of the whole header.  This might turn out to be useful
for headers other than Cookie: as well (imagine Vary: Date.Wday).  However,
this might cause problems in the future because "." is allowed in tokens,
which is what field-names are (a future extension-header might contain
one, although no header to date does, as far as I know).

Using a tspecial to set off the part would fix that problem (e.g.
Vary: Cookie:Frames, User-Agent), but, unfortunately, I don't
think the grammar allows us to do this in a backward-compatible way,
because the content of the Vary: header is supposed to be a comma-delimited
list of tokens (and that tspecial thrown in there might break some
implementations).

To side-step these problems for now, my proposal, which could be added to
the State Management proposed standard or written up as a separate Internet
draft (if procedure dictates), is to add an HTTP header, "Vary-Cookie".  The
syntax for Vary-Cookie is

vary-cookie        =       "Vary-Cookie" ":" cookie-selectors
cookie-selectors   =       1#NAME

where NAME is defined in RFC 2109.  The cookie-selectors are a comma-delimited
list of the cookie names on which the negotiated content was "selected".
These NAMEs might appear as the attribute in the av-pairs of the Cookie:
request header.  Origin servers which wish to indicate that content was
negotiated solely on the "Frames" cookie should send:
    Vary-Cookie: Frames
For backwards compatibility, origin servers sending the Vary-Cookie: header
MUST also send:
    Vary: *
or
    Vary: Cookie

Proxy-caches which do not understand the Vary-Cookie header would fall back
on the Vary header.  With Vary: *, they would treat the response as varying
on something outside their knowledge and do a conditional GET on future
requests.  With Vary: Cookie, they would treat it as varying on the entire
Cookie: header and could use the cached response only for requests whose
cookies matched.  Although neither is optimal, both lead to "correct"
behavior.

Proxy-caches which do understand the Vary-Cookie header would treat the
response as if it varied only on the value of the Frames cookie, due to the
presence of the Vary-Cookie: header.  If the server sends Vary: * and the
response was "selected" on something other than cookies, the origin server
MUST NOT send the Vary-Cookie: header.

To handle this comparison, the proxy-cache must be able to parse the Cookie:
header field into its av-pairs and store the relevant cookie values as part
of the cached response.  NAMEs should be compared case-insensitively (as
described in RFC 2109) but the VALUE must be compared case-sensitively when
deciding whether to use the cached response.
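
The comparison described above could be sketched in Python roughly as
follows.  This is only an illustration under simplifying assumptions: the
function names are mine, the parser splits on ";" only, ignores quoted
strings, and skips the $Version/$Path/$Domain attributes, and a cookie that
is missing from the request is treated as its own distinct "value".

```python
def parse_cookie_header(value):
    """Split a Cookie: header value into {lowercased NAME: VALUE} (simplified parser)."""
    cookies = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        name = name.strip()
        if not sep or not name or name.startswith("$"):
            continue  # skip malformed pairs and $-prefixed attributes
        # Keep the first occurrence if a name is repeated.
        cookies.setdefault(name.lower(), val.strip())
    return cookies


def selected_values(cookie_header, selectors):
    """Extract the values of the cookies named in Vary-Cookie, in selector order."""
    cookies = parse_cookie_header(cookie_header)
    return tuple((name.lower(), cookies.get(name.lower())) for name in selectors)


def can_reuse(stored_selected, new_cookie_header, selectors):
    """True if the new request carries the same values for every selected cookie."""
    return stored_selected == selected_values(new_cookie_header, selectors)


selectors = ["Frames"]  # from "Vary-Cookie: Frames" on the cached response
stored = selected_values("UserId=alice; Frames=no", selectors)

print(can_reuse(stored, "frames=no; UserId=bob", selectors))  # True: NAME is case-insensitive
print(can_reuse(stored, "Frames=NO; UserId=bob", selectors))  # False: VALUE is case-sensitive
print(can_reuse(stored, "UserId=bob", selectors))             # False: Frames cookie is absent
```

The cache only needs to keep the small tuple of selected values alongside
the stored response, not the whole Cookie: header.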

Anyway, there are other issues to think about, such as cookies with the same
name but different paths or domains.  Do we want to add $Domain and $Path
parameters as well?  We probably need to see how the State Management
proposal ends up.  The idea needs a lot of work and discussion, but I wanted
to get some feedback from the list because this is exactly the
functionality I need for my application.  Plus, it would be a breeze to
implement in CGI scripts or server plug-ins: keep track of which cookies
you used to "select" the negotiated content and then just insert their
names into the Vary-Cookie: header before you send back the content.
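
As a rough idea of what that bookkeeping might look like on the origin
server, here is a small Python sketch.  The CookieTracker class and its
method names are hypothetical, and it assumes the request cookies have
already been parsed into a dictionary.

```python
class CookieTracker:
    """Record which cookies were consulted while selecting the response."""

    def __init__(self, cookies):
        self._cookies = dict(cookies)
        self.used = []  # cookie names in the order they were first read

    def get(self, name, default=None):
        # Even a cookie that turns out to be absent influenced the selection.
        if name not in self.used:
            self.used.append(name)
        return self._cookies.get(name, default)

    def headers(self, fallback="Cookie"):
        """Return the Vary and Vary-Cookie headers, or nothing if no cookie was read."""
        if fallback not in ("Cookie", "*"):
            raise ValueError("fallback must be 'Cookie' or '*'")
        if not self.used:
            return []
        return [("Vary", fallback), ("Vary-Cookie", ", ".join(self.used))]


tracker = CookieTracker({"Frames": "no", "UserId": "alice"})
layout = "plain" if tracker.get("Frames") == "no" else "framed"

for name, value in tracker.headers():
    print(f"{name}: {value}")
# Vary: Cookie
# Vary-Cookie: Frames
```

The UserId cookie never shows up in Vary-Cookie because the script did not
read it while choosing the layout.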

-rv

Richard Vermillion                                         212 255 6655 x106
Cyber Dialogue                                 rvermillion@cyberdialogue.com

Received on Thursday, 12 June 1997 16:10:59 UTC