Re: Session-ID proposal from Bob Wyman on 1995-08-08 (www-talk@w3.org from July to August 1995)

From: Bob Wyman <bobwyman@medio.com>
Date: Mon, 07 Aug 95 23:58:10 -0800
To: "dmk@allegra.att.com" <dmk@allegra.att.com>, "http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>, "www-talk@www10.w3.org" <www-talk@www10.w3.org>
Message-Id: <199508080706.AAA13709@dns.medio.com>
-- [ From: Bob Wyman * EMC.Ver #2.5.02 ] --

re:  Dave Kristol's Session-ID proposal

WHEN DOES A SESSION END?

Your draft says that one of the "key points in the session paradigm" is that
"The session has a beginning and an end." However, in your draft, it isn't
apparent in all cases that both parties (or all parties if proxies are
involved) can determine when a session ends.

You insist that "when the client terminates execution, it discards all
Session-ID information." This, I assume, "ends" the session from the point
of view of the client. However, the server will never discover that this has
happened.

Although you don't specify it clearly, the requirement that a client "must
return [the Session-ID] to the server for the next transaction by any method
..." might be read to imply that the client can end a session by making
another request without sending the Session-ID of the last response (if
non-conformance is an implicit "end"). This would require the server to
maintain a list of Session-ID's and the client hosts to which they were
assigned, and then to do a lookup in this list on *every* request to
determine if the most recent request should have used a Session-ID. But
that won't work for clients that are hidden behind proxies or that come from
multi-tasking machines since many clients may appear to be using the same
host name.
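
To make the proxy problem concrete, here is a minimal sketch of the
host-keyed lookup described above. The table layout and the function names
(record_response, classify_request) are assumptions made for illustration;
nothing like them appears in the draft.

```python
# Hypothetical sketch: a server that keys its session list by client host name.
sessions_by_host = {}


def record_response(host, session_id):
    """Remember the Session-ID most recently sent to this host."""
    sessions_by_host[host] = session_id


def classify_request(host, session_id):
    """Guess what a request means from the host name and Session-ID alone."""
    expected = sessions_by_host.get(host)
    if expected is None:
        return "no-session"
    if session_id == expected:
        return "continues"
    # Either the client dropped its Session-ID (an implicit "end"), or this
    # is a different client that shares the host name. The server cannot
    # tell which from the information it has.
    return "ended-or-other-client"
```

If one client behind proxy.example.com has been given a Session-ID, the first
request from a second client behind the same proxy is indistinguishable from
the first client ending its session.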

On the server side, you say that the server may send "...a different, or no
Session-ID response header" in response to a request which includes a
Session-ID. Although your draft doesn't explicitly say this, I assume you
intend that when the client receives a Session-ID response which is
different from that transmitted in the corresponding request, the client
should assume that the old session has ended and a new session may be
starting. However, this method is useful only at those times when the
session's end coincides with an opportunity to send a response.

The fear, of course, is that since there are a number of scenarios in which
the server can't discover that a session has ended, the server will be
forced to build an ever-growing list of active sessions. This is not good.
One solution to this problem would be to provide for a session expiration
date (either absolute or relative to last response) which would give the
server a mechanism for purging its Session-ID tables. (Note: Netscape's
"cookie" proposal does something similar, although expiring a "cookie" is
somewhat different from expiring a Session-ID since one would assume that a
Session-ID has some level of uniqueness -- not addressed in your draft --
while there is no such requirement for a cookie.)
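
As a rough sketch of the "relative to last response" variant, a server could
keep a timestamp per Session-ID and purge entries that have been idle too
long. The SessionTable class, its method names, and the ttl_seconds parameter
are hypothetical; the draft defines no expiration mechanism.

```python
import time


class SessionTable:
    """Hypothetical Session-ID table with expiry relative to last response."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the table can be tested
        self.last_response = {}     # Session-ID -> time of last response

    def touch(self, session_id):
        """Record that a response carrying this Session-ID was just sent."""
        self.last_response[session_id] = self.clock()

    def purge(self):
        """Drop sessions idle for at least ttl seconds; return their IDs."""
        cutoff = self.clock() - self.ttl
        expired = [sid for sid, t in self.last_response.items() if t <= cutoff]
        for sid in expired:
            del self.last_response[sid]
        return expired
```

Calling purge() periodically bounds the table by the number of sessions
active within the last ttl_seconds, whether or not any client ever signals
that its session has ended.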

WHAT IS A CLIENT?

As far as I know, there is no formal reference model for the Web, thus, it
is necessary from time to time to ask what people mean when talking about
specific architectural elements. Seeing your requirement that the client
"discards all Session-ID information" when it terminates, and that the
"client" must send the Session-ID on the next request, I'm worried about
imprecision in defining the client. For instance, if I have a Web browser
that allows me to have two open windows (e.g. Netscape), and I get a
Session-ID as the result of activity in window "A", am I required to send that
Session-ID with requests generated in window "B"? If I close window "A", do
I keep the Session-ID or delete it?

WHAT IS A SERVER?

Many of the demands for Session-ID's or session state have been intended to
allow CGI scripts to distinguish between clients and to maintain client-
specific state on the server side. This leads to the question (another
reference model problem): What is the server? Is the session with the HTTP
server or is it with the CGI script? Your draft indicates that you think the
server is identified by "server name (IP address) and port combination." It
seems to me that this means we're going to have to implement some
potentially complex method for letting CGI scripts know the Session-ID's for
the clients they speak to. Choices seem to be: 1) A configuration parameter
that tells the server to always generate Session-ID's when a particular CGI
script is run. The Session-ID would then be given to the CGI in an
environment variable or by some similar mechanism. 2) An API by which the CGI
can ask the server to provide a Session-ID and tell the CGI script what it
is. Additionally, given that you suggest that Session-ID's can be changed or
eliminated by the server, we'll need a mechanism for CGI scripts and servers
to negotiate and/or inform each other of these changes.
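
A minimal sketch of choice 1 from the CGI script's side follows. The variable
name HTTP_SESSION_ID is an assumption: it follows the usual CGI convention of
exposing request headers as HTTP_* variables, but neither the draft nor any
server is known to define it.

```python
import os


def session_id_from_environment(environ=os.environ):
    """Return the Session-ID handed over by the server, or None if absent.

    HTTP_SESSION_ID is a hypothetical variable name used for illustration.
    """
    value = environ.get("HTTP_SESSION_ID", "").strip()
    return value or None
```

Note that this only covers the server telling the CGI script what the
Session-ID is; it gives the script no way to learn that the server has
changed or eliminated the Session-ID afterwards.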

Whatever the method of telling the CGI what the Session-ID is, unless the
specification states that Session-ID's have some sort of uniqueness to them,
they won't be useful for many of the purposes for which CGI scripts would
want to use them.

My personal preference would be for the CGI to be able to generate its own
Session-ID which addresses whatever it thinks its operational requirements
are. Thus, I would argue that the CGI script is the "server", not the HTTP
daemon.

ARE WE DEFINING PROTOCOL OR USER INTERFACE?

Your requirement that the client discard information when it "terminates
execution" might be a good recommendation; however, it is inappropriate as a
*requirement* of the HTTP Protocol. The protocol should place no constraints
on program execution models -- only on what data flows over the wire.

PROBLEMS WITH CACHING

You state that when a caching proxy gets a Session-ID response header, "it
must not cache that header as part of its cache state." However, you don't
prohibit the caching proxy from caching the body of the response. Thus, it
would appear that a second client could make a request and have that request
satisfied from cache without ever discovering that Session-ID's were
available for the document. This would be a particular problem when a
response came back with both a Session-ID and an Expires: header since the
cache might decide not to do a HEAD, GET, or conditional GET until after or
close to the Expires: time. It would seem that the cache should remember that
the Session-ID response header was there (whether or not it caches the
actual Session-ID) and then always do a conditional GET for the document
even if the Expires: time hasn't passed. (NOTE: Of course, you could insist
that the semantics of Expires be changed if found coincident with a
Session-ID in a response -- the question is whether the "Expires" header is
globally meaningful or meaningful only within the context of the session.)
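
The suggested cache behaviour can be sketched as follows. The CacheEntry
fields and the must_revalidate function are hypothetical names chosen for
illustration; the cache keeps only a flag, not the Session-ID itself.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float       # time taken from the Expires: header
    had_session_id: bool    # True if the response carried a Session-ID


def must_revalidate(entry, now):
    """Decide whether the cache should do a conditional GET before serving."""
    # An entry that arrived with a Session-ID is always revalidated, even if
    # its Expires: time has not passed, so the server gets a chance to issue
    # a Session-ID to the new client.
    return entry.had_session_id or now >= entry.expires_at
```

Under this rule the body can still be served from cache after a successful
conditional GET, so the proxy keeps most of its benefit for such documents.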

WHAT PROBLEMS ARE BEING SOLVED HERE?

The draft doesn't really give any information about the specific uses that
are expected for the protocol features defined. This makes it hard to
evaluate. For instance, I can see that what you have defined would be useful
in tracking "clickstreams." However, the problems mentioned above and others
not mentioned make it hard to see how this proposal will help with "shopping
carts" and a variety of other applications identified in the recent www-talk
discussions on this subject.

		bob wyman
Received on Tuesday, 8 August 1995 03:06:52 UTC