Session tracking from Brian Behlendorf on 1995-04-18 (www-talk@w3.org from March to April 1995)

From: Brian Behlendorf <brian@organic.com>
Date: Mon, 17 Apr 1995 19:39:24 -0700 (PDT)
To: www-talk@www10.w3.org
Message-Id: <Pine.3.89.9504171914.f492-0100000@eat.organic.com>

There are a couple of systems starting to be deployed now that attempt to gain
information about "clickstreams".  "Clickstreams" are the paths people take
when they traverse your site - many content providers would find it useful to
be able to detect common patterns or the effectiveness of various user
interfaces.  The problem is, of course, that HTTP is stateless, and beyond
the hostname offers very little in the way of identification of unique
"trips" through the content site.  Given that more than one person can use a
hostname (proxy servers, etc.), there's no reliable way to exactly identify a
unique person without implementing access control (as I did at HotWired, and
believe me, it's not a general solution). Compound this with the fact that
people can begin and end their "trips" at any page on a site, and you'll see
this is a big problem for sites interested in this kind of statistical
information.

The systems being implemented by companies like IPRO (http://www.ipro.com/)
and content providers like PathFinder (http://www.pathfinder.com/) are
fatally flawed.  When a user touches their home page, they create a unique
session ID that gets encoded between the hostname and the path/file in the
URL (in the case of PathFinder), and that session ID stays with you throughout
your journey through the site.  Of course, this session ID also stays with me
if I save a hotlist reference to a page beyond the home page, or if I cut and
paste the URL and mail it around to my friends.  In the latter case, if I'm
given the session ID of "KJHFJHDSF" and all my friends then go visit that
page, their accesses will all be logged under session ID "KJHFJHDSF" too.
This system also destroys caching of documents, in both local disk and proxy
caches, since the same document gets a different URL in every session.  I
told this much to a reporter at MediaWeek last week.
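
To make the two failure modes concrete, here is a minimal Python sketch of
URL-embedded session IDs.  The exact URL layout (session ID as the first
path segment), the host www.example.com, and the helper names are
assumptions for illustration, not the actual PathFinder or IPRO scheme.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def embed_session(url: str, session_id: str) -> str:
    """Insert session_id as the first path segment, between host and path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=f"/{session_id}{parts.path}"))


def extract_session(url: str) -> Tuple[str, str]:
    """Return (session_id, original_path) from a rewritten URL."""
    _, session_id, rest = urlsplit(url).path.split("/", 2)
    return session_id, "/" + rest


# Hypothetical page; two visitors get two different session IDs.
page = "http://www.example.com/news/today.html"
mine = embed_session(page, "KJHFJHDSF")
yours = embed_session(page, "QWERTYUIO")

# Same document, two different URLs: a cache keyed on the URL stores it
# twice and never gets a hit across sessions.
cache_keys = {mine, yours}

# A URL mailed to three friends still carries the sender's session ID,
# so the server attributes all three visits to one "session".
friend_visits = [extract_session(mine)[0] for _ in range(3)]
```

Here cache_keys holds two entries for one document, and every entry in
friend_visits is the original sender's ID.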

There is definitely a demand for this kind of information, and it would 
help make professional web sites more responsive to what really works and 
what doesn't - and this is also information that current web logs 
and the HTTP protocol really can't provide.  However, any proposed
solution *must* protect the anonymity of the user, since there's no need
to give that up when all that's cared about is unique sessions.

So, I'd like to propose for discussion a new HTTP header (hi Roy!) called
"Session-ID".  This would be optional, of course, and it would change any
time the browser is restarted (or whenever the user wishes).  It would
consist of a string of 32 random base64 characters (or whatever encoding
is allowed in headers).  It would allow the content provider to see the
"path" each visitor takes through the site, even when requests from
separate visitors are interleaved through a proxy server (HotWired would
often get 5 individuals hitting it from antares.prodigy.com at the same
time), without requiring user authentication or divulging any personal
information.  The "From:" header would also work, but it would give away
information that most would probably prefer not to give.
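
As a rough sketch of both halves of this idea, the Python below generates
a 32-character random value on the browser side and groups log records by
it on the server side.  The log record shape (host, session ID, path), the
sample paths, and the function names are assumptions for illustration;
nothing here is an existing header implementation.

```python
import base64
import secrets
from collections import defaultdict


def new_session_id() -> str:
    """Return 32 base64 characters.

    24 random bytes encode to exactly 32 base64 characters with no
    padding.  A browser would call this once per start-up (or on demand).
    """
    return base64.b64encode(secrets.token_bytes(24)).decode("ascii")


def paths_by_session(log):
    """Group (host, session_id, path) records into one path list per session."""
    trips = defaultdict(list)
    for _host, session_id, path in log:
        trips[session_id].append(path)
    return dict(trips)


# Two visitors behind the same proxy, with interleaved requests.
a, b = new_session_id(), new_session_id()
log = [
    ("antares.prodigy.com", a, "/"),
    ("antares.prodigy.com", b, "/"),
    ("antares.prodigy.com", a, "/features/"),
    ("antares.prodigy.com", b, "/archive/"),
    ("antares.prodigy.com", a, "/features/story.html"),
]
trips = paths_by_session(log)
```

Grouping by hostname alone would merge these five requests into one
trail; grouping by the session value separates them into two trips
without knowing who either visitor is.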

The only flaw is that the session-ID is temporary and can't be used to 
determine if 50 sessions are 5 people visiting you 10 times, or 50 people 
visiting you once.  An analysis of domains can help with that though.
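
One possible reading of "an analysis of domains" is to count distinct
session IDs per hostname, sketched below in Python.  The record shape, the
hostnames, and the session values are made up for illustration, and the
count is only a hint: many sessions from one host could be one returning
person or many people behind a proxy.

```python
from collections import Counter


def sessions_per_host(log):
    """Count distinct session IDs seen from each hostname."""
    seen = {(host, session_id) for host, session_id, _path in log}
    return Counter(host for host, _session_id in seen)


# Hypothetical log records: (host, session_id, path).
log = [
    ("dialup7.example.net", "s1", "/"),
    ("dialup7.example.net", "s2", "/"),
    ("dialup7.example.net", "s3", "/"),
    ("host.example.org", "s4", "/"),
    ("host.example.org", "s4", "/a.html"),
]
counts = sessions_per_host(log)
```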

Comments?  I'd obviously like to try implementing this, so maybe it's time
to learn elisp..... :)

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/
Received on Monday, 17 April 1995 22:39:15 UTC