- From: Brian Behlendorf <brian@organic.com>
- Date: Mon, 17 Apr 1995 19:39:24 -0700 (PDT)
- To: www-talk@www10.w3.org
There are a couple of systems starting to be deployed now that attempt to gain information about "clickstreams". "Clickstreams" are the paths people take when they traverse your site - many content providers would find it useful to be able to detect common patterns or the effectiveness of various user interfaces.

The problem is, of course, that HTTP is stateless, and beyond the hostname it offers very little in the way of identification of unique "trips" through the content site. Given that more than one person can use a hostname (proxy servers, etc.), there's no reliable way to exactly identify a unique person without implementing access control (as I did at HotWired, and believe me, it's not a general solution). Compound this with the fact that people can begin and end their "trips" at any page on a site, and you'll see this is a big problem for sites interested in this kind of statistical information.

The systems being implemented by companies like IPRO (http://www.ipro.com/) and content providers like PathFinder (http://www.pathfinder.com/) are fatally flawed. They create a unique session ID when a user touches their home page, which gets encoded between the hostname and the path/file in the URL (in the case of PathFinder), and that session ID stays with you throughout your journey through the site. Of course, this session ID also stays with me if I save a hotlist reference to a page beyond the home page, or if I cut and paste the URL and mail it around to my friends. In the latter case, if I'm given the session ID "KJHFJHDSF" and all my friends go visit that page, they'll all show up as accesses under session ID "KJHFJHDSF". This system also destroys caching of documents, in both local disk and proxy caches, since every session sees a different URL for the same document. I told this much to a reporter at MediaWeek last week.
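To make the shared-URL problem concrete, here is a minimal Python sketch. The URL layout (session ID as the first path segment), the example.com hostnames, and the helper names `embed_session` and `extract_session` are assumptions made for illustration; this is not PathFinder's or IPRO's actual implementation.

```python
from urllib.parse import urlsplit


def embed_session(url: str, session_id: str) -> str:
    """Rewrite a URL so the session ID sits between the hostname and the path.

    Hypothetical layout: http://host/<session_id>/<original path>
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/{session_id}{parts.path}"


def extract_session(url: str) -> tuple[str, str]:
    """Return (session_id, original_path) from a rewritten URL."""
    path = urlsplit(url).path
    session_id, _, rest = path.lstrip("/").partition("/")
    return session_id, "/" + rest


page = "http://www.example.com/news/today.html"
rewritten = embed_session(page, "KJHFJHDSF")

# Three different people follow the same URL that was mailed to them.
hits = [
    ("alice.example.org", rewritten),
    ("bob.example.net", rewritten),
    ("carol.example.edu", rewritten),
]

# The server only sees the ID in the URL, so all three collapse into one "trip".
sessions = {extract_session(url)[0] for _, url in hits}
hosts = {host for host, _ in hits}

# Two sessions reading the same document use different URLs,
# so a cache keyed on the URL cannot share one copy between them.
other = embed_session(page, "QWERTYUIO")
```

Here `sessions` holds a single ID even though `hosts` holds three distinct visitors, and `rewritten` and `other` differ although they name the same document.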
There is definitely a demand for this kind of information, and it would help make professional web sites more responsive to what really works and what doesn't - and this is also information that current web logs and the HTTP protocol really can't provide. However, any proposed solution *must* protect the anonymity of the user, since there's no need to give that up when all that matters is telling unique sessions apart.

So, I'd like to propose for discussion a new HTTP header (hi Roy!) called "Session-ID". This would be optional, of course, and it would change any time the browser is restarted (or whenever the user wishes). It would consist of a string of 32 random base64 characters (or whatever encoding is allowed in headers). It would allow the content provider to see the "path" one takes through the site, even when requests from separate users are interleaved through a proxy server (HotWired would often get 5 individuals hitting it from antares.prodigy.com at the same time), without requiring user authentication or divulging any personal information. The "From:" header would also work, but it would give away information that most would probably prefer not to give.

The only flaw is that the Session-ID is temporary and can't be used to determine whether 50 sessions are 5 people visiting you 10 times, or 50 people visiting you once. An analysis of domains can help with that, though.

Comments? I'd obviously like to try implementing this, maybe it's time to learn elisp..... :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/
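The Session-ID header proposed in the message above could be sketched roughly as follows, in Python. This is only one possible reading of the proposal: the use of 24 random bytes (which encode to exactly 32 base64 characters), the log record shape `(host, session_id, path)`, and the function names `new_session_id` and `paths_by_session` are all assumptions, not part of the original proposal.

```python
import base64
import secrets
from collections import defaultdict


def new_session_id() -> str:
    """Generate a fresh ID, e.g. at browser start or on user request.

    24 random bytes encode to exactly 32 base64 characters with no padding.
    """
    return base64.b64encode(secrets.token_bytes(24)).decode("ascii")


def paths_by_session(requests):
    """Group request paths by Session-ID, preserving arrival order.

    The hostname is ignored on purpose: several people behind one proxy
    share it, so it cannot separate their trips.
    """
    trips = defaultdict(list)
    for _host, session_id, path in requests:
        trips[session_id].append(path)
    return dict(trips)


# Two anonymous users behind the same proxy, with interleaved requests.
a, b = new_session_id(), new_session_id()
log = [
    ("antares.prodigy.com", a, "/"),
    ("antares.prodigy.com", b, "/"),
    ("antares.prodigy.com", a, "/features/"),
    ("antares.prodigy.com", b, "/archive/"),
]
trips = paths_by_session(log)
```

Here `trips[a]` and `trips[b]` recover the two separate paths even though every request came from the same hostname, and neither ID carries any personal information.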
Received on Monday, 17 April 1995 22:39:15 UTC