- From: James Pitkow <pitkow@cc.gatech.edu>
- Date: Tue, 18 Jul 1995 12:38:00 -0400 (EDT)
- To: connolly@beach.w3.org (Daniel W. Connolly)
- Cc: brian@organic.com, tmyerson@iserver.interse.com, www-talk@w3.org
Hello,

You ever get the feeling of a cat chasing its own tail? Every time this comes up (see last thread around April 14th, 1995) it's always the same positions.

A few new things:

Our research confirms that techniques do indeed exist that can determine sessions for existing access logs. Without getting into proprietary techniques or results, we can confirm the statistical validity of a 20 minute or so session boundary, as measured by actual client-side monitoring. Our study used 25.5 minutes, which was 1.5 standard deviations (generous) away from the mean with low variance. See this really long URL for the online version of some of our published findings:

<URL:http://www.gatech.edu/lcc/idt/Students/Catledge/browsing/UserPatterns.Paper4.formatted.html>

Moreover, percent error for ambiguous cases can also be calculated, along with confidence intervals, to quantify the loss involved in these determinations. Thus, errors that stem from anomalous browsing patterns in proxied domains can be contained.

An underlying assumption from Terry's research is that humans can accurately determine paths in a post-hoc manner. Unless this is proven via a study that takes known sessions and tests the ability of humans to determine sessions, we will not be able to rely upon this assumption. Note, we could do that with our datasets.

From Dan:

> Each HTTP request should include a header field of the form:
>
> Request-ID: $session $request++
> e.g.
> that is, at the beginning of each session, the HTTP client chooses a
> random number, and each request in that session is identified by a
> number that increases monotonically with time. A "session" is not
> formally defined (other than "a set of requests with the same $session
> id"), though I suggest that browsers begin a session when they are
> invoked, and allow some user interface to say "start a new session"
> (i.e. "choose a new random session ID").
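To make the two approaches above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the proprietary technique from our study and not a reference implementation of Dan's proposal. The names `split_sessions` and `RequestIdSource` are hypothetical. The 25.5 minute cutoff is the figure quoted above. The 32-bit session number, the decimal formatting, and the counter starting at 0 are assumptions, since the proposal does not specify them.

```python
import secrets
from typing import Iterable, List

# Cutoff quoted in the message above: 25.5 minutes, expressed in seconds.
SESSION_TIMEOUT_SECONDS = 25.5 * 60


def split_sessions(timestamps: Iterable[float],
                   timeout: float = SESSION_TIMEOUT_SECONDS) -> List[List[float]]:
    """Group one visitor's request times (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    is strictly greater than `timeout`.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # still within the current session
        else:
            sessions.append([t])     # gap too long (or first request)
    return sessions


class RequestIdSource:
    """Client-side source of values for the proposed Request-ID header."""

    def __init__(self) -> None:
        self.new_session()

    def new_session(self) -> None:
        # "start a new session": choose a new random session number.
        # Assumption: 32 random bits; the proposal gives no size.
        self.session = secrets.randbits(32)
        self.request = 0  # assumption: the counter starts at 0

    def next_header(self) -> str:
        # Mirrors "$session $request++": emit the counter, then increment.
        value = self.request
        self.request += 1
        return f"Request-ID: {self.session} {value}"
```

With this sketch, requests at 0, 600 and 1200 seconds fall in one session, and a request arriving 1531 seconds after the last one starts a new session. The header variant needs no inference at all on the server side, which is exactly why it raises the profiling questions discussed below.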
Again, this really gets into the notion of user profiling and profile maintenance. I'm extremely wary of systems that enable log files to be collated and intelligent algorithms applied. Note that companies which currently do this, like I/PRO, could potentially become the Equifax and TRWs of the future. Dan, where are you with the Constitution that was kicked around ages ago? Should we protect against this? Can we?

How many of the sites that require email addresses for admission outline their policy for usage of this information? In the US, companies can do whatever they want with this information, even if they do not tell you they are doing it. Other countries have more consumer-protective laws. This is just the tip of the iceberg on the potential privacy violations.

> One might argue (in fact, one has argued: Hi Henrik!) that this is an
> extension of the From: field, and these data belong there. I don't
> believe so: if the From: field is present, it should contain a valid
> email address of the requesting user (clearly the server cannot depend
> on the authenticity of the From: field, but that doesn't mean we
> should corrupt it further in the protocol spec).

I was never clear on why we would want to have an e-mail address sent in the first place. Dan, maybe you could review for everyone's sake why this is and what scenarios demand its presence.

> mechanism might allow unwanted correlations to be observed. So perhaps
> there should be a preference to turn this feature off.

So then what is accomplished? If I knew I could control whether or not people could track me, what is my incentive to keep it on?

> ******* II. The business-card authentication scheme
>
> I propose a new http authentication scheme; let's call it
> "business-card". Its purpose is to facilitate access control policies
> similar to "I'll show you my information if you'll leave your business
> card in the bowl."
The collation of demographic information as a requisite for information access is not acceptable to me. When I go into the library I can do so without being monitored or tracked. If I check out a book, though, then I leave a trail. In this new information theater, do we really have to give up even more privacy?

> I haven't had time to discuss the privacy issues in detail, nor talk
> about the required but hidden IVth proposal, which is that proxies and
> caches relay certain log info to information providers.

Yup, some really smart person knows how to do this and is doing it. This solves a lot of the privacy issues as it does not introduce any new fields of information.

Jim.
Received on Tuesday, 18 July 1995 12:38:19 UTC