Re: 3 Proposals: session ID, business-card auth, customer auth from James Pitkow on 1995-07-18 (www-talk@w3.org from July to August 1995)

From: James Pitkow <pitkow@cc.gatech.edu>
Date: Tue, 18 Jul 1995 12:38:00 -0400 (EDT)
To: connolly@beach.w3.org (Daniel W. Connolly)
Cc: brian@organic.com, tmyerson@iserver.interse.com, www-talk@w3.org
Message-Id: <199507181638.MAA21500@hapeville.cc.gatech.edu>
Hello,
 
   You ever get the feeling of a cat chasing its own tail?  Every
time this comes up (see last thread around April 14th, 1995) it's 
always the same positions.  A few new things:

Our research confirms that techniques do indeed exist that can 
determine sessions for existing access logs.  Without getting into 
proprietary techniques or results, we can confirm the statistical 
validity of a 20 minute or so session boundary, as measured by actual client-side 
monitoring.  Our study used 25.5 minutes, which was 1.5 standard deviations 
(generous) away from the mean with low variance.  See this really long 
URL for the online version of some of our published findings:

<URL:http://www.gatech.edu/lcc/idt/Students/Catledge/browsing/
UserPatterns.Paper4.formatted.html>
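
To make the idea of a timeout-based session boundary concrete, here is a
minimal sketch in Python. It is not the technique used in our research (which
is not described here); it only illustrates splitting an access log wherever
the gap between requests exceeds a cutoff. The function name, the
(host, timestamp) entry format, and the use of the host name as a stand-in
for a user are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed cutoff: the 25.5 minute figure quoted above.
SESSION_TIMEOUT = timedelta(minutes=25.5)


def split_sessions(entries, timeout=SESSION_TIMEOUT):
    """Group (host, timestamp) log entries into per-host sessions.

    A new session starts whenever the gap since the same host's
    previous request exceeds `timeout`. Treating one host as one
    user is an assumption; proxied domains violate it.
    """
    by_host = defaultdict(list)
    for host, stamp in entries:
        by_host[host].append(stamp)

    sessions = {}
    for host, stamps in by_host.items():
        stamps.sort()
        host_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout:
                host_sessions.append([cur])    # gap too long: new session
            else:
                host_sessions[-1].append(cur)  # same session continues
        sessions[host] = host_sessions
    return sessions
```

Requests from one host that arrive 10 minutes apart would land in the same
session under this sketch; a 30 minute gap would start a new one.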

Moreover, calculations of percent error for ambiguous cases can 
also be computed along with confidence intervals to quantify the
loss involving these determinations.  Thus, errors that stem from
anomalous browsing patterns from proxied domains can be contained.

An underlying assumption from Terry's research is that humans 
can accurately determine paths in a post-hoc manner.  Unless this is 
proven via a study that takes known sessions and tests the ability
of humans to determine sessions, we will not be able to rely upon this 
assumption. Note, we could do that with our datasets.

From Dan.

> Each HTTP request should include a header field of the form:
> 
> 	Request-ID: $session $request++
> e.g.
> that is, at the beginning of each session, the HTTP client chooses a
> random number, and each request in that session is identified by a
> number that increases monotonically with time. A "session" is not
> formally defined (other than "a set of requests with the same $session
> id"), though I suggest that browsers begin a session when they are
> invoked, and allow some user interface to say "start a new session"
> (i.e. "choose a new random session ID").
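
For readers following along, the quoted proposal can be sketched roughly as
follows in Python. This is only an illustration of the scheme as Dan
describes it, not part of any specification: the class name, the 32-bit size
of the random session number, and the counter starting at 0 are all
assumptions.

```python
import secrets


class RequestIdGenerator:
    """Hypothetical client-side helper for the proposed Request-ID header."""

    def __init__(self):
        self.new_session()

    def new_session(self):
        # "Start a new session": choose a new random session number and
        # reset the per-session request counter. The 32-bit width and the
        # starting value of 0 are assumptions.
        self.session = secrets.randbits(32)
        self.request = 0

    def next_header(self):
        # Emit the current counter, then increment it, mirroring the
        # "$request++" notation in the proposal.
        header = f"Request-ID: {self.session} {self.request}"
        self.request += 1
        return header
```

A browser following the suggestion would create one generator when it is
invoked, call next_header() for each request, and call new_session() when
the user asks to start a new session.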

Again, this really gets into the notion of user profiling and profile
maintenance.  I'm extremely wary of systems that enable log files to
be collated and intelligent algorithms applied.  Note that companies
which currently do this, like I/PRO, could potentially become the
Equifax and TRWs of the future.  Dan, where are you with the Constitution
that was kicked around ages ago?  Should we protect against this? Can we?
How many of the sites that require email addresses for admission outline
their policy for usage of this information?  In the US, companies can
do whatever they want with this information, even if they do not tell you
they are doing it.  Other countries have more consumer-protective laws.
This is just the tip of the iceberg on the potential privacy violations.

> One might argue (in fact, one has argued: Hi Henrik!) that this is an
> extension of the From: field, and these data belong there. I don't
> believe so: if the From: field is present, it should contain a valid
> email address of the requesting user (clearly the server cannot depend
> on the authenticity of the From: field, but that doesn't mean we
> should corrupt it further in the protocol spec).

I was never clear on why we would want to have an e-mail address sent
in the first place.  Dan, maybe you could review for everyone's sake why
this is and what scenarios demand its presence.

> mechanism might allow unwanted correlations to be observed. So perhaps
> there should be a preference to turn this feature off.

So then what is accomplished? If I knew I could control whether or not 
people could track me, what is my incentive to keep it on?

> ******* II. The business-card authentication scheme
> 
> I propose a new http authentication scheme; let's call it
> "business-card". Its purpose is to facilitate access control policies
> similar to "I'll show you my information if you'll leave your business
> card in the bowl."

The collation of demographic information as a requisite for information
access is not acceptable to me.  When I go into the library I can do so 
without being monitored or tracked. If I check out a book, though, then
I leave a trail.  In this new information theater, do we really have
to give up even more privacy?

> I haven't had time to discuss the privacy issues in detail, nor talk
> about the required but hidden IVth proposal, which is that proxies and
> caches relay certain log info to information providers.

Yup, some really smart person knows how to do this and is doing it.
This solves a lot of the privacy issues as it does not introduce any new
fields of information.

Jim.
Received on Tuesday, 18 July 1995 12:38:19 UTC