Re: Conventions for Sharing User Agent Profiles from Koen Holtman on 1996-08-13 (ietf-http-wg@w3.org from July to September 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Tue, 13 Aug 1996 23:55:51 +0200 (MET DST)
To: Simon Spero <ses@tipper.oit.unc.edu>
Cc: koen@win.tue.nl, mogul@pa.dec.com, jg@zorch.w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199608132155.XAA01292@wsooti04.win.tue.nl>
Simon Spero:
>
>
>With NG, there isn't quite as strong a division between the UA profile and 
>the individual user's profile; profiles can be modified dynamically; 
>the server has the option of either caching the whole modified profile, 
>or just the 'base class' profile; the client just needs to know and note  
>what profile has been cached. 

So if I understand you correctly, there are two profiles: the UA
profile and the user profile.  Or is there a complete inheritance
layer in there, which allows 5 layered profiles if you want to?  

How does the client know what profile has been cached?  If it knows
that the user profile is cached, will the client send the entire
profile in the request or will it omit any user profile info?

If I'm negotiating a bilingual homepage, how do I tell the client to
send me the user profile with the language preferences?

>> where do you stop exactly?  How many `profile cache misses' do you
>> estimate under your proposal?
>
>The question of where you stop is up to the server; just caching the UA 
>specific base profiles wins big; spending the extra effort to cache 
>per-user profiles is an even bigger win - the tradeoff depends on how 
>much persistent storage you want to dedicate to the problem. 
>
>I would expect to see a big win with a cache size of around 20 (enough for
>the most popular Nevergethere, Exploder, and Slosaic versions to be safe
>from getting flushed by the small fry). There'd be a bigger win around 
>2000, as even the small fry get to stay put.

I think you are right about the UA profile (as long as UAs remain
monolithic systems, that is).  I have doubts about this working for
user profiles; read on.

>For a big site with a regular audience, 

But what if I'm a small site where random people drop by to read 2
pages?

>it might be worth spending a 
>hundred dollars or so on this and dedicating up to a gig or so to profile 
>caching; this keeps things really fast for caching.

Taking 1K per user profile, filling that gig would take 1M users!

Suppose I am a small site where random people drop by to read 2
language-negotiated pages.  Now, will user profile caching outperform
sending large headers combined with a sticky header scheme in this
case?  That depends mainly on

   P_same: the probability that two users have the same user profile

If P_same is 1 in 1 million, and if you have already had 10,000
different users (OK, the site is not so small after all), the next
user will get a user profile cache hit in about 1% of all cases.  Not
very optimal.
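The 1% figure is essentially 10,000 x P_same. A minimal Python check, assuming every cached profile is distinct and that a newcomer matches each one independently with probability P_same (`hit_probability` is a name made up for this sketch):

```python
def hit_probability(p_same: float, cached_users: int) -> float:
    """Chance that a new user's profile matches at least one of
    `cached_users` distinct cached profiles, assuming each comparison
    matches independently with probability p_same (a simplification)."""
    return 1.0 - (1.0 - p_same) ** cached_users


# P_same of 1 in 1 million, 10,000 different users already seen.
p = hit_probability(1e-6, 10_000)
print(f"{p:.2%}")  # roughly 1%, as the linear estimate n * P_same suggests
```

For small n * P_same the exact expression and the linear estimate agree closely, which is why the back-of-envelope figure is good enough here.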

So what is a realistic estimate of P_same?  That depends on the amount
of variance in the user profile.  Making a table of things you want in
your user profile:

   VERY CONSERVATIVE
   estimate of number of           description
   different settings

        100      accepted languages with some q factors
        10       accepted charsets with some q factors
        2^8      presence of common file viewers (word processors,
                 movie players, audio players)
        3^4      q factors for the 4 file viewers you have
        2^5      presence of common plug-ins
        10       type of screen (colordepth, monochrome, ...)
        4        size of screen
        2        I hate frames
        2        I hate animated gifs
        2        Life is too short for large images
     --------- * 
        2e11           
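The product in the last row can be reproduced directly. This sketch copies the per-item counts from the table (labels shortened) and, like the "1 in 2e11" figure, implicitly assumes every combination of settings is equally likely:

```python
import math

# Very conservative number of distinct settings per profile item,
# taken from the table above.
settings = {
    "accepted languages with q factors": 100,
    "accepted charsets with q factors": 10,
    "presence of common file viewers": 2 ** 8,
    "q factors for the 4 file viewers": 3 ** 4,
    "presence of common plug-ins": 2 ** 5,
    "type of screen": 10,
    "size of screen": 4,
    "I hate frames": 2,
    "I hate animated gifs": 2,
    "Life is too short for large images": 2,
}

distinct_profiles = math.prod(settings.values())
print(f"{distinct_profiles:.1e}")  # 2.1e+11

# Linear estimate of a cross-user cache hit after 10,000 distinct users.
print(10_000 / distinct_profiles)  # roughly 4.7e-08
```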

So a very conservative P_same is 1 in 2e11.  This reduces the chances
of getting a user profile cache hit for a different user to practically zero.
(Except for users who never change their default profiles, but how
many web users would be that boring?)

Also, with this P_same, the user profile cache key essentially becomes
a global user identifier.  Definitely not good for privacy.

I don't see how you can ever get the numbers working for user profile
caching.  Introducing a user profile caching scheme seems to put
penalties on sites seeking an irregular audience, even if such sites
are very big.

It seems to me that only transparent content negotiation scales to
P_same values of 1 in 1 million or less (while also still protecting
privacy).  And I think that for any moderately interesting collection
of things to negotiate on (like the table above), you will get a
P_same like this.

You can also look at this result in another way: if user profile
caching were effective, then P_same would be so high (that is, there
would be so few distinct profiles) that we could easily encode all
user profile information in a short request header.
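To put numbers on the short-header argument: if the user population really clustered into only as many profiles as the cache sizes of 20 or 2000 mentioned above, a profile could be named with a handful of bits. A sketch, assuming (hypothetically) that client and server share the same numbered list of profiles and use a fixed-length code:

```python
def header_bits(distinct_profiles: int) -> int:
    """Bits needed to name one of `distinct_profiles` equally likely
    profiles with a fixed-length code: ceil(log2(n)), computed with
    integer arithmetic to avoid floating-point rounding."""
    if distinct_profiles < 1:
        raise ValueError("need at least one profile")
    return (distinct_profiles - 1).bit_length()


for n in (20, 2000):
    print(n, header_bits(n))  # 20 -> 5 bits, 2000 -> 11 bits
```

Either figure fits in a couple of characters of a request header, far less than a full profile.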

It seems that profile caching is a good way to `translate' a
user-agent string into a large table of capabilities (assuming that
user agents stay monolithic), but that it has little general use
beyond that.
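The use that does seem worthwhile, translating a user-agent string into a table of capabilities, amounts to a small bounded cache keyed by the UA string. Below is a minimal Python sketch; the LRU eviction policy, the `fetch` callback and all names are assumptions for illustration, not part of any proposal in this thread, and the capacity of 20 simply echoes the estimate quoted above.

```python
from collections import OrderedDict
from typing import Callable, Dict

Capabilities = Dict[str, object]


class UAProfileCache:
    """Bounded LRU map from a User-Agent string to its capability table.

    `fetch` is a hypothetical stand-in for however the server obtains
    the full profile on a cache miss."""

    def __init__(self, capacity: int, fetch: Callable[[str], Capabilities]):
        self.capacity = capacity
        self.fetch = fetch
        self.entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, user_agent: str) -> Capabilities:
        if user_agent in self.entries:
            self.hits += 1
            self.entries.move_to_end(user_agent)  # mark as most recently used
            return self.entries[user_agent]
        self.misses += 1
        caps = self.fetch(user_agent)
        self.entries[user_agent] = caps
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return caps


cache = UAProfileCache(capacity=20, fetch=lambda ua: {"ua": ua, "frames": True})
for ua in ["Nevergethere/2.0", "Exploder/3.0", "Nevergethere/2.0"]:
    cache.lookup(ua)
print(cache.hits, cache.misses)  # 1 2
```

Keying on the UA string works because a handful of browser versions cover most requests; keying on a full user profile would not, for the P_same reasons above.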

>Simon (more later)

Koen.
Received on Tuesday, 13 August 1996 15:00:43 UTC