RE: New document on "Simple hit-metering for HTTP" from Paul Leach on 1996-08-13 (ietf-http-wg@w3.org from July to September 1996)

From: Paul Leach <paulle@microsoft.com>
Date: Mon, 12 Aug 1996 21:24:11 -0700
To: "'koen@win.tue.nl'" <koen@win.tue.nl>
Cc: "'mogul@pa.dec.com'" <mogul@pa.dec.com>, "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Message-Id: <c=US%a=_%p=msft%l=RED-77-MSG-960813042411Z-24856@mail.microsoft.com>

>----------
>From: 	koen@win.tue.nl[SMTP:koen@win.tue.nl]
>Sent: 	Monday, August 12, 1996 3:57 PM
>To: 	Paul Leach
>Cc: 	mogul@pa.dec.com; koen@win.tue.nl;
>http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>Subject: 	Re: New document on "Simple hit-metering for HTTP"
>
>Paul Leach:
>>>From:  koen@win.tue.nl[SMTP:koen@win.tue.nl]
>>>
>>>I can't see much difference, as far as efficiency is concerned,
>>>between your proposed use of the Vary header and max-age=0 cache
>>>busting.  Both are inefficient, but at least max-age=0 is
>>>inefficient in a non-complex way.
>>
>>I think you need to read about how Vary works.
>
>I know how vary works.  I expect that service providers would send
>
>  Vary: Accept, Accept-Language, User-Agent, Referer, From
>
>or even
>
>  Vary: *
>
>under your scheme.  The latter is equivalent to max-age=0.

I think it's obvious that if the origin-server wants to know everything
about every request, then it has to make sure it sees every request, and
that the most efficient and obvious way to do that is with max-age=0. I
think that Vary: * would only obfuscate things.

(Of course, the other thing the content provider wants is to scale to
lots of clients, which means that they have to allow caching. So maybe
they'll accept some tradeoffs.)

Our whole proposal was addressed at providers who were willing to accept
less than total information in exchange for more efficiency.

> Even the
>former gives you so much variance that cachability is effectively
>eliminated.

And that's why it won't be sent. This whole line of argument is
attacking a straw man. If the origin-server wants this much info,
they'll just set max-age=0.

For the cases that our proposal tries to address, I think a provider
that will accept less than total info would be satisfied with more like
one of these:
1. no Vary at all -- just raw stats on how many users see the page
2. Vary:  Accept-Language, User-Agent -- find out which language users
prefer and which browser they are using
3. Vary: User-Agent -- find out which browser they are using.

Each variation would add an entry of maybe 100-200 bytes to store the
selecting headers and pointer to the entry containing the entity-body,
which will average about 10k bytes -- so about a 1-2% overhead per
variation.
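
As a rough check of that arithmetic, here is a minimal sketch using only the
figures quoted above (100-200 bytes per variation entry, about 10k bytes per
entity-body). The names are made up for illustration, and real entry sizes
will depend on the cache implementation.

```python
# Rough per-variation storage overhead, using the estimates quoted above.
ENTITY_BODY_BYTES = 10_000        # "about 10k bytes" average entity-body
VARIANT_ENTRY_BYTES = (100, 200)  # selecting headers plus a pointer to the body


def overhead_percent(entry_bytes, body_bytes=ENTITY_BODY_BYTES):
    """Extra storage for one variation, as a percentage of the entity-body size."""
    return 100.0 * entry_bytes / body_bytes


low, high = (overhead_percent(b) for b in VARIANT_ENTRY_BYTES)
print(f"{low:.0f}% to {high:.0f}% per variation")  # prints "1% to 2% per variation"
```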

They can get referrer info without the Referer header using the
technique in section 7 (due to a suggestion from you, which I forgot to
acknowledge -- for which I apologize, and will rectify in the next
release).

>  Also, it is at least questionable whether cache
>implementers will actually implement all optimisations allowed by
>Vary.

I don't understand this. There seem to me to be two logical
implementations of Vary: treat it as if it were max-age=0, or implement
it fully. If they do it fully, then adding a counter to each selecting
header entry seems trivial. If you can explain another logical
intermediate implementation, then maybe I could buy this argument, but
until then...
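
To illustrate why the counter seems trivial once Vary is implemented fully,
here is a minimal sketch of a cache that keeps one entry per combination of
selecting-header values and counts the hits it serves for each. This is an
assumption-laden toy, not the proposal's mechanism: the class name, the
fetch_from_origin callback, and report() are all hypothetical, and freshness,
validation, and the actual reporting of counts to the origin server are not
modeled.

```python
class VaryCache:
    """Toy cache for one resource, keyed by the headers named in Vary."""

    def __init__(self, vary_headers, fetch_from_origin):
        # Header names are case-insensitive, so normalize them once.
        self.vary_headers = [h.lower() for h in vary_headers]
        self.fetch_from_origin = fetch_from_origin  # hypothetical: headers -> body
        self.variants = {}  # selecting-header values -> [body, hit_count]

    def _key(self, request_headers):
        normalized = {name.lower(): value for name, value in request_headers.items()}
        return tuple(normalized.get(h, "") for h in self.vary_headers)

    def get(self, request_headers):
        key = self._key(request_headers)
        entry = self.variants.get(key)
        if entry is None:
            # Unseen selecting headers: the origin server sees this request itself.
            body = self.fetch_from_origin(request_headers)
            self.variants[key] = [body, 0]
            return body
        entry[1] += 1  # served from cache, so count it for later reporting
        return entry[0]

    def report(self):
        """Hits served from cache, per selecting-header combination."""
        return {key: count for key, (_body, count) in self.variants.items()}


origin_requests = []


def origin(headers):
    # Stand-in for the origin server; records which requests reached it.
    origin_requests.append(headers)
    return b"<html>page</html>"


cache = VaryCache(["User-Agent"], origin)
for agent in ["Mozilla/2.0", "Mozilla/2.0", "Lynx/2.5", "Mozilla/2.0"]:
    cache.get({"User-Agent": agent})

print(cache.report())
```

In this run the origin sees two requests (one per distinct User-Agent) and the
cache counts the two repeat requests from the first agent. The counter is one
extra integer on an entry the cache already has to keep.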
>
>
>>>
>>>To do a good job at giving demographers the complete request data
>>>(including client IP address) they seem to want, a completely
>>>different mechanism is needed.
>>
>>Aren't you arguing out of both sides of your mouth here?
>
>No, I changed my mind after reading the messages from Erik Aronesty.
>I no longer think that providers who *only* want hit counts are in the
>majority.

OK. That wasn't clear. BTW: in a private exchange, Erik agreed that the
proposal will still give good enough IP address info. (The caches are
typically geographically close enough to the client, and all you *ever*
get, even when cache-busting, is the IP address of the caches.)
>
>> In the last
>>message, you claimed that providers are too unsophisticated to know the
>>difference between counts of conditional and unconditional GETs.
>
>For the record, I claimed that the *customers* of providers would not
>know the difference.

You mean the advertisers?
>
>I have outlined a number of problems in your proposal.  The bottom
>line is that I don't think it will work in its current form.

That doesn't follow from your objections. I summarize your objections
as:
1. Too complicated
2. Advertisers want more or different data than it collects, so it won't
get used
3. It will lower "hit counts" so providers will have no incentive to
deploy it
4. If used in certain ways, it will be inefficient

I think I've addressed all of these. Certainly "doesn't work" wasn't one
of them -- a correct implementation will do what we claimed it would.

>Paul

Received on Monday, 12 August 1996 21:28:54 UTC