- From: Paul Leach <paulle@microsoft.com>
- Date: Mon, 12 Aug 1996 21:24:11 -0700
- To: "'koen@win.tue.nl'" <koen@win.tue.nl>
- Cc: "'mogul@pa.dec.com'" <mogul@pa.dec.com>, "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
>----------
>From: koen@win.tue.nl[SMTP:koen@win.tue.nl]
>Sent: Monday, August 12, 1996 3:57 PM
>To: Paul Leach
>Cc: mogul@pa.dec.com; koen@win.tue.nl;
>http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>Subject: Re: New document on "Simple hit-metering for HTTP"
>
>Paul Leach:
>>>From: koen@win.tue.nl[SMTP:koen@win.tue.nl]
>>>
>>>I can't see much difference, as far as efficiency is concerned,
>>>between your proposed use of the Vary header and max-age=0 cache
>>>busting. Both are inefficient, but at least max-age=0 is
>>>inefficient in a non-complex way.
>>
>>I think you need to read about how Vary works.
>
>I know how Vary works. I expect that service providers would send
>
>  Vary: Accept, Accept-Language, User-Agent, Referer, From
>
>or even
>
>  Vary: *
>
>under your scheme. The latter is equivalent to max-age=0.

I think it's obvious that if the origin-server wants to know everything
about every request, then it has to make sure it sees every request,
and that the most efficient and obvious way to do that is with
max-age=0. I think that Vary: * would only obfuscate things.

(Of course, the other thing the content provider wants is to scale to
lots of clients, which means that they have to allow caching. So maybe
they'll accept some tradeoffs.)

Our whole proposal was addressed at providers who were willing to
accept less than total information in exchange for more efficiency.

> Even the
>former gives you so much variance that cachability is effectively
>eliminated.

And that's why it won't be sent. This whole line of argument is
attacking a straw man. If the origin-server wants this much info,
they'll just set max-age=0.

For the cases that our proposal tries to address, I think a provider
that will accept less than total info would be satisfied with something
more like one of these:

  1. no Vary at all -- just raw stats on how many users see the page
  2. Vary: Accept-Language, User-Agent -- find out which language users
     prefer and which browser they are using
  3. Vary: User-Agent -- find out which browser they are using.

Each variation would add an entry of maybe 100-200 bytes to store the
selecting headers and a pointer to the entry containing the
entity-body, which will average about 10k bytes -- so about a 1-2%
overhead per variation.

They can get referrer info without the Referer header using the
technique in section 7 (due to a suggestion from you, which I forgot to
acknowledge -- for which I apologize, and will rectify in the next
release).

> Also, it is at least questionable whether cache
>implementers will actually implement all optimisations allowed by
>Vary.

I don't understand this. There seem to me to be two logical
implementations of Vary: treat it as if it were max-age=0, or implement
it fully. If they do it fully, then adding a counter to each selecting
header entry seems trivial. If you can explain another logical
intermediate implementation, then maybe I could buy this argument, but
until then...

>
>
>>>
>>>To do a good job at giving demographers the complete request data
>>>(including client IP address) they seem to want, a completely
>>>different mechanism is needed.
>>
>>Aren't you arguing out of both sides of your mouth here?
>
>No, I changed my mind after reading the messages from Erik Aronesty.
>I no longer think that providers who *only* want hit counts are in the
>majority.

OK. That wasn't clear.

BTW: in a private exchange, Erik agreed that the proposal will still
give good enough IP address info. (The caches are typically
geographically close enough to the client, and all you *ever* get, even
when cache-busting, is the IP address of the caches.)

>
>> In the last
>>message, you claimed that providers are too unsophisticated to know the
>>difference between counts of conditional and unconditional GETs.
>
>For the record, I claimed that the *customers* of providers would not
>know the difference.

You mean the advertisers?

>
>I have outlined a number of problems in your proposal. The bottom
>line is that I don't think it will work in its current form.

That doesn't follow from your objections. I summarize your objections
as:

  1. Too complicated
  2. Advertisers want more or different data than it collects, so it
     won't get used
  3. It will lower "hit counts" so providers will have no incentive to
     deploy it
  4. If used in certain ways, it will be inefficient

I think I've addressed all of these. Certainly "doesn't work" wasn't
one of them -- a correct implementation will do what we claimed it
would.

>Paul
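
To make the "implement it fully" case concrete, here is a minimal sketch of a cache that keeps one entry per set of selecting-header values and a hit counter on each entry. It is only an illustration under stated assumptions: the names VaryCache, store, lookup, report and overhead_fraction are hypothetical and do not come from the hit-metering draft, the cache is a plain in-memory dictionary, and "10k bytes" is taken to mean 10,000 bytes.

```python
class VaryCache:
    """Toy in-memory cache: one variant per combination of selecting headers.

    Hypothetical illustration only; not the mechanism defined in the draft.
    """

    def __init__(self):
        # url -> {"vary": tuple of header names,
        #         "variants": {selecting values: {"body": ..., "hits": int}}}
        self.entries = {}

    @staticmethod
    def _select(vary, headers):
        # Header names are case-insensitive, so compare in lowercase.
        lowered = {name.lower(): value for name, value in headers.items()}
        return tuple(lowered.get(name, "") for name in vary)

    def store(self, url, vary, headers, body):
        """Cache a response under the request headers named in its Vary."""
        names = tuple(sorted(name.strip().lower() for name in vary))
        entry = self.entries.get(url)
        if entry is None or entry["vary"] != names:
            entry = {"vary": names, "variants": {}}
            self.entries[url] = entry
        key = self._select(names, headers)
        entry["variants"][key] = {"body": body, "hits": 0}

    def lookup(self, url, headers):
        """Return the cached body and count the hit, or None on a miss."""
        entry = self.entries.get(url)
        if entry is None or "*" in entry["vary"]:
            # Vary: * never matches, so every request goes to the origin,
            # which is the max-age=0 behaviour described above.
            return None
        variant = entry["variants"].get(self._select(entry["vary"], headers))
        if variant is None:
            return None
        variant["hits"] += 1  # the per-entry counter
        return variant["body"]

    def report(self, url):
        """Hit counts per selecting-header combination for one URL."""
        entry = self.entries.get(url, {"variants": {}})
        return {key: v["hits"] for key, v in entry["variants"].items()}


def overhead_fraction(selector_bytes, body_bytes=10_000):
    """Per-variation storage overhead relative to the entity-body size."""
    return selector_bytes / body_bytes
```

With these assumptions, overhead_fraction(100) and overhead_fraction(200) give 0.01 and 0.02, matching the 1-2% estimate above, and report() returns the per-variant counts a cache could send back to the origin-server.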
Received on Monday, 12 August 1996 21:28:54 UTC