Re: "Hits" pragma

Paul Burchard	<burchard@cs.princeton.edu> writes:
> 
> Talking to several people privately, I've been convinced that  
> "bundled request" reporting information would most naturally be  
> placed into a Forwarded header, instead of a Pragma.  Forwarded  
> headers have similar semantics to Pragmas (they accumulate and must  
> be passed on down the line of proxies), and already represent a  
> primitive form of reporting.
> 
> So, keeping in mind Brian Behlendorf's requirements list, here is a  
> more refined proposal containing two extensions of the Forwarded  
> header.  I think both should be considered, since they provide  
> different balances between the demands of proxies and servers.
> 
> The first (proxy-biased) extension, which only requires constant  
> storage per cached item at the proxy, adds a "count" clause to the  
> Forwarded header:
> 
> 	Forwarded: by http://proxy/ count 34
> 
> This is equivalent to the hits pragma, except that hit counts are   
> assigned to specific "leaf" proxies.  To be precise:  The count is  
> reset to zero after the proxy forwards a request.  Thereafter, every  
> request for the resource received _without_ any Forwarded headers  
> increases the count by one; every request _with_ one or more  
> Forwarded headers simply contributes its Forwarded headers to the  
> eventual list of headers to be submitted by the proxy.
Great! I think this will satisfy people who are 'just wondering' about hit counts.
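
To make the counting rule above concrete, here is a rough sketch of the per-resource bookkeeping a leaf proxy might keep. The class and method names are hypothetical, and I assume that the request which triggers forwarding is not itself counted; the proposal does not say either way.

```python
class CountingProxyEntry:
    """Per-resource state for the proposed "count" clause (hypothetical names)."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url
        self.count = 0       # hits received without any Forwarded header
        self.upstream = []   # Forwarded headers contributed by other proxies

    def record_hit(self, forwarded_headers):
        """Note one request answered from the cache."""
        if forwarded_headers:
            # Already reported by another proxy: pass its headers along.
            self.upstream.extend(forwarded_headers)
        else:
            # A direct client hit: count it against this proxy.
            self.count += 1

    def forward(self):
        """Return the Forwarded headers to send on, then reset the state."""
        headers = list(self.upstream)
        headers.append("Forwarded: by %s count %d" % (self.proxy_url, self.count))
        self.count = 0
        self.upstream = []
        return headers
```

The count itself needs only constant storage per cached item; the list of headers collected from other proxies grows with the number of distinct proxies reporting through this one.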
> The second extension, which supplies a more acceptable level of  
> information, but at the cost of proxy storage proportional to the  
> number of hits received, adds an "mfor" (multiple-for) clause to the  
> Forwarded header:
> 
> 	Forwarded: by http://proxy/ mfor Pr5CH77RbN7g0HTux90R7GHK
> 
> Here, the argument of the "mfor" clause is a compressed logfile,  
> representing those requests received by the proxy for this resource  
> since last forwarding, which did not contain any Forwarded headers  
> of their own (as before, requests forwarded from proxies upstream  
> simply contribute their own Forwarded headers).
I think the detailed reporting mechanism is better handled outside the
HTTP protocol. In HTTP itself we should only give information providers
(e.g. Web page owners) a standardised way to state such requirements.
This can be done by introducing a header such as

	Hit-reports-to: URL [format-spec]

Additionally, following the discussion on this list, we can state that
this header is legal only in an HTTP response that contains a valid
Expires: header.
I suggest that at least http URLs (should the report be sent with POST or PUT?)
and mailto URLs be applicable here.
The second important thing is to specify the optional format-spec
(the default may be the 'common log file format' described in
http://www.w3.org/hypertext/WWW/Daemon/User/Config/Logging.html#common-logfile-format ),
because we cannot yet specify other possible formats in detail.
Maybe somebody requires a form to be filled in; for that case we should
allow http URLs as format specifiers. This of course adds some complexity
to implementing compliant caches, but it is a relatively simple way to
allow customisation, and filling in the form requires human intervention,
which improves security
(though that depends on exact knowledge of security policies, which in turn is a MUST
requirement for any computer user).
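
As an illustration only, a cache might read the suggested header roughly as follows. The function name, the lowercase-keyed dictionary of response headers, and the "common-log" label for the default format are my assumptions, not part of any specification. The sketch only checks that an Expires: header is present; it does not verify that its date is valid.

```python
from urllib.parse import urlsplit

# Assumed label standing for the 'common log file format' default.
DEFAULT_FORMAT = "common-log"

def parse_hit_reports_to(headers):
    """Return (report_url, format_spec), or None if the header must be ignored.

    headers: dict mapping lowercase header names to their values.
    """
    value = headers.get("hit-reports-to")
    # The header is only legal in a response that also carries Expires:.
    if value is None or "expires" not in headers:
        return None
    parts = value.split(None, 1)
    if not parts:
        return None
    url = parts[0]
    # Only http and mailto report URLs are suggested so far.
    if urlsplit(url).scheme.lower() not in ("http", "mailto"):
        return None
    fmt = parts[1].strip() if len(parts) > 1 else DEFAULT_FORMAT
    return url, fmt
```

A format-spec that is itself an http URL (the "fill in a form" case) comes back unchanged as the second element, so the cache can decide whether human intervention is needed.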
> What needs to be decided more precisely is (a) the compression  
> algorithm to be applied to the logfile, and (b) the format and  
> fields of the logfile.  Brian Behlendorf suggested a good minimum  
> set of log fields:  host, timestamp, referer (although some sort of  
> "host hiding" should probably be supported for privacy/security  
> reasons).  Any suggestions for the compression?  It would be nice to  
> have something that could be used incrementally by the proxy to  
> save space.
Compression is a good idea, but its applicability depends on the protocol
carrying the information.
I suggest that including log files (even with very sophisticated compression
techniques) in HTTP request headers is dangerous:
the resulting continuation lines, many of them 1000 characters long, may
break too many implementations.

Privacy/security issues are important here. We should state that converting
login@host information into an official e-mail address (like F.Last@domain)
is legal and recommended.

I will now try to implement this (e.g. hit count, Hit-reports-to: and
login/site hiding) in the ichtus cache, and will report on progress.

Generally, recording hit counts will only improve caches. With hit counts
available, the garbage collector part of the cache can take them into account
when it chooses files to delete.
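
One possible way a garbage collector could use the counts, sketched under my own assumptions (the function name, the tuple layout, and the "fewest hits first, larger files first on ties" policy are illustrative, not what any existing cache does):

```python
def choose_victims(entries, bytes_needed):
    """Pick cached files to delete until bytes_needed bytes are freed.

    entries: list of (path, size_in_bytes, hit_count) tuples.
    Files with the fewest hits go first; among equal hit counts the
    larger file goes first, so fewer files have to be removed.
    """
    victims = []
    freed = 0
    for path, size, hits in sorted(entries, key=lambda e: (e[2], -e[1])):
        if freed >= bytes_needed:
            break
        victims.append(path)
        freed += size
    return victims
```

A real collector would presumably combine the hit count with the Expires: date and last access time rather than use it alone.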

Andrew. (Endre Balint Nagy) <bne@bne.ind.eunet.hu>

Received on Monday, 14 August 1995 04:59:52 UTC