Re: determining proxy reliability from Patrick McManus on 1997-03-20 (ietf-http-wg@w3.org from January to March 1997)

From: Patrick McManus <mcmanus@appliedtheory.com>
Date: Thu, 20 Mar 1997 11:47:34 -0500 (EST)
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <199703201647.LAA08793@pat.appliedtheory.com>

My comments are based on the belief that
draft-ietf-http-hit-metering-00.txt dated 1/21/97 is current.. let me
know if that's not the case.. Jeff makes some references to 'latest
draft' that had me confused, but now I think I realize that he just
means an unreleased version..


In a previous episode Jeffrey Mogul said...
:: 
:: Item #2 is addressed in the latest draft, by adding an optional
:: timeout to the Meter response-directive (i.e., to the server's
:: request that the response be hit-metered).  This can't eliminate
:: the problem of proxy crashes or reboot, but it can bound the
:: likelihood of report-lost-due-to-proxy-failure.  E.g., if the
:: timeout is set to 10 minutes, and the mean time between reboots
:: for the "average" proxy is (say) 60 minutes, then there is a 1/6
:: chance of report loss.  Since I suspect that MTBF for proxies is
:: probably on the order of days, not hours, the actual loss
:: probability is likely to be lower.

This does address my issue nicely.. My first reaction was that it
complicates even further construction of the 'non-naive algorithm for
scheduling the deletion of hit-count entries' that the proxy is
strongly encouraged to use by section 3.5 but the capacity to bundle
together and schedule reports for a single server is far more predictable with
respect to a max time value than with respect to a max usage value so
the penalty appears worthwhile.

:: 
:: This leaves #3, loss-in-transit.  My experience is that the most
:: common way for servers to lose HTTP requests is due to internal
:: congestion (i.e., the SYN_RCVD problem), so if hit-metering
:: improves caching, the reduction in congestion ought to help this.
:: But loss due to network partition is also a problem, and (according
:: to Vern Paxson's SIGCOMM '96 paper) it's getting worse.  This
:: has inspired me to change the text in the next version of the
:: draft from "The proxy is not required to retry the [report]
:: if it fails" to "The proxy is not required to retry the [report]
:: if it fails (but it should do so, subject to resource constraints)."
:: This is still "best-efforts", but the specification now encourages
:: more effort.
:: 

A strictly editorial comment.. I like the content, how about the slightly
firmer language: The proxy SHOULD retry the [report] if it fails but
MAY abort it if resource constraints dictate.

:: I think if the origin server can say "send a report within
:: X minutes, if you have anything to report" then this effectively
:: does the same thing as a request for 0/0 reports, but without
:: the additional message overhead.

sold me.

-P

Received on Thursday, 20 March 1997 08:47:08 UTC