- From: Jim Gettys <jg@pa.dec.com>
- Date: Fri, 12 Sep 1997 04:40:39 -0700
- To: fielding@ics.uci.edu, luotonen@netscape.com, henrysa@exchange.microsoft.com
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, mogul@pa.dec.com, freier@netscape.com, paulle@microsoft.com
The attached is exerpted from an Internet draft Jeff Mogul is submitting to the ID editor today, with some very interesting trace data. As you know, bad dates in documents will badly affect caching behavior in the Web, up to and including serving documents long after they should have expired to unsuspecting users (with no way in HTTP/1.0 to ever force a reload on the cache, this problem will be with us for a long time until most 1.0 proxies are gone)... The situation is much worse than I believe most of us or all of us have realized. More than 1/5 of the servers are wrong by more than a minute. Ugh... Shudder... Median errors are in the two minute range. While I will be adding some text to the 1.1 spec encouraging clock synchronization for reliable caching operation, there are some concrete things that can/should be done by those who have influence over HTTP implementations and documentation. 1) installation directions and scripts for Web servers/prxies should strongly encourage the use of clock synchronization (e.g. use of NTP or equivalent). In server installation directions I've seen, there has never been any mention of this topic (not that I've installed a server recently). 2) server implementors might consider some "sanity checks" in their code to warn operators that their systems are likely running badly synchronized. I can think of some heuristics that might work. I can think of ugly hacks like looking for the existance of an NNTP server running. It may be that the system call interfaces to adjusting clocks might or might not be useful to warn operators (it's been too long since I looked at how NTP is commonly implemented, and whether those system call interfaces provide applications useful information on whether the clock is running within the phase lock capture range).... Exactly what might/should be done here is not completely clear and maybe worth discussion. In any case, I think at a minimum installation directions for Web servers and proxies should get some work to encourage better practice, even if not a line of code changes in the software itself. This is a call for us to go poke our respective documentation folks on this topic... - Jim - Jim Gettys >From Jeff Mogul's draft on the Age computation problem, being submitted later today.... Is clock skew a real problem? Unfortunately, I know of no systematic study of HTTP client clock skews. This is difficult, in part, because HTTP requests generally do not include a Date header. However, since I do have access to a trace of the headers flowing through a proxy whose clock, at the time of the trace, was carefully synchronized using NTP, I was able to look at the clock-skew distribution of a large set of HTTP servers. (The trace covers 22034 distinct server IP addresses.) While this is not the same as a population of HTTP clients, one might actually expect a set of HTTP servers to have better clock synchronization characteristics than a set of HTTP clients. After all, many HTTP clients run on personal computers or workstations, and are managed by non-experts; most Web servers on the Internet have at least some semblance of administration (e.g., someone at least had to obtain a DNS name). In other words, whatever the situation with Web server clocks, one would expect the situation among clients to be worse. For each response in the trace, I compared the Date header field value (if any) to the proxy's NTP-synchronized timestamps for the start of the connection and the end of the connection. If the server's clock is accurate, the Date value ought to be between those two timestamps. If the server's clock is slow, the Date value would be lower than the start-timestamp; if the server's clock is fast, the Date value would be higher than the end-timestamp. Because of the 1-second granularity of Date, I treated as "valid" any values less than 1 second in error. I also treated as "obviously bogus" any Date where the server's clock appeared to be more than 1 day wrong, since one could assume that such a badly skewed server clock would be abnormal. The trace contained 503969 responses with parsable response headers. Of these, only 286779 actually had Date headers (most of the rest appear to be PointCast responses). 1087 of these had Date values that were clearly bogus (by the "1-day-wrong" test). Of the others, 116966 (41%) showed a server with a "slow" clock (by at least one second), and 83782 (29%) showed a "fast" clock. Only 84944 (30%) had apparently-synchronized clocks. What if we set the threshold for an OK clock at +/- 60 seconds (which, by the earlier analysis, is somewhat larger than the Error_C_bound for N = 6 and Max_RTT = 2)? In this case, we still find 79443 (27%) responses indicating "slow" clocks, and 56429 responses (20%) indicating "fast" clocks. In other words, a lot of the clocks are off by a lot of time. Using the 1-second threshold, the mean error in the slow clocks is 1287 seconds, with a median error of 113 seconds. For the fast clocks, the mean error is 1383 seconds, with a median of 97 seconds. Using the 60-second threshold, the mean error in the slow clocks is 1884 seconds, with a median error of 198 seconds. For the fast clocks, the mean error is 2039 seconds, with a median of 152 seconds. (We're removing the small-error samples from these sets, so we're left with sets biased towards high-error samples.) In summary, clock skew seems to be prevalent among HTTP servers, and the skews seem to be fairly large. One might be justified in guessing that the situation is worse among HTTP clients. NOTE: I should reanalyze this data, breaking it down by server address, rather than by response, but that will have to wait for another draft of this document.
Received on Friday, 12 September 1997 06:13:17 UTC