- From: Jim Gettys <jg@pa.dec.com>
- Date: Fri, 12 Sep 1997 04:40:39 -0700
- To: fielding@ics.uci.edu, luotonen@netscape.com, henrysa@exchange.microsoft.com
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, mogul@pa.dec.com, freier@netscape.com, paulle@microsoft.com
The attached is excerpted from an Internet draft Jeff Mogul is submitting
to the ID editor today, with some very interesting trace data.
As you know, bad dates in documents badly affect caching behavior in
the Web, up to and including serving documents to unsuspecting users long
after they should have expired. (HTTP/1.0 offers no way to force a reload
on the cache, so this problem will be with us for a long time, until most
1.0 proxies are gone.)
The situation is much worse than I believe most, if not all, of us have
realized. More than 1/5 of the servers are wrong by more than a minute.
Ugh... Shudder... Median errors are in the two minute range.
While I will be adding some text to the 1.1 spec encouraging clock
synchronization for reliable caching operation, there are some concrete
things that can/should be done by those who have influence over HTTP
implementations and documentation.
1) installation directions and scripts for Web servers/proxies should strongly
encourage the use of clock synchronization (e.g. use of NTP or equivalent).
In server installation directions I've seen, there has never been any mention
of this topic (not that I've installed a server recently).
2) server implementors might consider some "sanity checks" in their code
to warn operators that their systems are likely running badly synchronized.
I can think of some heuristics that might work. I can think of ugly hacks
like looking for the existence of an NTP server running. It may be
that the system call interfaces for adjusting clocks might or might not be
useful to warn operators (it's been too long since I looked at how NTP is
commonly implemented, and whether those system call interfaces provide
applications useful information on whether the clock is running within the
phase lock capture range).... Exactly what might/should be done
here is not completely clear and maybe worth discussion.
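One possible shape for such a sanity check, offered only as a sketch: compare the local clock against the Date header of a response fetched from some reference HTTP server that is believed to be well synchronized, and warn the operator when the difference is large. The function names, the 60-second warning threshold, and the idea of using a reference server's Date header are all assumptions for illustration; they are not taken from the message or the draft, and the result is only as good as the reference clock and the round-trip time allow.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold: warn when the local clock is off by more than this.
WARN_THRESHOLD_SECONDS = 60


def clock_offset_seconds(reference_date_header, local_now=None):
    """Return (local clock - reference clock) in seconds.

    reference_date_header is an HTTP Date value obtained from a server
    assumed to be well synchronized, e.g. "Fri, 12 Sep 1997 04:40:39 GMT".
    """
    reference = parsedate_to_datetime(reference_date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - reference).total_seconds()


def check_clock(reference_date_header, local_now=None):
    """Return a warning string if the clocks disagree badly, else None."""
    offset = clock_offset_seconds(reference_date_header, local_now)
    if abs(offset) > WARN_THRESHOLD_SECONDS:
        return "WARNING: local clock differs from reference by %d seconds" % offset
    return None
```

A server could run a check like this at startup and log the warning; it would not replace NTP, it would only flag that something is probably wrong.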
In any case, I think at a minimum installation directions for Web servers
and proxies should get some work to encourage better practice, even if not
a line of code changes in the software itself. This is a call for us to
go poke our respective documentation folks on this topic... - Jim
- Jim Gettys
From Jeff Mogul's draft on the Age computation problem, being submitted
later today....
Is clock skew a real problem? Unfortunately, I know of no systematic
study of HTTP client clock skews. This is difficult, in part,
because HTTP requests generally do not include a Date header.
However, since I do have access to a trace of the headers flowing
through a proxy whose clock, at the time of the trace, was carefully
synchronized using NTP, I was able to look at the clock-skew
distribution of a large set of HTTP servers. (The trace covers 22034
distinct server IP addresses.) While this is not the same as a
population of HTTP clients, one might actually expect a set of HTTP
servers to have better clock synchronization characteristics than a
set of HTTP clients. After all, many HTTP clients run on personal
computers or workstations, and are managed by non-experts; most Web
servers on the Internet have at least some semblance of
administration (e.g., someone at least had to obtain a DNS name). In
other words, whatever the situation with Web server clocks, one would
expect the situation among clients to be worse.
For each response in the trace, I compared the Date header field
value (if any) to the proxy's NTP-synchronized timestamps for the
start of the connection and the end of the connection. If the
server's clock is accurate, the Date value ought to be between those
two timestamps. If the server's clock is slow, the Date value would
be lower than the start-timestamp; if the server's clock is fast, the
Date value would be higher than the end-timestamp.
Because of the 1-second granularity of Date, I treated as "valid" any
values less than 1 second in error. I also treated as "obviously
bogus" any Date where the server's clock appeared to be more than 1
day wrong, since one could assume that such a badly skewed server
clock would be abnormal.
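The classification described above can be sketched roughly as follows. All times are taken to be seconds since the epoch. Measuring the error as the distance from the Date value to the nearest edge of the connection window is an assumption about how the draft computed it, and the names used here are hypothetical.

```python
from enum import Enum


class Skew(Enum):
    VALID = "valid"
    SLOW = "slow"
    FAST = "fast"
    BOGUS = "bogus"


VALID_THRESHOLD = 1          # seconds; errors smaller than this count as valid
BOGUS_THRESHOLD = 24 * 3600  # seconds; errors larger than 1 day are discarded


def skew_error(date_value, conn_start, conn_end):
    """Signed error in seconds: negative if slow, positive if fast,
    zero if the Date value falls inside the connection window."""
    if date_value < conn_start:
        return date_value - conn_start
    if date_value > conn_end:
        return date_value - conn_end
    return 0


def classify(date_value, conn_start, conn_end, threshold=VALID_THRESHOLD):
    """Classify one response's Date value against the proxy's timestamps."""
    error = skew_error(date_value, conn_start, conn_end)
    if abs(error) > BOGUS_THRESHOLD:
        return Skew.BOGUS
    if abs(error) < threshold:
        return Skew.VALID
    return Skew.SLOW if error < 0 else Skew.FAST
```

Passing threshold=60 gives the looser test for an "OK" clock considered later in the excerpt.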
The trace contained 503969 responses with parsable response headers.
Of these, only 286779 actually had Date headers (most of the rest
appear to be PointCast responses). 1087 of these had Date values
that were clearly bogus (by the "1-day-wrong" test). Of the others,
116966 (41%) showed a server with a "slow" clock (by at least one
second), and 83782 (29%) showed a "fast" clock. Only 84944 (30%) had
apparently-synchronized clocks.
What if we set the threshold for an OK clock at +/- 60 seconds
(which, by the earlier analysis, is somewhat larger than the
Error_C_bound for N = 6 and Max_RTT = 2)? In this case, we still
find 79443 (27%) responses indicating "slow" clocks, and 56429
responses (20%) indicating "fast" clocks. In other words, a lot of
the clocks are off by a lot of time.
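The percentages above can be rechecked with a few lines of arithmetic. The sketch below assumes the denominator is the number of responses with a Date header that passed the "1-day-wrong" test, which reproduces the figures quoted for the 1-second threshold. The "ok" count for the 60-second threshold is derived here by subtraction and does not appear in the draft.

```python
# Counts quoted in the excerpt.
dated = 286779          # responses that carried a Date header
bogus = 1087            # Date values more than 1 day wrong
usable = dated - bogus  # assumed denominator for the percentages


def share(count, total=usable):
    """Percentage of the usable responses represented by count."""
    return 100.0 * count / total


one_second = {"slow": 116966, "fast": 83782, "ok": 84944}
sixty_second = {"slow": 79443, "fast": 56429}
# Derived, not stated in the draft: whatever is neither slow nor fast.
sixty_second["ok"] = usable - sixty_second["slow"] - sixty_second["fast"]

for label, counts in (("1 s", one_second), ("60 s", sixty_second)):
    for name, count in counts.items():
        print("%-4s %-4s %7d  %5.1f%%" % (label, name, count, share(count)))
```

With this denominator the three 1-second categories sum exactly to the usable total. The 60-second "slow" share computes to a little under 28%, so the 27% quoted in the text looks truncated rather than rounded.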
Using the 1-second threshold, the mean error in the slow clocks is
1287 seconds, with a median error of 113 seconds. For the fast
clocks, the mean error is 1383 seconds, with a median of 97 seconds.
Using the 60-second threshold, the mean error in the slow clocks is
1884 seconds, with a median error of 198 seconds. For the fast
clocks, the mean error is 2039 seconds, with a median of 152 seconds.
(We're removing the small-error samples from these sets, so we're
left with sets biased towards high-error samples.)
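The parenthetical remark about bias can be illustrated with a toy example: dropping every sample below the threshold necessarily pushes both the mean and the median of what remains upward. The error values below are invented purely for illustration and are not taken from the trace.

```python
from statistics import mean, median

# Made-up absolute clock errors in seconds (NOT data from the trace).
errors = [1, 2, 5, 30, 70, 113, 200, 900, 5000]


def summarize(samples, threshold):
    """Mean and median of the samples at or above the given threshold."""
    kept = [e for e in samples if e >= threshold]
    return mean(kept), median(kept)


mean_1s, median_1s = summarize(errors, 1)
mean_60s, median_60s = summarize(errors, 60)

print("threshold  1 s: mean %.0f, median %.0f" % (mean_1s, median_1s))
print("threshold 60 s: mean %.0f, median %.0f" % (mean_60s, median_60s))
```

The long tail also explains why the means in the draft are so much larger than the medians: a few very large errors dominate the mean but barely move the median.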
In summary, clock skew seems to be prevalent among HTTP servers, and
the skews seem to be fairly large. One might be justified in
guessing that the situation is worse among HTTP clients.
NOTE: I should reanalyze this data, breaking it down by server
address, rather than by response, but that will have to wait
for another draft of this document.
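One way the per-server breakdown mentioned in the note might be approached, purely as a sketch: group the signed errors by server address and reduce each group to a single figure, so that a busy server does not dominate the totals. Using the median as the per-server summary is an assumption made here; the draft does not say how the reanalysis would be done.

```python
from collections import defaultdict
from statistics import median


def per_server_median_error(samples):
    """samples: iterable of (server_ip, signed_error_seconds) pairs.

    Returns a dict mapping each server address to the median of its errors.
    """
    by_server = defaultdict(list)
    for server_ip, error in samples:
        by_server[server_ip].append(error)
    return {ip: median(errs) for ip, errs in by_server.items()}
```

The resulting per-server values could then be run through the same slow/fast/valid thresholds as the per-response values.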
Received on Friday, 12 September 1997 06:13:17 UTC