- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Thu, 03 Jul 97 18:03:16 MDT
- To: http-wg@cuckoo.hpl.hp.com
For a paper that I'm working on (with several other people), I needed to extract the "Last-Modified" values from a proxy trace that I made last December. The trace contains 504736 records, representing the activity of 7411 distinct client hosts, accessing 22034 distinct servers, referencing 238663 distinct resources (URLs). I.e., it's a significant slice of the Web.

Anyway, we were surprised to find that a significant fraction of the Last-Modified values appeared to be in the future; i.e., the Last-Modified time was actually later than the time at which the request was completed. (We timestamped our log entries on a system synchronized with NTP to a nearby GPS clock.)

One part of the problem turned out to be a bug in the date-parsing code that we borrowed from the CERN httpd program. If you are using a routine called parse_http_time() from this code, you might want to check that it gives the right values in all cases. In particular, if daylight saving time is in effect when you parse the date, but not at the time specified by the date (or vice versa), the result may be wrong by an hour.

Anyway, after fixing that bug, we still found that somewhat over 1% of the traced responses had "future" Last-Modified dates, or future "Date" dates. These tended to fall into two apparent categories:

    (1) servers that probably had their clocks set wrong
    (2) servers that sent non-GMT Last-Modified values.

For example, a large fraction of the "future" values were just a little bit in the future. A suspiciously large spike in the distribution of errors appears at around 60 seconds; it looks like some people set their clocks to the right second, but the wrong minute. Other, smaller spikes appear near multiples of 3600 seconds (one hour); these may be from people sending time in non-GMT timezones, or it may be people who have set their clocks to the right minute, but the wrong hour.
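
To illustrate the daylight-saving pitfall described above, here is a minimal Python sketch of one way to convert an HTTP date to seconds since the epoch without consulting the local timezone at all. It is not the CERN httpd code or our fix to it; the function name http_date_to_epoch and the format table are assumptions made for this example. It also assumes the default C/English locale, since strptime matches day and month names according to the locale.

```python
import calendar
import time

# The three date layouts an HTTP server may send: RFC 1123, RFC 850, asctime().
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",
    "%A, %d-%b-%y %H:%M:%S GMT",
    "%a %b %d %H:%M:%S %Y",
)


def http_date_to_epoch(value):
    """Return seconds since the epoch for an HTTP date, or None if unparseable.

    Hypothetical helper for illustration. The broken-down time is treated as
    GMT and converted with calendar.timegm(), which never looks at the local
    timezone, so the result does not depend on whether daylight saving time
    is in effect when the parse happens.
    """
    for fmt in HTTP_DATE_FORMATS:
        try:
            fields = time.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next layout
        return calendar.timegm(fields)
    return None
```

A quick sanity check is to parse one winter date and one summer date with the same time of day: the difference should be a whole number of days, with no stray hour.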
For some reason, there is a spike near 3.5 hours; maybe this is from sites in one of the places where the timezone offset is not an integral number of hours. Finally, there are a few sites that seem to be off by exactly one day.

Of course, some of the Last-Modified dates might be set into the future for some bizarre caching-related reason, but this seems rather unlikely to be of actual benefit.

-Jeff
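
For anyone who wants to look for the same spikes in their own logs, the bucketing described above could be sketched roughly as follows in Python. This is not the script used for the trace analysis; the function name classify_future_offset, the bucket labels, and the 10-second tolerance are all assumptions chosen for illustration. Both arguments are seconds since the epoch.

```python
def classify_future_offset(last_modified, logged_at, tolerance=10):
    """Bucket how far a Last-Modified time lies beyond the logged request time.

    Hypothetical helper: the labels and the tolerance (in seconds) are
    illustrative choices, not values taken from the trace analysis.
    """
    delta = last_modified - logged_at
    if delta <= 0:
        return "not in the future"
    if abs(delta - 60) <= tolerance:
        return "about one minute"
    # Check one day before the hourly test, since 86400 is a multiple of 3600.
    if abs(delta - 86400) <= tolerance:
        return "about one day"
    # Find the nearest multiple of half an hour and see how close delta is to it.
    half_hours = round(delta / 1800)
    if half_hours >= 1 and abs(delta - half_hours * 1800) <= tolerance:
        if half_hours % 2 == 0:
            return "whole hours"
        return "odd half hour"
    return "other"
```

Note that a "whole hours" result cannot by itself distinguish a server sending local time from one whose clock is set to the wrong hour; it only identifies which spike a response belongs to.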
Received on Thursday, 3 July 1997 18:08:14 UTC