Re: Time from Josh M. Osborne on 1996-04-14 (www-logging@w3.org from April 1996)

From: Josh M. Osborne <stripes@va.pubnix.com>
Date: Sun, 14 Apr 1996 12:11:37 -0400
To: jml4@cus.cam.ac.uk (John Line)
Cc: www-logging@w3.org
Message-Id: <MAA12307.199604141611@garotte.va.pubnix.com>
In message <m0u8OWT-00011oC@ursa.cus.cam.ac.uk>, John Line writes:
[...John argues that the text abbreviations conflict with each other;
presumably he would prefer numeric offsets, but he doesn't actually
say what he would prefer...]

[I said the >> part, John the > part]
>>Of course if we just require the use of GMT we solve the whole problem,
>>at a cost of some direct readability.  That is what I would recommend.
>>
>>(Then again, if it were up to me we would be logging seconds since some
>>date -- the nominal start of WWW, or the Unix epoch, or some such.)
>
>Either of those would provide the information, but having a human-readable
>form can be very useful - for example, it's quite common for me to need to
>search the logs based on date/time, using e.g. the UNIX grep or more
>commands. A regular expression search pattern to match lines with a
>timestamp between 10:00 and 10:05 is easy to type straight in;

I also need this from time to time.  When I do I have two choices:

weblog -nt logname | datefind `gdate -d 10:00 +%s` `gdate -d 10:05 +%s`

or:

weblog -t logname | egrep 10:0[0-5]

Both weblog and datefind are locally written.

The weblog program sends the log for a given customer to stdout.
The -n makes it output "new" format (which has time in seconds
since the beginning of the epoch); without -n it converts to the
common log format.  The -t does just today's log (-y is yesterday,
default is for the month, -p is last month, -pp is two months ago...)

I tend to use the datefind one.  It is a bit faster (however that's
unfair: it is only faster because it processes the native format,
which makes weblog the equivalent of cat, or of gzcat, since any log
older than a day is gzip'ed).  More importantly the grep can produce
false hits (not that many now that we have the -t flag, but people
still may have :03 in a URL), while the datefind one is immune to
false hits.  Even more importantly the logs I tend to "dategrep"
are *not* today's, but from 3 or 4 days ago.  It is a bigger pain to
come up with the regexps for grep than to use datefind.

FYI the full source for datefind is:

#!/usr/local/bin/perl5
while(<STDIN>) {
	$time = (split(m/ /, $_))[3];
	print $_ if ($time >= $ARGV[0] && $time <= $ARGV[1]);
}
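
For comparison, here is a rough Python sketch of the same range filter
(not the Perl above, just an illustration).  It assumes, as the Perl
does, that the epoch timestamp is the fourth space-separated field;
the sample log lines are made up.  Unlike the Perl, it skips lines
whose timestamp field is missing or non-numeric rather than treating
them as 0.

```python
def datefind(lines, start, end):
    """Yield lines whose 4th space-separated field is an epoch time in [start, end]."""
    for line in lines:
        fields = line.split(" ")
        try:
            stamp = int(fields[3])
        except (IndexError, ValueError):
            continue  # malformed line: skip it instead of guessing
        if start <= stamp <= end:
            yield line


# Hypothetical log lines in a "seconds since the epoch" format.
sample = [
    "host1 - - 829490400 GET /index.html 200\n",
    "host2 - - 829490650 GET /pics/10:03.gif 200\n",
    "host3 - - 829500000 GET /notes/10:03.html 200\n",
]

# Only the first two lines fall inside the five-minute window; the
# third has "10:03" in its URL but its timestamp is out of range.
hits = list(datefind(sample, 829490400, 829490700))
```

A text search for 10:0[0-5] over a human-readable rendering of these
lines would also match the third one because of its URL; the numeric
comparison never looks at the URL.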

>                                                                the
>corresponding seconds-since-whenever is. Logs may be used for a lot more
>than just retrospective analysis/reporting, and we shouldn't make a choice
>that renders them impractical for those other uses.

And we haven't.

However it is my contention that logs are more frequently used for
retrospective analysis/reporting, and we should make them as easy
to use for that task as we can without:
 * crippling them for other tasks (like debugging)

 * making the process of logging slow enough that people running web
 browsers notice (i.e. don't require things like a DNS name lookup,
 or a series of pings so the average round trip delay can be recorded)

 * making each entry huge (which would tend to reduce the amount of log
 people keep around)

I have a program that parses log files each month to produce reports.
Quite a while ago we switched from the common log format to a new
log format.  The program got a lot faster.  Most of the speed up
was from the format of the date changing from person-friendly to
machine-friendly.

I suspect a lot of web reporting programs spend a fair amount of
time fiddling with person-friendly dates.
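
To make the difference concrete, here is a small Python sketch (an
illustration, not the reporting program mentioned above).  It assumes
the person-friendly form is the common log format timestamp, e.g.
14/Apr/1996:12:11:37 -0400, and that month names are the English
abbreviations; the function names are made up for this example.

```python
from datetime import datetime


def clf_to_epoch(stamp):
    """Parse a common-log-format timestamp into seconds since the Unix epoch.

    The parser has to handle a month name and a numeric zone offset.
    %b depends on the locale; this assumes English month abbreviations.
    """
    parsed = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return int(parsed.timestamp())


def epoch_field_to_epoch(field):
    """The machine-friendly case: the field already is the number."""
    return int(field)


from_clf = clf_to_epoch("14/Apr/1996:12:11:37 -0400")
from_epoch = epoch_field_to_epoch("829498297")
```

Both calls yield the same instant; the second does so without any
calendar or timezone arithmetic, and that arithmetic is the work a
report generator has to repeat for every line of a person-friendly log.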

I think we should have a program-friendly date format defined.  It
may not be a good idea to have it be the only date format, but it
is definitely a good idea to have it available.

>On a related point, though it's easy to assume that (on a UNIX system at
>least) you can easily convert seconds-since-epoch to local time (potentially
>for any timezone you like), that's really not very reliable.
>
>DST start/end dates in many parts of the world are set by law from year to
>year, not by algorithm. I doubt a year has gone by without us seeing at
>least one operating system or another make the change on the wrong date (or
>need an update from the vendor to pre-empt the problem if noticed in
>advance). I would be wary of relying on retrospective local-time conversions
>using the zone definition information shipped with typical UNIX systems,
>since it's so often been proven incorrect, and historical definitions that
>didn't get fixed at the time (or only affected "irrelevant" timezones, so no
>perceived need to install the fix) may well never be fixed on a particular
>system.

Hm, I don't see how this argues for or against GMT.

If you have a failure to switch to/from DST at the right time with
'localtime' logs you will end up with bad data in your logs.  Bad
in a way very hard to fix (you need to edit the timezone rules and
fix them, then you need to edit all bad log entries).

If you have the failure while logging GMT times you will not be able
to map them back to localtime until you fix the timezone rules, but
once you fix the rules you won't need to alter the data.

Or, in either case if you "simply" force the system into the new timezone
you have to edit the data in the logs no matter which format it is in.
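
A minimal Python sketch of the GMT case, assuming the log stores
seconds since the Unix epoch and that the "timezone rules" can be
reduced to a single fixed UTC offset for the moment in question (real
zone rules are more involved; the offsets and function name here are
illustrative only):

```python
from datetime import datetime, timedelta, timezone


def render_local(epoch_seconds, utc_offset_hours):
    """Render a logged epoch time under a given fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, zone).strftime("%Y-%m-%d %H:%M:%S")


logged = 829498297  # the value in the log; it never changes

# Stale rules: the system has not switched to DST yet (offset -5).
stale = render_local(logged, -5)

# Corrected rules: DST in effect (offset -4).  Same log entry, new rendering.
fixed = render_local(logged, -4)
```

The logged number is the same in both calls; only the conversion
changes once the rules are corrected.  A log written in local time
under the stale rules would instead carry the wrong hour in the entry
itself.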

>In consequence, while GMT is the "obvious" reference point for the
>timestamps, the local timezone offset is certainly needed in order that
>usage reports etc., can relate reliably to the server's local time, and the
>current offset definitely needs to be embedded in the logs (initially and
>when it changes) in one or other of the ways people have proposed.

I don't think the point has been made.

Besides what is the right local timezone for an Intel web server?
The local time for that particular server (they have several in
the USA, and I suspect they will have them in other countries
sooner or later)? The local time for Intel's marketing department?

What would be the right time zone for a web server being run in
the USA for a company in Japan?  And how would it be specified if
the (same) machine in the USA also ran web servers for companies
in other countries (or at least timezones)?

What would be the right time zone for a web server that has pages
written by people in different time zones?  (i.e. something like
webcomm where people get pages on the same server for a low monthly
fee)?
Received on Sunday, 14 April 1996 12:11:41 UTC