Perl script from Arthur Secret on 1994-12-08 (www-vlib@w3.org from December 1994)

From: Arthur Secret <secret@www5.cern.ch>
Date: Thu, 8 Dec 1994 18:58:52 +0100
To: www-vlib@www0.cern.ch
Message-Id: <9412081758.AA27497@www5.cern.ch>
On December 8, you (David Hopp) wrote:
:>I would appreciate it if Perl scripts for these various tasks could be 
:>made available.  No reason for all of us to solve the same problem 
:>individually.
:>
:>Dave Hopp
:>
Good idea!

Here's my first try, in Perl only for now:

Please try it out, and send back comments!

------------------------- Cut here -------------------------------------------
#! /usr/local/bin/perl
#
# Author: Arthur Secret (secret@w3.org)
#
# This file is available at
# http://info.cern.ch/hypertext/DataSources/bySubject/vl_stats.pl
#
# It counts the number of times each listed document has been accessed
# on the day it is run. The script should therefore be run from a
# crontab file at 11:55 pm each day, e.g. with the line
#
# 55 23 * * * /home/secret/hypertext/DataSources/bySubject/vl_stats.pl
#
# It will write its result according to the following rule:
#
# >Architecture is in: /VIRTUALLIB/arch.html
# >so the stat file would be: /VIRTUALLIB/stat.arch.html
# >Lan. Arch is in: /VIRTUALLIB/larch.html
# >so the stat file would be: /VIRTUALLIB/stat.larch.html
#
# OK. And for me, with a URL that ends in /, then just "stat" would be my name.
#
#
#
# Customization:
#
# URLs of the documents you wish to get stats for
$url[0]="http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html";
$url[1]="http://info.cern.ch/hypertext/DataSources/bySubject/Overview2.html";
$url[2]="http://info.cern.ch/hypertext/DataSources/";
#
# Location of your server's log file (the accesses are read from it)
$stats="/home/secret/test/httpd-log-txt.9412";
#
# If you're not running this script in the same directory as your html
# documents, you have to give the path of your documents, as it may be
# different from the one given in the URL. In this example, the script
# is run in the same directory as the first two documents, so I only
# specify the path for the third one.
#
$url_path{2}="/home/secret/hypertext/DataSources";
#
# This is it!
#
foreach $j (0 .. $#url) {
    # Path part of the URL, as it appears in the log
    $url[$j] =~ m|^.*//[^/]*(/.*)$| && ($end_url[$j] = $1);
    # Stat file name: "stat.<name>", or just "stat" if the URL ends in /
    $name = ($url[$j] =~ m|/([^/]+)$|) ? "stat.$1" : "stat";
    $file[$j] = $url_path{$j} ? "$url_path{$j}/$name" : $name;
}
$date = `date`; # we should get "Thu Dec  8 16:17:01 MET 1994"
$date =~ /^\w* (\w*) *(\d\d) / && ($date ="$2/$1");
$date =~ /^\w* (\w*) *(\d) / && ($date ="0$2/$1");

open (FULL_STATS,"$stats") || die "Can't open file $stats: $!";
while (<FULL_STATS>)
{
    $line = $_;
    foreach $j (0 .. $#url)
    {
	($line =~ m|GET ([^ ]*) |) && ($1 eq $end_url[$j]) 
	    && ($line =~ m|\[(\d*/\w*)/19|) 
		&& ($1 eq $date) && ($i{$j}++);
    }
}
close FULL_STATS;

foreach $j (0 .. $#url) {
    open (STATS,"> $file[$j]") || die "Can't open file $file[$j]: $!";
    # "+ 0" so that a document with no access today is reported as 0
    print STATS "Version: 1.0\n\nToday ($date) access: ", $i{$j} + 0, "\n";
# Here you may wish to add other information
#     print STATS "Version: 1.0\n\nToday ($date) access: $i{$j}\nLinks 
#     off from the page: \nPublications: \n";
    close STATS;
}

------------------------- Cut here -----------------------------------------
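
For anyone adapting the script, the two rules it relies on (the naming of
the stat file and the per-line match against the log) can be tried in
isolation. What follows is only a minimal sketch, not part of the script
above: the helper names stat_file, log_date_key and is_hit are made up for
illustration, the sample log line assumes a Common Log Format style entry,
and gmtime is used only to keep the example reproducible (the script itself
takes the local date from `date`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a document URL to its stat file name.
# $dir is optional: the directory that holds the html documents.
sub stat_file {
    my ($url, $dir) = @_;
    # Last path component, or plain "stat" if the URL ends in "/"
    my $name = ($url =~ m|/([^/]+)$|) ? "stat.$1" : "stat";
    return $name unless defined $dir && length $dir;
    return "$dir/$name";
}

# Hypothetical helper: build the "DD/Mon" key found in log timestamps
# from an epoch time, without calling the external date command.
# (gmtime here for a reproducible example; localtime would match the
# behaviour of the original script.)
sub log_date_key {
    my ($epoch) = @_;
    my @mon = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my ($mday, $month) = (gmtime $epoch)[3, 4];
    return sprintf "%02d/%s", $mday, $mon[$month];
}

# Hypothetical helper: does this log line record a GET of $path on
# the day given by $key?
sub is_hit {
    my ($line, $path, $key) = @_;
    return 0 unless $line =~ m|GET ([^ ]*) |;
    return 0 unless $1 eq $path;
    return 0 unless $line =~ m|\[(\d+/\w+)/\d{4}|;
    return $1 eq $key ? 1 : 0;
}

# Assumed sample entry, not taken from a real log
my $sample = 'www.example.org - - [08/Dec/1994:16:17:01 +0100] '
           . '"GET /hypertext/DataSources/ HTTP/1.0" 200 1234';

print stat_file("http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html"), "\n";
# prints "stat.Overview.html"
print stat_file("http://info.cern.ch/hypertext/DataSources/",
                "/home/secret/hypertext/DataSources"), "\n";
# prints "/home/secret/hypertext/DataSources/stat"
print is_hit($sample, "/hypertext/DataSources/", "08/Dec"), "\n";
# prints "1"
```

Keeping the three rules in separate subroutines makes it easy to check a
new URL or a line from your own log before putting the script in a crontab.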
Received on Thursday, 8 December 1994 17:58:55 UTC