Re: Perl script from Arthur Secret on 1994-12-08 (www-vlib@w3.org from December 1994)

From: Arthur Secret <secret@www5.cern.ch>
Date: Thu, 8 Dec 1994 22:55:35 --100
To: dlopata@Stat.UFL.Edu
Cc: www-vlib@www0.cern.ch
Message-Id: <9412082155.AA28809@www5.cern.ch>
Thanks for your suggestion, Dave!

On December  8, you (Dave Lopata) wrote
:>
:>..
:>
:>	This may be oversimplified, but:
:>
:>#!/usr/local/bin/perl
:>#
:># quickcount.pl
:>#
:># Quick counter thingie.
:>#
:>chop ($date = `date +%d/%b/%Y`);
Great! I've added this command to both scripts!

:>$logfile = "/usr/local/spool/httpd/httpd-log";
:>$webarea = "/usr/local/lib/www";
:> 
:>$filename = $ARGV[0];
:> 
:>if (-f "$webarea/$filename")
:>{
:>        @countarray = `grep $date $logfile | grep $filename`;
I don't like this line: putting all these lines in an array may take a lot
of memory! For example, I get an average of 40 000 lines per day
for the top page; multiplied by 30 days and 116 characters per line, that
makes about 140 MB of memory! I'd rather avoid that!
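
One way to avoid holding the matching lines in memory is to read the log
one line at a time and keep only a running count. A minimal sketch, assuming
a reasonably recent Perl 5 (lexical filehandles, three-argument open);
count_hits is a hypothetical helper name, and the two substring tests stand
in for the two greps of the quoted script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines of $logfile that contain both
# $date and $filename, without storing the lines themselves.
sub count_hits {
    my ($logfile, $date, $filename) = @_;
    open(my $log, '<', $logfile) or die "Can't open file $logfile: $!";
    my $count = 0;
    while (my $line = <$log>) {
        # index() is a plain substring test, so "/" and "." need no escaping
        $count++ if index($line, $date) >= 0 && index($line, $filename) >= 0;
    }
    close($log);
    return $count;
}

# Runs only when called as: script logfile date filename
if (@ARGV == 3) {
    print count_hits(@ARGV), "\n";
}
```

Memory use then stays roughly constant however many lines match, at the
cost of doing the filtering in Perl rather than in grep.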

:> 
:>        print "grep $date $logfile | grep $filename\n";
:> 
:>        $count = $#countarray + 1;
:> 
:>        print "Today ($date) access: $count\nLinks off from $filename\n";
:>}
:>
:>	Will give you the count, and then you can just pipe it off to
:>whatever file you want. . .
Right, but my Perl script names the output file automatically, following
our conventions.

:>
:>	$logfile and $webarea need to be set, of course.  So, if
:>your Web files are in /usr/local/lib/www, and you've got your vlib
:>stuff in a subdirectory "vlib", then something like this:
:>
:>	quickcount.pl vlib/statistics.html
:>
:>	Should work.
:>

Thanks for your suggestions!

I suggest:
1/ For those who maintain only one file, my Perl script may be considered
too complex: a simple grep would do the job, as in Dave's Perl script
or my C-shell script.

2/ For those who want to gather statistics on several files, I think
my Perl script is more appropriate: it reads the statistics file only
once to get the number of connections for all the files, and it
won't require as much swap space as Dave's script.

Here is the latest version of my perl script:

-------------------------------- Cut here ----------------------------------

#! /usr/local/bin/perl
#
# Author: Arthur Secret (secret@w3.org)
#
# This file is available at
# http://info.cern.ch/hypertext/DataSources/bySubject/vl_stats.pl
#
# It counts the number of times a specific document has been accessed
# during the day it is run. Therefore, this script should be included in a
# crontab file to run at 11:55 pm each day, e.g. with the line
#
# 55 23 * * * /home/secret/hypertext/DataSources/bySubject/vl_stats.pl
#
# It will write its result according to the following rule:
#
# >Architecture is in: /VIRTUALLIB/arch.html
# >so the stat file would be: /VIRTUALLIB/stat.arch.html
# >Lan. Arch is in: /VIRTUALLIB/larch.html
# >so the stat file would be: /VIRTUALLIB/stat.larch.html
#
# OK. And for me, with a URL that ends in /, then just "stat" would be my name.
#
#
# Customization:
#
# URLs of the documents you wish to get stats for
$url[0]="http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html";
$url[1]="http://info.cern.ch/hypertext/DataSources/bySubject/Overview2.html";
$url[2]="http://info.cern.ch/hypertext/DataSources/";
#
# Location of your stats file
$stats="/home/secret/test/httpd-log-txt.9412";
#
# If you're not running this script in the same directory as your html
# documents, you have to give the path of your documents, as it may be
# different from the one provided in the URL. In the given example,
# the script is run in the same directory as the first two URLs, so
# I just specify the path for the third one.
#
$url_path{2}="/home/secret/hypertext/DataSources";
#
# This is it!
#

# Defines the name of the file the information should be written to.
# $end_url[$j] is the path part of the URL, as it appears in the log.
# A URL ending in "/" has no last component, so its file is just "stat".
foreach $j (0 .. $#url) {
    $url[$j] =~ m|^.*//[^/]*(/.*)$| && ($end_url[$j] = $1);
    (($url_path{$j}) && ((($url[$j] =~ m|/([^/]+)$|)
      && ($file[$j] = "$url_path{$j}/stat.$1"))
     || ($file[$j] = "$url_path{$j}/stat")))
	|| ((($url[$j] =~ m|/([^/]+)$|) && ($file[$j] = "stat.$1"))
	|| ($file[$j] = "stat"));
}


chop ($date = `date +%d/%b/%Y`);

open (FULL_STATS,"$stats") || die "Can't open file $stats: $!";
while (<FULL_STATS>)
{
    $line = $_;
    foreach $j (0 .. $#url)
    {
	($line =~ m|GET ([^ ]*) |) && ($1 eq $end_url[$j]) 
	    && ($line =~ m|\[(\d*/\w*/\d*):|) 
		&& ($1 eq $date) && ($i{$j}++);
    }
}
close FULL_STATS;

foreach $j (0 .. $#url) {
    # A document with no hits today has no entry in %i; report 0 for it
    $count = $i{$j} || 0;
    open (STATS,"> $file[$j]") || die "Can't open file $file[$j]: $!";
    print STATS "Version: 1.0\n\nToday ($date) access: $count\n";
# Here you may wish to add other information
#     print STATS "Version: 1.0\n\nToday ($date) access: $count\nLinks 
#     off from the page: \nPublications: \n";
    close STATS;
}
----------------------------- Cut here -----------------------------------
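
The naming rule described in the header comments of the script can also be
sketched on its own, which may make it easier to check against your own
URLs. This is only an illustration, assuming a reasonably recent Perl 5;
stat_file_name is a hypothetical helper name and is not part of the script
above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a document URL to the name of its stat file.
# The last path component X gives "stat.X"; a URL ending in "/" gives "stat".
# The optional $dir plays the role of $url_path{$j} in the script.
sub stat_file_name {
    my ($url, $dir) = @_;
    my $name = ($url =~ m{/([^/]+)$}) ? "stat.$1" : "stat";
    return (defined $dir && length $dir) ? "$dir/$name" : $name;
}

# prints "stat.Overview.html"
print stat_file_name(
    "http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html"), "\n";

# prints "/home/secret/hypertext/DataSources/stat"
print stat_file_name(
    "http://info.cern.ch/hypertext/DataSources/",
    "/home/secret/hypertext/DataSources"), "\n";
```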

Arthur
Received on Thursday, 8 December 1994 21:55:40 UTC