Re: WWWVL: gathering stats (usage)

From: Rodney Hoinkes <rodney@odum.clr.toronto.edu>
Date: Wed, 7 Dec 94 12:04:03 -0500
Message-Id: <9412071704.AA08313@odum.clr.toronto.edu>
To: Multiple recipients of list <www-vlib@www0.cern.ch>, secret@www5.cern.ch

Arthur says:
> I don't mind whether we use a stat file, or comments inside the file.
> But I remember we had a previous discussion on this:

That discussion revolved around the description of the subject areas
if I recall - something that would logically be on each page that the
user accesses, but stats just eat more bandwidth for each regular user
without providing them something useful in this case (or not useful
to the vast majority).

I think a separate file is a good idea and if we can define the variables
and language (and there aren't too many), I say why not! It would be
esp. useful if one or two people could hack up a script (perl & shell
since some of us do not have perl (egads - you do not say!)) to generate
this from the log file (since the format is standardized now in log
files).  Each site admin only needs to define the filenames that
constitute their home pages (I have about 5 that are links to the main
one so I get a sense of where people came from).
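
A minimal sketch of such a script follows, in Python rather than perl or
shell. It assumes the server writes Common Log Format lines, counts only
successful GET requests, and averages over the days that actually appear
in the log for those pages. The names daily_average and home_pages are
made up for illustration; each admin would supply their own list.

```python
import re
from collections import defaultdict

# Assumed Common Log Format:
#   host ident user [day:time zone] "GET path protocol" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^:\]]+)[^\]]*\] "GET (\S+)[^"]*" (\d{3}) \S+'
)


def daily_average(log_lines, home_pages):
    """Average successful GETs per logged day for the given home pages.

    home_pages is the set of paths an admin treats as "home" pages
    (hypothetical parameter). Inline images and other files are skipped
    simply because their paths are not in that set.
    """
    per_day = defaultdict(int)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a GET, or not in the assumed format
        _host, day, path, status = match.groups()
        if path in home_pages and status == "200":
            per_day[day] += 1
    if not per_day:
        return 0.0
    return sum(per_day.values()) / len(per_day)
```

Calling daily_average(open("access_log"), {"/VIRTUALLIB/arch.html"})
would give the figure for one home page; the log file name here is only
a placeholder. Days with no matching access are not counted in the
average, which may or may not be what we want.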

As to a 'stat' filename, I'm not as sure.  I have separate pages for
Architecture & Landscape Architecture but BOTH reside in the same
home directory - should I: 1) combine them, or 2) have some way
of indicating the stat file for each??

I don't think I'm alone in this so I'm for some more careful definition
of the filename used - maybe stat.NORMALNAME, where NORMALNAME is the
name of the main 'home' page, in my case:

Architecture is in: /VIRTUALLIB/arch.html
so the stat file would be: /VIRTUALLIB/stat.arch.html
Landscape Architecture is in: /VIRTUALLIB/larch.html
so the stat file would be: /VIRTUALLIB/stat.larch.html
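
The stat.NORMALNAME rule is simple enough to sketch in a few lines of
Python. The function name stat_file_for and the prefix parameter are
made up for illustration; the prefix is a parameter only so that a more
general name such as 'info' could be swapped in.

```python
import posixpath


def stat_file_for(home_page, prefix="stat"):
    """Return the proposed stat file path for a home page path.

    The stat file sits in the same directory as the home page and is
    named PREFIX.NORMALNAME, where NORMALNAME is the page's file name.
    """
    directory, name = posixpath.split(home_page)
    return posixpath.join(directory, prefix + "." + name)
```

With this, two home pages in the same directory get two distinct stat
files, so there is no need to combine them.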

Also, maybe it should not be called 'stat' but 'info' or something
more general to reflect other info we may want to put into it later
that is not pure stats.

So what 'info' do we want/can we gather easily/is useful?

Daily Avrg 'home' HTML file access (excluding inline images, etc.)
Daily Avrg 'links' off from the page (for those who record it - I just
Citations of the page in 'publications'

Other areas as we discussed before are too subjective in interpretation,
ie. # of links in main page, etc. (of doubtful use and open to vast
Although - # of links in 'library' might be useful and each admin can
decide what that means - add all sub-pages, database entries, etc.

I am thinking that this might be as useful to each subject as for the
whole library - how much does it catalog? (of course, cross-listings
are a problem!) - as a % of the 'known' pages out there? I'd be

What about 'how many' accesses from different parts of the world? by class,
education, commercial, government?  It seems that this library is one
of the best sources for demographics of the web potentially!!
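
As a rough sketch of how accesses could be grouped by class, the Python
below looks only at the last label of each requesting host name. It
assumes the log records resolved host names, treats two-letter endings
as country codes, and lumps everything else into 'other'. The names
classify_host and tally_by_class, and the class labels themselves, are
made up for illustration.

```python
from collections import Counter

# Assumed mapping from top-level domain to the classes mentioned above.
CLASSES = {"edu": "education", "com": "commercial", "gov": "government"}


def classify_host(host):
    """Classify a requesting host by the last label of its name."""
    last = host.rsplit(".", 1)[-1].lower()
    if last.isdigit():
        return "unresolved"  # a bare IP address, no name to go on
    if last in CLASSES:
        return CLASSES[last]
    if len(last) == 2 and last.isalpha():
        return "country:" + last  # e.g. 'ch', 'ca'
    return "other"


def tally_by_class(hosts):
    """Count accesses per class for an iterable of host names."""
    return Counter(classify_host(h) for h in hosts)
```

Hosts logged only as IP addresses end up as 'unresolved', so the
breakdown is only as good as the name lookups the server did.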

Wow - too much typing, I'll let others comment and see where this

Rodney Hoinkes
Head of Design Applications         WWW-URL: http://www.clr.toronto.edu:1080/
Centre for Landscape Research       AnonFTP: ftp.clr.toronto.edu
University of Toronto
