Metrics questions

From: Johan Hjelm (hjelm@w3.org)
Date: Fri, Feb 26 1999


Message-Id: <4.1.19990226155348.00bafdf0@127.0.0.1>
Date: Fri, 26 Feb 1999 17:30:17 +0100
To: www-wca@w3.org
From: Johan Hjelm <hjelm@w3.org>
Subject: Metrics questions

Hi all, 
I had some questions about the metrics that Jim put out, which we may
discuss during today's teleconference. I have also included a brief
description of how I see the recharacterisation architecture, and how the
metrics go into the metafiles, below. Let me know what you think.

Comments about metrics: 
Classification of users (educational, home, ISP, or corporate)
Roles within the classification: Sales, tech support, engineering;
teenager, parent, toddler, schoolchild; etc. (Note: Do we need to develop a
standard syntax/vocabulary for things like these?)
Domain of server (gTLDs and top domains are all very well, but which
language is it in? Some languages, e.g. Swedish, span several domains:
.se, .nu, and .com are frequently used for servers. How do we handle
mixed-language servers? What about servers that are in several domains? To
give just one example, ericsson.com is also ericsson.se and ericsson.nl,
and the content is the same.)
Cost and other access restrictions (e.g. IP-based access masks, robots.txt,
etc.?)
Access method of users (LAN, modem, mobile, or wireless)
Access network (For wireless: GSM, HSCSD, CDPD, CDMA, Mobitex, W-CDMA,
PHS/PIAFS, etc.) (How do you handle sites with mixed networks?)
Users, response rate, and attrition rate (this does not sound like a log
file analysis, rather like a survey?).
Pages transferred per user (How do you distinguish a page, if you are using
frames?)
Unique pages transferred per user (Note: Should we not talk about objects
instead of pages here? Images, forms, scripts, and template elements can be
objects; if we consider that, we make the characterisation mechanism
future-proof.)
Sites visited per user (see the definition of "sites")
Mime-type percentage breakdown (e.g., html, jpg, ps, etc.) (What about other
file types?)
Protocol percentage breakdown (e.g., http, shttp, gopher, etc.) (Note: This
one breaks down into a user-centric/session-centric part and a site-centric
part. How do we distinguish sessions in protocols other than HTTP, e.g.
when a web page is used to start a RealAudio session? Is that out of our
scope?)
Hyperlinks per page (or per HTML file? See above about pages)
Sessions per user (should be "over measurement period"?)
Inter-session time per user (session to session time)
Intra-request time per user (request-to-render time - how do we measure this?)
Reoccurrence rates for files and pages per user (assumes longitudinal
tracking capabilities - does it assume sessions?)
Number of search engine hits (Note: How do we measure this?)
Number of CGI/dynamic content serviced (What is the measure for this?)
Documents by Traffic graph (x% of documents account for y% of traffic) (see
again the comments about "documents")
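
Several of the questions above (sessions per user, inter-session time) depend on how a session is cut out of the log. A minimal sketch, assuming a session simply ends after a fixed period of inactivity; the 30-minute threshold and the function names are illustrative assumptions, not something Jim's list defines:

```python
from collections import defaultdict

# Assumption: a session ends after 30 minutes without a request.
# The threshold is illustrative only; the metrics list does not define one.
SESSION_TIMEOUT = 30 * 60  # seconds


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp) pairs into per-user sessions.

    Returns {user: [[t1, t2, ...], [t5, ...], ...]}, timestamps in seconds.
    """
    by_user = defaultdict(list)
    for user, ts in requests:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout:
                user_sessions.append([cur])    # gap too long: start a new session
            else:
                user_sessions[-1].append(cur)  # same session continues
        sessions[user] = user_sessions
    return sessions


def inter_session_times(user_sessions):
    """Time from the last request of one session to the first of the next."""
    return [nxt[0] - cur[-1] for cur, nxt in zip(user_sessions, user_sessions[1:])]
```

Sessions per user is then the length of each user's list. How a "user" is identified in the log (IP address, cookie, login) is left open here, as it is in the list above.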

I am assuming we have three different metafiles: A file describing the set
and the setup, a file describing the site, and a file describing the log
file. Below, I have tried to divide the metrics into these three: 

Meta-Set:
Location of the log files
Location of the metafiles
Location of the server (site) data (which may be different from the root
file system)
Periodicity of the analysis: Log files, server file system
Classification of users and user roles
Access methods of users (and method of generating sessions, e.g. based on
access method)
Access network (same question)
Domain of server (Language question)
Cost and other access restrictions (e.g. IP-based access masks, robots.txt, etc.)
Type of service provider
Birth and modification history of server (e.g. major revisions of content)

Meta-log:
Files transferred per user (total)
Unique files transferred per user
Pages transferred per user
Unique pages transferred per user
Sites visited per user (assumes longitudinal tracking)
Reoccurrence rates for files and pages per user
Protocol percentage breakdown (e.g. HTTP, SHTTP, Gopher, etc).
Number of sessions per user
Length of session per user
Inter-session time per user (session-to-session time)
Stack distance per user
Inter-request time per user (request-to-request time)
Intra-request time per user (request-to-render time)
Length of visit per user
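
Two of the Meta-log metrics above can be computed directly from one user's request stream. A minimal sketch, assuming "stack distance" is meant in the usual LRU-stack sense from trace analysis (the depth at which a re-referenced file sits in a most-recently-used-first stack); the function names are hypothetical:

```python
def stack_distances(references):
    """LRU stack distance for each reference in a request stream.

    The distance is the item's depth in a most-recently-used-first stack
    (1 = same item as the previous request); None marks a first reference.
    """
    stack = []       # most recently used item is at index 0
    distances = []
    for item in references:
        if item in stack:
            depth = stack.index(item)
            distances.append(depth + 1)
            stack.pop(depth)
        else:
            distances.append(None)
        stack.insert(0, item)  # the item is now the most recently used
    return distances


def inter_request_times(timestamps):
    """Request-to-request gaps for one user's sorted request timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]
```

Intra-request (request-to-render) time is deliberately left out: it cannot be derived from a server log alone, which is the measurement question raised earlier.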

Meta-site:
Number of embedded images per page, file type, and size
Mime-type percentage breakdown of site (e.g. HTML, JPEG, PS, etc)
Hyperlinks per HTML page
Site Composition (once per measurement session):
  Number of users
  Number of files and page requests per user
  Number of search engine hits
  Number of files serviced
  Number of pages serviced
  Number of CGI/dynamic content serviced
  Bytes transferred
  Byte latency
  Total number of files on server
  Documents by Traffic graph (x% of documents account for y% of traffic)
Growth Rates (once per measurement session):
  Number of users
  Number of files and page requests per user
  Number of files serviced
  Number of pages serviced
  Number of CGI/dynamic content serviced
  Bytes transferred
  Byte latency
  Number of files on server
  Number of bytes on server
  Doubling period for all of the above metrics
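
Two of the Meta-site entries above are derived figures rather than raw counts. A minimal sketch of how they might be computed, assuming steady exponential growth between two measurement sessions for the doubling period, and ranking documents by bytes transferred for the traffic graph; the function names and parameters are hypothetical:

```python
import math


def traffic_share(bytes_per_document, top_fraction):
    """Fraction of total bytes served by the heaviest documents.

    bytes_per_document maps a document name to the bytes transferred for it;
    top_fraction is x expressed as a fraction (0.1 means the top 10%).
    """
    sizes = sorted(bytes_per_document.values(), reverse=True)
    total = sum(sizes)
    if total == 0:
        return 0.0
    top_n = max(1, math.ceil(top_fraction * len(sizes)))
    return sum(sizes[:top_n]) / total


def doubling_period(old_value, new_value, elapsed):
    """Time for a metric to double, assuming steady exponential growth.

    elapsed is the time between the two measurements; the result is in the
    same unit. Returns None when the metric did not grow.
    """
    if old_value <= 0 or new_value <= old_value:
        return None
    return elapsed * math.log(2) / math.log(new_value / old_value)
```

For example, a metric that grows from 100 to 400 over 60 days has doubled twice, so its doubling period under this assumption is 30 days.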


Johan


************************************************************
                     Johan HJELM
            Ericsson RCUR T/K & Cyberlab NY 
         Currently visiting engineer at the W3C
             The World Wide Web Consortium
                     hjelm@w3.org
   http://www.w3.org/People/W3Cpeople.html#Hjelm
    Fax +1-617-258 5999, Phone +1-617-263-9630
   MIT/LCS, 545 Tech. Sq. Cambridge MA 02139 USA 
        opinions are personal, always my own, 
  and not necessarily those of Ericsson or the W3C. 
============================================================