
Web Characterization Metrics
April 15, 1999

Editor: Brian Lavoie (OCLC, http://www.oclc.org/)

Metric Properties

Web Metrics

Web User Metrics

Web Client Metrics

Web Server Metrics

Web Site Metrics

Web Page Metrics

Web Collection Metrics
 
 

Metric Properties

Every Web metric should be associated with a tuple that specifies the data collection unit and its scope in time and space:

Element: the Web object for which data is collected.

Examples:
User, Web client, Web server, Web site, Web page, Web collection, the entire Web. See the terminology sheet (http://www.oclc.org/oclc/research/projects/webstats/currterms.htm) for definitions of these Web objects.

Population Scope: the element population to which the metric applies.

Examples:
Entire Web, Web site, Web page ...

Temporal Scope: the time frame implicit in the metric.

Examples:
static measures, dynamic measures (including rate of change, doubling period).
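
The two dynamic measures named above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that growth between two observations is exponential; the function names and the sample counts are illustrative and are not taken from any survey.

```python
import math


def rate_of_change(count_start, count_end, elapsed):
    """Average absolute change per unit of time between two observations."""
    return (count_end - count_start) / elapsed


def doubling_period(count_start, count_end, elapsed):
    """Time needed for the count to double, assuming exponential growth.

    Derived from count_end = count_start * 2 ** (elapsed / T), solved for T.
    """
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("doubling period needs positive, growing counts")
    return elapsed * math.log(2) / math.log(count_end / count_start)


# Hypothetical counts: 1,000 sites growing to 4,000 sites over 12 months.
print(rate_of_change(1000, 4000, 12))   # sites gained per month
print(doubling_period(1000, 4000, 12))  # months per doubling
```

A fourfold increase is two doublings, so the hypothetical 12-month interval corresponds to a doubling period of about 6 months.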

Metric property examples:

Metric: User classification
Tuple: <user, all users at W3C, May 1999>

Metric: Mime-type distribution
Tuple: <Web page, pages on W3C Web site, June 1999>

Metric: Growth of the Web
Tuple: <Web site, entire Web, June 1998-June 1999>

Because this tuple exists for every Web metric, it is not necessary to list static and dynamic versions of the same metric separately, or to repeat a metric for each population to which it is applied.
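
One way to picture the tuple is as a small record attached to each metric. The sketch below is only an illustration: the class and field names (MetricScope, Metric, describe) are hypothetical, and the second example tuple is invented to show the same metric reused with a different scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricScope:
    element: str           # the Web object for which data is collected
    population_scope: str  # the element population the metric applies to
    temporal_scope: str    # the time frame implicit in the metric


@dataclass(frozen=True)
class Metric:
    name: str
    scope: MetricScope

    def describe(self):
        s = self.scope
        return f"{self.name}: <{s.element}, {s.population_scope}, {s.temporal_scope}>"


# One metric name, two scopes: no separate "static" or "dynamic" entry is needed.
mime_w3c = Metric("Mime-type distribution",
                  MetricScope("Web page", "pages on W3C Web site", "June 1999"))
mime_web = Metric("Mime-type distribution",
                  MetricScope("Web page", "entire Web", "June 1998-June 1999"))

print(mime_w3c.describe())
print(mime_web.describe())
```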

The metrics listed below are grouped according to element.
 
 

Web Metrics

Data Source
Sample data (see "A Methodology for Sampling the World Wide Web", http://www.oclc.org/oclc/research/publications/review97/oneill/o'neillar980213.htm).

Metrics
Number of Web servers (see the "Web Server Taxonomy" section of the terminology sheet for more details)

Number of Web sites

Number of unique Web sites (e.g., filter out Web sites located at multiple IP addresses)

Number of Web pages

Number of Web collections

Number of bytes

Network traffic (e.g., bytes transferred, Web pages accessed, etc.)

Ratio of size of core to size of periphery

Percentage breakdown of protocols across the periphery
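
The difference between "number of Web sites" and "number of unique Web sites" can be sketched as follows. This document does not say how duplicates at multiple IP addresses are recognized; comparing a hash of each home page is an assumption made here purely for illustration, and the function names and sample data are hypothetical.

```python
import hashlib


def fingerprint(home_page_bytes):
    """Hypothetical site identity: a hash of the home page content."""
    return hashlib.sha256(home_page_bytes).hexdigest()


def count_sites(sample):
    """Return (number of sites, number of unique sites) for a sample.

    `sample` maps an IP address to the bytes of the home page served there.
    """
    unique = {fingerprint(body) for body in sample.values()}
    return len(sample), len(unique)


# Illustrative sample using documentation-range IP addresses.
sample = {
    "192.0.2.1": b"<html>Example A</html>",
    "192.0.2.2": b"<html>Example A</html>",  # same site at a second address
    "192.0.2.3": b"<html>Example B</html>",
}
print(count_sites(sample))
```

An exact hash only catches byte-identical copies; a real survey would need a more tolerant comparison.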
 
 

Web User Metrics

Data Source
Survey data

Metrics
User classification (adult, child, professional user, casual user, etc.)

User access method (ISP, dial-up modem, wireless network, etc.)

User response rate and attrition rate

Data filtering imposed by user (i.e., which client filters have been activated by the user; see client-side filtering below)

Files transferred per user

Unique files transferred per user

Pages transferred per user

Unique pages transferred per user

Web sites visited per user

Unique Web sites visited per user

Reoccurrence rates for files, pages, and sites

Sessions per user per time period

Temporal length of sessions per user

Inter-session time per user (session to session time)

Path length of sessions per user

Stack distance per user

Inter-request time per user (request to request time)

Intra-request time per user (request to render time)

Temporal length of visit per site per user

Path length of visit per site per user

Ratio of explicit clicks to implicit clicks, per user per session

Ratio of embedded clicks to user-supplied clicks, per user per session
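
Several of the per-user metrics above can be derived from an ordered list of one user's requests. The sketch below is a rough illustration under stated assumptions: sessions are separated by an idle gap (the 30-minute default is an assumption, not a value from this document), and stack distance is taken to mean the depth of the requested item in a least-recently-used stack. All function names are hypothetical.

```python
def split_sessions(timestamps, timeout=1800):
    """Group sorted request times (in seconds) into sessions.

    A gap longer than `timeout` starts a new session.
    """
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def inter_request_times(timestamps):
    """Request-to-request gaps within one sorted sequence of request times."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def stack_distances(requests):
    """LRU stack distance of each request; None marks a first reference."""
    stack = []      # most recently requested item sits at index 0
    distances = []
    for item in requests:
        if item in stack:
            depth = stack.index(item)
            distances.append(depth + 1)
            stack.pop(depth)
        else:
            distances.append(None)
        stack.insert(0, item)
    return distances


times = [0, 20, 50, 4000, 4010]            # hypothetical request times
sessions = split_sessions(times)
print(len(sessions))                        # sessions in the period
print([s[-1] - s[0] for s in sessions])     # temporal length of each session
print(inter_request_times(sessions[0]))     # inter-request times, first session
print(stack_distances(["a", "b", "a", "c", "b", "b"]))
```

A small stack distance means the user returned to a recently requested item, so the distribution of these values is one way to summarize reoccurrence behavior.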
 
 

Web Client Metrics

Data Source
Log files

Metrics
Type of client (browser, robot, etc.)

Renderable mime-types

Java-enabled (yes, no)

Click-generation functionality (address window, favorites list, history list, etc.)

HTML fluency (i.e., what is the latest version of HTML recognized by the client?)

Client-side filtering capability (Internet content ratings, certificates, etc.)
 
 

Web Server Metrics

Data Source
Log files

Metrics
Internet node identification (IP address and port)

Domain name (and aliases)

Other Internet nodes mapped to same domain name

HTTP node classification (inaccessible, redirection, accessible; these classifications will be time-sensitive; see volatility metric below)

Top-level domain (e.g., .com, .edu, etc.)

Geographical location

Number of subsites (i.e., single Web site on server, or host site with subsites (virtual hosting))

Server-side filtering (e.g., robots.txt, firewalls, etc.)

Number of files on server

Number of Web pages on server

Files/pages by traffic graph (e.g., x% of files/pages account for y% of traffic)

Volatility level (summarizing the accessibility of the server during a given time period)

Ratio of explicit clicks to implicit clicks for server
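
The files/pages by traffic relationship can be read off a server log once requests have been tallied per file. The following is a minimal sketch, assuming such a tally is already available; the function name and the request counts are hypothetical.

```python
def files_for_traffic_share(requests_per_file, share):
    """Smallest fraction of files that accounts for `share` of all requests.

    `requests_per_file` maps a file name to its request count;
    `share` is a fraction between 0 and 1.
    """
    counts = sorted(requests_per_file.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no traffic recorded")
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return n / len(counts)
    return 1.0


# Hypothetical tally taken from a log file.
tally = {"/index.html": 70, "/news.html": 15, "/about.html": 10, "/old.html": 5}
print(files_for_traffic_share(tally, 0.8))
```

Evaluating the function at several values of `share` gives the points of the traffic graph.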

Discussion:
I don't think that server metrics should include metrics that relate to content. Content metrics should be confined to the Web resources (as discussed below). For example, the server metrics list previously included a "modification of content history" metric; this should be applied to the specific resource containing the content (e.g., a Web page).
 
 

Web Site Metrics

Data Source
Sample data; log files

Metrics
Web site publisher

North American Industrial Classification System (NAICS) code for publisher

Textual description of site's content

Content access scheme (free, pay-per-view, subscription, etc.)

Number of Web pages

Number and type of Web collections

Number of user Web page requests per time period

Number of search engines indexing the site

Number of pages served per time period

Percentage of site devoted to CGI/dynamic content

Bytes transferred per time period

Byte latency

Birth and modification history (major revisions of content - from HTTP header?)

Cookie supplied (yes, no)

Depth (number of levels in site's internal link structure)
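
The depth metric can be computed from a site's internal link graph. The sketch below assumes that "levels" means the largest number of link hops needed to reach any page from the home page along the shortest route; that reading, the function name, and the sample link graph are assumptions made for illustration.

```python
from collections import deque


def site_depth(links, home):
    """Largest shortest-path distance (in link hops) from `home` to any page.

    `links` maps each page to the internal pages it links to.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:       # first visit is the shortest route
                depth[target] = depth[page] + 1
                queue.append(target)
    return max(depth.values())


# Hypothetical internal link structure.
links = {
    "/": ["/about", "/docs"],
    "/docs": ["/docs/intro", "/"],
    "/docs/intro": ["/docs/intro/setup"],
}
print(site_depth(links, "/"))
```

Pages that cannot be reached from the home page are ignored by this sketch.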
 
 

Web Page Metrics

Data Source
Sample data; log files

Metrics
Aggregate size of constituent Web resources (in bytes)

Number and type of embedded non-text objects (images, video, streaming data, applets, etc.)

Hyperlinks per page

Percentage breakdown of mime types in hyperlinks (e.g., html, jpg, ps, etc.)

Percentage breakdown of protocols in hyperlinks (e.g., http, shttp, gopher, etc.)

Ratio of internal to external links on page

Textual description of page's content

Content access scheme (free, pay-per-view, subscription, etc.)

Birth and modification history (major revisions of content - from HTTP header?)
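
The hyperlink metrics in this list (hyperlinks per page, protocol breakdown, internal versus external links) can be gathered in a single pass over a page. The following is a minimal sketch using only the Python standard library. It assumes that "internal" means a link whose host matches the host of the page itself; the names LinkCollector and link_metrics and the sample page are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_metrics(html, page_url):
    """Count hyperlinks, internal/external links, and protocols on a page."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    protocols = Counter()
    internal = external = 0
    for href in collector.hrefs:
        target = urlparse(urljoin(page_url, href))  # resolve relative links
        protocols[target.scheme] += 1
        if target.netloc == site:
            internal += 1
        else:
            external += 1
    return {
        "hyperlinks": len(collector.hrefs),
        "internal": internal,
        "external": external,
        "protocols": dict(protocols),
    }


# Hypothetical page content and address.
page = ('<a href="/a.html">A</a> <a href="b.html">B</a> '
        '<a href="ftp://ftp.example.org/x.ps">X</a> '
        '<a href="http://other.example/">O</a>')
print(link_metrics(page, "http://www.example.com/dir/index.html"))
```

Dividing the "internal" count by the "external" count gives the ratio of internal to external links, and dividing each protocol count by the "hyperlinks" total gives the percentage breakdown.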
 
 

Web Collection Metrics

Data Source
Sample data; log files

Metrics
Type of collection (online journal, photo gallery, etc.)

Content access scheme (free, pay-per-view, subscription, etc.)

Number of Web pages in collection

Birth and modification history (major revisions of content - from HTTP header?)