Paper on Changes in Web Client Access Patterns

From: Paul Barford (barford@cs.bu.edu)
Date: Tue, Dec 15 1998


Date: Tue, 15 Dec 1998 11:45:51 -0500 (EST)
From: Paul Barford <barford@cs.bu.edu>
To: www-wca@w3.org
cc: Azer Bestavros <best@cs.bu.edu>, Mark Crovella <crovella@cs.bu.edu>
Message-ID: <Pine.GSO.3.96.981215113717.1219B-100000@csb>
Subject: Paper on Changes in Web Client Access Patterns

Hello:

The technical report version of the paper "Changes in Web Client
Access Patterns: Characteristics and Caching Implications" is now
available from the BU technical reports home page:

http://www.cs.bu.edu/techreports/

It is listed under TR 98-023.  A modified version of this report has
been accepted for publication in an upcoming special issue of the WWW
Journal.  The abstract follows below.  Questions and comments are
welcome.

PB

Title: Changes in Web Client Access Patterns: Characteristics and
Caching Implications
Authors: Paul Barford, Azer Bestavros, Mark Crovella, and Adam Bradley
Date: December 4, 1998

Abstract:

Understanding the nature of the workloads and system demands created
by users of the World Wide Web is crucial to properly designing and
provisioning Web services.  Previous measurements of Web client
workloads have been shown to exhibit a number of characteristic
features; however, it is not clear how those features may be changing
with time.  In this study we compare two measurements of Web client
workloads separated in time by three years, both captured from the
same computing facility at Boston University.  The older dataset,
obtained in 1995, is well known in the research literature and has
been the basis for a wide variety of studies.  The newer dataset was
captured in 1998 and is comparable in size to the older dataset.  The
new dataset has the drawback that the collection of users measured may
no longer be representative of general Web users; however, using it has
the advantage that many comparisons can be drawn more clearly than
would be possible using a new, different source of measurement.  Our
results fall into two categories.  First, we compare the statistical
and distributional properties of Web requests across the two datasets.
This serves to reinforce and deepen our understanding of the
characteristic statistical properties of Web client requests.  We find
that the kinds of distributions that best describe document sizes have
not changed between 1995 and 1998, although specific values of the
distributional parameters are different.  Second, we explore the
question of how the observed differences in the properties of Web
client requests, particularly the popularity and temporal locality
properties, affect the potential for Web file caching in the network.
We find that, for the computing facility represented by our traces
between 1995 and 1998, (1) the benefits of using size-based caching
policies have diminished; and (2) the potential for caching requested
files in the network has declined.