From: "chris" <chris@webcriteria.com>
To: <www-wca@w3.org>
Date: Fri, 28 May 1999 11:41:24 -0700
Message-ID: <005501bea939$b0be8ae0$02a8a8c0@honker.gaggle>
Subject: RE: page views

Brian,

The following is a long description, but I can't
avoid some background.

I am not surprised to hear that you also struggled
with this problem; it took us several months of
hard thinking to arrive at a minimal solution, and
I still don't think we have it perfect.

SUMMARY ON DEPTH OF VIEW

To summarize, we measure page accessibility as
the time to traverse the shortest path, where each
segment in the path has an associated time to
scan a link, click, and load the next view. The
shortest path is defined by time, not by the number
of segments.

SITE STRUCTURE

As far as overall structure, we view the site as
a directed graph (cyclic, of course). The structure
is determined by spidering the site breadth-first,
and the graph is constructed view by view in the same
breadth-first order. Views may contain multiply
nested frames, so I think we adhere to your definition
of a "manifested resource" page view, i.e., all of
the elements demanded by an instance of a URI
realization are included in the view.
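
To make the structure-building step concrete, here is a minimal
Python sketch of breadth-first graph construction. The fetch_view
function and max_views limit are placeholders for illustration;
this is not our spider, just the shape of the idea.

    # Breadth-first construction of the view graph (illustrative only).
    from collections import deque

    def build_view_graph(start_url, fetch_view, max_views):
        """fetch_view(url) -> (view, outgoing_urls). Returns url -> links."""
        graph = {}
        queue = deque([start_url])
        seen = {start_url}
        while queue and len(graph) < max_views:
            url = queue.popleft()
            view, links = fetch_view(url)   # view includes all framed resources
            graph[url] = links
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return graph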

SAMPLE SIZE

We have the notion of a sample size, just like food
nutrition labels. Our sample size is measured in browse
time on a supersite. Browse time is measured by modeling
human browsing behavior and applying that model to the
site. We chose a knowledge-worker model that scans for
material, with the material located at every page in
turn. This way, every page is both an end point and
a hop along the way to another page. We had two PhDs
in human psychology and web usage define the model
(they based it on GOMS). Scan time and page load time
(modeled the way a real browser works) are combined to
arrive at a cost in time for each link to each view.
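
As a rough illustration of how a per-link cost might come together,
something along these lines; the constants, bandwidth, and round-trip
time below are made up for the example, not the calibrated GOMS-based
values our researchers use.

    # Illustrative edge-cost model: scan + click + load, in seconds.
    def edge_cost(links_on_page, page_bytes, bandwidth_bps=28800.0, rtt=0.2):
        scan_time = 0.5 * links_on_page      # scanning the links in the view
        click_time = 1.0                     # deciding and clicking
        load_time = rtt + (page_bytes * 8.0) / bandwidth_bps
        return scan_time + click_time + load_time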

ACCESSIBILITY (REACHABILITY)

As the graph is built, we add each edge cost until the
browse time limit is reached (600 minutes right now).
Common views are coalesced, so that links in the graph
share views. In order to understand the "accessibility"
of each page in the site, we find the shortest path
(in time!) from the host page to the page under test.
The shortest path in time has a better chance of being
the actual path than the path with the fewest segments.
(Note that the path with the fewest segments is likely to
run through a site map, but many site maps are so huge and
cluttered that they are not the most travelled path -
log file analysis supports this claim.)
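
The shortest-path-in-time computation is essentially a weighted
shortest path (Dijkstra-style) over the view graph. A minimal sketch,
assuming the graph and per-edge costs from the steps above:

    # Shortest time from the host page to every reachable view.
    import heapq

    def accessibility(graph, costs, home):
        """graph: url -> [urls]; costs: (u, v) -> seconds. Returns url -> seconds."""
        best = {home: 0.0}
        heap = [(0.0, home)]
        while heap:
            t, u = heapq.heappop(heap)
            if t > best.get(u, float('inf')):
                continue                      # stale heap entry
            for v in graph.get(u, []):
                nt = t + costs[(u, v)]
                if nt < best.get(v, float('inf')):
                    best[v] = nt
                    heapq.heappush(heap, (nt, v))
        return best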

We think defining a sample size is critical in order
to make comparisons between sites. Our early attempts
were based on the number of URLs, etc., but such counts
do not relate very well to user experience. Some sites are
huge, and measuring the whole site is not feasible.
We did some early research on site sizes by trying
to spider whole sites, and it was very frustrating both
to the site owners and to us.

We find that with our model, 600 minutes of browse time
yields between 200 and 700 page views, depending
on the nature of the site. From this, we can compute the
average time per view (roughly 0.8 to 3 minutes). We also
produce a histogram of accessibilities (the times to reach
each of the pages in the sample), which correlates
very well with user experience. Our PhDs in human-computer
interaction conducted a formal test with real users in
task-oriented experiments; the results were correlated
with our measurements on the same site, and we achieved
8 out of 10 matches. They are presenting our joint paper
at HFWEB99 (Human Factors on the Web) in conjunction with
NIST.
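
For illustration only, summarizing those per-view accessibility
times into an average and a histogram could look roughly like this
(the 30-second bucket width is arbitrary, not what our reports use):

    # Average time per view plus a simple histogram of accessibilities.
    def summarize(times, bucket_seconds=30.0):
        avg = sum(times) / len(times)
        histogram = {}
        for t in times:
            bucket = int(t // bucket_seconds)
            histogram[bucket] = histogram.get(bucket, 0) + 1
        return avg, histogram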

I apologize for going on so long here, but I am very
excited about this field. If there is anything that we
can do to help, please ask. I would like to volunteer
to conduct a benchmark using our tools that we could
post on a relevant site. We have a sample report on our
site at
site at

http://www.webcriteria.com/product/sample_report.htm

There are also free benchmarks, but they require
registration. I could build a benchmark for a set of
sites that you guys already have data for and make
some comparisons.

I welcome comments on our methods. We also have a
white paper located at

http://www.webcriteria.com/our_tech/white_paper.htm

Cheers, Chris

> -----Original Message-----
> From: www-wca-request@w3.org 
> [mailto:www-wca-request@w3.org]On Behalf Of
> Lavoie,Brian
> Sent: Friday, May 28, 1999 10:44 AM
> To: 'www-wca@w3.org'
> Subject: FW: page views
> 
> 
> Chris,
> 
> I think your suggestion is an excellent one, and I will make a note to
> include this term in the next revision. Defining the 
> structure of a website
> is an interesting problem, and one that we have struggled 
> with in analyzing
> our web sample data. This problem came up for us when we were 
> trying to
> figure out a way to determine a webpage's "depth" in a site - 
> i.e., how many
> layers down in the site you had to go to reach it. Clearly, 
> the fact that
> there are often multiple ways to reach a page makes defining 
> this depth
> problematic. We decided to use a metric where we took the 
> number of segments
> in the shortest path to the page, but this was an ad hoc 
> approach, not based
> on any formal definition of the structure of a site. I would be very
> interested in any research you have done on defining the 
> structure of a
> website.
> 
> Regards,
> Brian
>