Re: survey of top web sites from David Dailey on 2007-04-30 (www-archive@w3.org from April 2007)

From: David Dailey <david.dailey@sru.edu>
Date: Mon, 30 Apr 2007 11:41:57 -0400
To: Karl Dubost <karl@w3.org>
Cc: connolly@w3.org,www-archive@w3.org,st@isoc.nl,zdenko@ardi.si, sean@elementary-group.com
Message-Id: <6.2.5.6.1.20070430105051.031d1f48@sru.edu>

+www-archive -www-public

At 10:43 PM 4/29/2007, Karl wrote (in message at
http://lists.w3.org/Archives/Public/public-html/2007Apr/1704.html) :

>Doing a survey is tricky but very interesting, we need to clearly
>define the methodology so that we know how to interpret the results.
>Some previous results gave only the compiled results which makes it
>difficult to interpret.

Hi Karl,

As I mentioned
(http://lists.w3.org/Archives/Public/public-html/2007Apr/1544.html),
Sander and I began having possibly related discussions of methodology
somewhat in parallel but offlist, since there seem to be two
differing ideas about why one might want to do such sampling of web sites.

I had suggested a slightly different methodology than what you
suggest, thinking it may or may not prove to be of interest. At the
end of this message are some of my comments on such a methodology:

My idea was to form a stratified sample of web pages at each of
several points of the spectrum of web pages: a) top 200, b) Alexa
500, c) random, and d) "weird" or fringe cases that would be
assembled by hand, and then to cross that with a variable
representing instances of either standards or browsers.

Your approach (to what may ultimately be a different problem)
considers a number of things I didn't, though the browser sniffing
stuff you mention is something I was thinking about. I don't know if
one can robotically parse a document so that it looks like it would
in Opera, FF, Safari, IE, etc. or not. I was rather naively assuming
a fleet of grad students would fill out that part of the experimental
design by hand. The other thing that is relevant to the discussion I
think is the issue of the many different kinds of web content (sorta
like you mention) -- blogs, news feeds, ordinary web pages, wikis,
HTML fragments, print, email, etc. That could get complicated fast it seems.

Also germane to the discussion may be some of the stuff that I think
the folks interested in usability studies might be concerned with.
See for example
http://lists.w3.org/Archives/Public/public-html/2007Apr/0962.html, in
which the classes of pages are further classified into types by
author types (e.g. search engines v corporate etc.)

It may make some sort of sense to convene a conversation unioning
both the survey and the usability folks, since some of the
methodological concerns may in fact overlap. Just an idea -- thinking out loud.

David
--------<quote>---------------------
The other two folks I mentioned [zdenko and sean, cc-ed above] are
involved in the business of sampling the 200 sites, so it might be
best to get them involved as well. I didn't sign up for this
particular task since standards effectiveness is a more tangential
concern of mine (though I am really glad someone is looking at it).

I would tend to think the methodology oughta look something like this

                  method of evaluation
               standards        browsers
               S1  S2  S3    B1  B2  B3  B4
   p   p1
   a   p2
   g   p3
   e   p4
   s   p5

where both standards and browsers are used as repeated measures for pages.
Pages are randomly chosen within categories C = {Top200/50,
Alexa500/50, random50, weird50}.
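
The grid above can be written out in "long" form, one record per page and
evaluation method. This is only a rough sketch in Python: the labels S1-S3 and
B1-B4 are the placeholders from the table, and the category keys, page ids and
the build_design name are made up for illustration.

```python
# Placeholder labels: the message names neither the specific standards
# nor the specific browsers.
STANDARDS = ["S1", "S2", "S3"]
BROWSERS = ["B1", "B2", "B3", "B4"]
# Hypothetical category keys for the four sampling strategies.
CATEGORIES = ["top200", "alexa500", "random", "weird"]
PAGES_PER_CATEGORY = 50


def build_design(categories=CATEGORIES, pages_per_category=PAGES_PER_CATEGORY):
    """Return one record per (page, method) cell of the repeated-measures grid."""
    methods = [("standard", s) for s in STANDARDS]
    methods += [("browser", b) for b in BROWSERS]
    rows = []
    for category in categories:
        for i in range(1, pages_per_category + 1):
            page_id = f"{category}-p{i}"
            for kind, label in methods:
                rows.append({
                    "category": category,  # sampling strategy the page came from
                    "page": page_id,       # the sampled page (random effect)
                    "kind": kind,          # standards vs browsers
                    "method": label,       # repeated measure applied to the page
                    "result": None,        # to be filled in by whoever evaluates it
                })
    return rows


design = build_design()
print(len(design))  # 4 categories * 50 pages * 7 methods = 1400
```

Each page appears seven times, once per standard and once per browser, which
is what makes the methods repeated measures on pages.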

One samples 50 of each category and then one has a classical mixed
model analysis of variance with repeated measures and only one random
effects variable. The dependent variable can be either discrete (+ or -)
or continuous; it didn't much matter, last time I studied statistics.
Then we have a somewhat stratified sample that can be compared across
sampling strategies.
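
Drawing 50 pages from each category could look something like the following
Python sketch. The candidate lists, the example URLs and the stratified_sample
name are all invented for illustration; real lists would have to come from the
rankings, some random-page source and a hand-assembled collection.

```python
import random


def stratified_sample(strata, per_stratum=50, seed=None):
    """Draw per_stratum pages without replacement from each category."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    sample = {}
    for name, urls in strata.items():
        if len(urls) < per_stratum:
            raise ValueError(f"stratum {name!r} has only {len(urls)} pages")
        sample[name] = rng.sample(urls, per_stratum)
    return sample


# Hypothetical candidate lists, one per category.
strata = {
    "top200": [f"http://top.example/{i}" for i in range(200)],
    "alexa500": [f"http://ranked.example/{i}" for i in range(500)],
    "random": [f"http://random.example/{i}" for i in range(1000)],
    "weird": [f"http://weird.example/{i}" for i in range(60)],
}
sample = stratified_sample(strata, per_stratum=50, seed=2007)
```

Sampling within each category separately keeps every category at exactly 50
pages, so the categories can be compared on equal footing.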

But the idea is to sample as divergent a group of pages as possible.

To get the random 50 -- I'm not sure what the best methodology is --
I suggested StumbleUpon (but it has its own idiosyncrasies) -- I
remember some search engines have a "find a random page" feature so
one might be able to track down how they do that. Someone in our
group must know.

To get a weird 50 -- I have a couple of eclectic collections.
http://srufaculty.sru.edu/david.dailey/javascript/various_cool_links.htm
is one;
http://srufaculty.sru.edu/david.dailey/javascript/JavaScriptTasks.htm
is another.

Both are peculiar in the sense that they attempt to probe the
boundaries of what is possible with web technologies -- some are
heavily Flash, some are heavily JavaScript -- many don't work across
browsers and in many cases I don't know why. Too busy to track it all
down. (some of my pages are several years old and used to work better
than they do now). My emphasis has been far less on standards than on
what works across browsers -- the standards and browsers generally
seem to have so little to do with one another.

A proper methodology for weird sites: have a group of volunteers
explain what they are looking for (a collection of fringe cases) and
let others contribute to a list. I don't know. A simpler methodology:
have a group of volunteers just sit and come up with a list of sites
believed to push the frontier.
------------</quote>--------------------

Received on Monday, 30 April 2007 15:41:48 UTC