Re: survey of top web sites

+www-archive -www-public

At 10:43 PM 4/29/2007, Karl wrote (in message at 
http://lists.w3.org/Archives/Public/public-html/2007Apr/1704.html):

>Doing a survey is tricky but very interesting, we need to clearly
>define the methodology so that we know how to interpret the results.
>Some previous results gave only the compiled results which makes it
>difficult to interpret.

Hi Karl,

As I mentioned 
(http://lists.w3.org/Archives/Public/public-html/2007Apr/1544.html), 
Sander and I began a possibly related discussion of methodology 
somewhat in parallel, but off-list, since there seem to be two 
differing ideas about why one might want to sample web sites in this way.

I had suggested a slightly different methodology from the one you 
describe; it may or may not prove to be of interest. Some of my 
comments on that methodology are quoted at the end of this message:

My idea was to form a stratified sample of web pages drawn from 
several points along the spectrum of web pages: a) the top 200, b) the 
Alexa 500, c) random pages, and d) "weird" or fringe cases assembled 
by hand, and then to cross that with a factor representing the method 
of evaluation: particular standards or particular browsers.
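
Roughly, in code, the stratified draw might look something like the 
sketch below (a minimal sketch only; the file names are placeholders 
for wherever the four URL lists end up living):

import random

# Placeholder files, one URL per line, for the four strata.
STRATA = {
    "top200": "top200_urls.txt",      # hand-compiled top-200 list
    "alexa500": "alexa500_urls.txt",  # export of the Alexa top 500
    "random": "random_urls.txt",      # however the random pool gets built
    "weird": "weird_urls.txt",        # hand-assembled fringe cases
}

def draw_stratified_sample(per_stratum=50):
    """Draw 50 pages from each stratum so every category is equally represented."""
    sample = {}
    for name, path in STRATA.items():
        with open(path) as f:
            urls = [line.strip() for line in f if line.strip()]
        sample[name] = random.sample(urls, min(per_stratum, len(urls)))
    return sample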

Your approach (to what may ultimately be a different problem) 
considers a number of things I didn't, though the browser-sniffing 
issue you mention is something I had been thinking about. I don't 
know whether one can robotically render a document so that it looks 
as it would in Opera, Firefox, Safari, IE, etc.; I was rather naively 
assuming a fleet of grad students would fill out that part of the 
experimental design by hand. The other thing relevant to the 
discussion, I think, is the many different kinds of web content (as 
you mention) -- blogs, news feeds, ordinary web pages, wikis, HTML 
fragments, print, email, etc. That could get complicated quickly.

Also germane to the discussion may be some of the concerns of the 
folks interested in usability studies. See for example 
http://lists.w3.org/Archives/Public/public-html/2007Apr/0962.html, in 
which pages are further classified by author type (e.g. search 
engines vs. corporate sites).

It may make sense to convene a conversation bringing together both 
the survey and the usability folks, since some of the methodological 
concerns may in fact overlap. Just an idea -- thinking out loud.

David
--------<quote>---------------------
The other two folks I mentioned [zdenko and sean, cc-ed above] are 
involved in the business of sampling the 200 sites, so it might be 
best to get them involved as well. I didn't sign up for this 
particular task, since standards effectiveness is a more tangential 
concern of mine. (Though I am really glad someone is looking at it.)

I would tend to think the methodology ought to look something like this:

                   method of evaluation
              standards           browsers
              S1  S2  S3       B1  B2  B3  B4
  pages  p1
         p2
         p3
         p4
         p5

where both standards and browsers are used as repeated measures for pages.
Pages are randomly chosen within the categories C = {Top200/50, 
Alexa500/50, random/50, weird/50}.
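
In long form, that design amounts to one record per page per 
evaluation method; a rough sketch (the S/B labels and field names are 
illustrative only, nothing settled):

# Sketch of the crossed design in "long" form: each page contributes
# one row per evaluation method (three standards checks, four
# browsers), so the methods act as repeated measures on the same page.
STANDARDS = ["S1", "S2", "S3"]
BROWSERS = ["B1", "B2", "B3", "B4"]

def build_design(sample):
    """sample: mapping of category name -> list of sampled URLs."""
    rows = []
    for category, urls in sample.items():
        for url in urls:
            for method in STANDARDS + BROWSERS:
                rows.append({
                    "page": url,
                    "category": category,
                    "kind": "standard" if method in STANDARDS else "browser",
                    "method": method,
                    "score": None,  # filled in by the evaluator (pass/fail or a rating)
                })
    return rows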

One samples 50 from each category, and then one has a classical 
mixed-model analysis of variance with repeated measures and only one 
random-effects variable. The dependent variable can be either 
discrete (+ or -) or continuous; it didn't much matter the last time 
I studied statistics. Then we have a somewhat stratified sample that 
can be compared across sampling strategies.
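
As a hedged sketch of the analysis step (using statsmodels, with 
pages as the random grouping factor; nothing about the model below is 
settled, it just shows the shape of the thing):

import pandas as pd
import statsmodels.formula.api as smf

def analyze(rows):
    """rows: the scored long-form records from the sketch above."""
    df = pd.DataFrame(rows)
    # Pages are the random effect; evaluation kind (standard vs. browser)
    # and sampling category are crossed fixed factors. For a binary
    # pass/fail outcome, a logistic analogue (e.g. a GEE with a binomial
    # family) would be the rough equivalent.
    model = smf.mixedlm("score ~ C(kind) * C(category)", df, groups=df["page"])
    return model.fit()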

But the idea is to sample as divergent a group of pages as possible.

To get the random 50 -- I'm not sure what the best methodology is. I 
suggested StumbleUpon (but it has its own idiosyncrasies), and I 
remember some search engines have a "find a random page" feature, so 
one might be able to track down how they do that. Someone in our 
group must know.
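
One straw-man possibility, sketched below purely as a placeholder of 
my own (the seed URLs, hop count, and crude link regex are all 
assumptions, not anything we've agreed on): take short random walks 
along outgoing links from a couple of seed pages and keep whatever 
page each walk ends on.

import random
import re
import urllib.request

SEEDS = ["http://dmoz.org/", "http://en.wikipedia.org/wiki/Special:Random"]

def random_walk(seed, hops=5):
    """Follow random outgoing links for a few hops and return the final URL."""
    url = seed
    for _ in range(hops):
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            break
        links = re.findall(r'href="(http[^"]+)"', html)
        if not links:
            break
        url = random.choice(links)
    return url

random_sample = [random_walk(random.choice(SEEDS)) for _ in range(50)]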

To get a weird 50 -- I have a couple of eclectic collections: 
http://srufaculty.sru.edu/david.dailey/javascript/various_cool_links.htm 
is one, and 
http://srufaculty.sru.edu/david.dailey/javascript/JavaScriptTasks.htm 
is another.

Both are peculiar in the sense that they attempt to probe the 
boundaries of what is possible with web technologies -- some are 
heavily Flash, some heavily JavaScript -- and many don't work across 
browsers; in many cases I don't know why, and I have been too busy to 
track it all down. (Some of my pages are several years old and used 
to work better than they do now.) My emphasis has been far less on 
standards than on what works across browsers -- the standards and the 
browsers generally seem to have so little to do with one another.

A proper methodology for weird sites: have a group of volunteers 
explain what they are looking for (a collection of fringe cases) and 
let others contribute to a list. I don't know. A simpler methodology: 
have a group of volunteers just sit and come up with a list of sites 
believed to push the frontier.
------------</quote>-------------------- 

Received on Monday, 30 April 2007 15:41:48 UTC