Re: Requirements for research (Was: Dropping <input usemap="">)

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Thu, 16 Aug 2007 13:59:46 +0100
Message-Id: <F1BDAF5B-970D-46A1-823F-5244D0159B6C@googlemail.com>
Cc: public-html@w3.org
To: Robert Burns <rob@robburns.com>

On 16 Aug 2007, at 04:40, Robert Burns wrote:

> A scientific approach would involve several things. It would be  
> conducted with a goal to retrieve unbiased data. That means giving  
> every HTML document an equal probability of selection. Right now,  
> you're conducting research based on entries in a Google cache. It's
> biased toward pages that want Google's attention. Those pages
> behind firewalls, or on local drives, are completely left out of
> the research. I don't have any research on this, but I would expect  
> such pages to often pay more attention to details than the pages  
> fighting for Google's attention. It would be like looking through  
> the emails passing through an email server and concluding that
> most emails are about penis enlargement or counterfeit watches.

May I ask how you propose to get a better data set than Google's
cache? You're highly unlikely to get data from behind firewalls or
local drives.

> Genuine scientific statistical research also lays out methodology  
> and is reproducible.  From a scientific perspective, saying I  
> searched a cache that I have, that you can't search and I won't  
> even show you the code that produces that cache, would be the same
> as me saying the following. "I have this 8-ball and when I ask it
> if we should drop @usemap from |input| it tells me 'not likely'.
> You may say that sure, 8-balls say that. But the odd part is that it
> says that every time [cue eerie music]." :-) The point though is
> that it can't be reproducible at all if it's all based on hidden
> data and methods.

Again: how are you going to get a better data set?

- Geoffrey Sneddon
Received on Thursday, 16 August 2007 12:59:58 UTC