Re: Requirements for research (Was: Dropping <input usemap="">)

On 16 Aug 2007, at 04:40, Robert Burns wrote:

> A scientific approach would involve several things. It would be  
> conducted with a goal to retrieve unbiased data. That means giving  
> every HTML document an equal probability of selection. Right now,  
> you're conducting research based on entries in a Google cache. It's  
> biased toward pages that want Google's attention. Those pages  
> behind firewalls, or on local drives, are completely left out of  
> the research. I don't have any research on this, but I would expect  
> such pages to often pay more attention to details than the pages  
> fighting for Google's attention. It would be like looking through  
> the emails passing through an email server and concluding that  
> most emails are about penis enlargement or counterfeit watches.

May I ask how you propose to get a better data set than Google's  
cache? You're highly unlikely to get data from behind firewalls or  
local drives.

> Genuine scientific statistical research also lays out methodology  
> and is reproducible.  From a scientific perspective, saying I  
> searched a cache that I have, that you can't search and I won't  
> even show you the code that produces that cache, would be the same  
> as me saying the following. "I have this 8-ball and when I ask it  
> if we should drop @usemap from |input| it tells me 'not likely'.  
> You may say that sure, 8-balls say that. But the odd part is that it  
> says that every time [cue eerie music]." :-) The point though is  
> that it can't be reproducible at all if it's all based on hidden  
> data and methods.

Again: how are you going to get a better data set?

- Geoffrey Sneddon

Received on Thursday, 16 August 2007 12:59:58 UTC