
Re: [DRAFT] Heartbeat poll

From: Shelley Powers <shelleyp@burningbird.net>
Date: Sun, 02 Aug 2009 08:16:50 -0500
Message-ID: <4A7591C2.3060506@burningbird.net>
To: Maciej Stachowiak <mjs@apple.com>
CC: Julian Reschke <julian.reschke@gmx.de>, Ian Hickson <ian@hixie.ch>, Sam Ruby <rubys@intertwingly.net>, John Foliot <jfoliot@stanford.edu>, 'HTML WG' <public-html@w3.org>
Maciej Stachowiak wrote:
> On Aug 1, 2009, at 11:47 PM, Julian Reschke wrote:
>> Ian Hickson wrote:
>>> ...
>>>> Your sampling is flawed because it doesn't account for a 
>>>> significant number of web pages that are not accessible to the public.
>>> Pages that are not part of the Web do not need to use a standard 
>>> interoperable across the entire Web, they can use proprietary formats.
>>> ...
>> Sorry? I think this is something we need to discuss. Just because a 
>> web-based application only runs on an intranet doesn't mean it's 
>> irrelevant. It just means it is harder to collect data about it.
> I don't think intranets are irrelevant, but they do raise an 
> epistemological problem. People often claim that intranets have 
> content with substantially different characteristics than the public 
> Web, in particular respects. But in practice it is usually impossible 
> to test this kind of hypothesis. That means these kinds of claims are 
> not falsifiable and therefore not scientific.
> So we have three basic options: (1) ignore all data and make decisions 
> purely based on armchair reasoning; (2) by Occam's Razor, assume 
> intranet content is much like public Web content unless specifically 
> shown otherwise with concrete evidence; or (3) ignore intranet content 
> except when we can gather concrete data about its unique 
> characteristics or special needs.
> I don't think #1 is the most rational of these choices.

But Maciej, you are ignoring data. I, and others, have pointed out, 
numerous times, that the use of HTML tables in the data collected was 
also incorrect. If you're drawing a conclusion about summary alone, you 
need a good collection of data that reflects good HTML table use but 
bad summary use, and from what I can see, the data that's been 
collected does not warrant such a conclusion.

In addition, I have tried to point out numerous discussions discovered 
through Google that suggest that HTML tables in intranets, behind 
firewalls, could very well demonstrate good HTML table use AND good 
summary use. This evidence is somewhat anecdotal in nature, true, since 
it is culled from Google searches. However, all of the data provided for 
the arguments against summary has been anecdotal in nature. None of it 
is derived from gathering data in controlled circumstances. It's all 
based on scraping a portion of web pages, with no real way of knowing 
how viable this scraping is when it comes to representing all uses of HTML.

You also don't take into account the fact that the web is both archival 
and current, which means that you can't differentiate between use of 
HTML table and summary in older, no longer maintained web pages, and 
pages that are actively being maintained. So the data is tainted because 
we can't really determine how people are using HTML tables or summary.

The data is tainted in so many ways, making it too easily vulnerable to 
subjective interpretation, that I'm really surprised that people who 
espouse a scientific methodology would continue to rely on it. Then you 
use a pejorative term such as "armchair", most likely to undermine the 
expertise of the people involved, which is also counter to typical 
scientific practice.

Reasoned arguments have been provided. I have not seen any of them refuted.
> Regards,
> Maciej

Received on Sunday, 2 August 2009 13:17:39 UTC
