Re: Getting beyond the ping pong match (was RE: Cleaning House) from Bjoern Hoehrmann on 2007-05-07 (www-html@w3.org from May 2007)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 07 May 2007 11:40:50 +0200
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Cc: www-html@w3.org
Message-ID: <3lbt3358v9olmq4qqfpcmarfqtbgq8gtth@hive.bjoern.hoehrmann.de>

* Lachlan Hunt wrote:
>I did not claim it was implausible.  I claimed it was hypothetical 
>because it was presented without any evidence to back it up.  There have 
>since been some examples presented, which is good, but still no 
>explanation of what problems are caused.

If someone says "This example is plausible" and someone else responds
"No, the example is completely hypothetical", then there is no doubt in
my mind that the responder wants to create the impression that it is
disputed whether the example is indeed plausible.

>But, here's some evidence to support the definition of class="copyright" 
>in the spec.  The following sites all use the value in a relatively 
>compatible way.  Although some use it on elements other than those 
>allowed by the spec, it's the value that's important.  However, I agree 
>that's a bug in the spec.

This is not useful information. There is no doubt that there are plenty
of pages where you can draw certain conclusions from markup about parts
of the document. Useful information would be:

  * What is the pain to be solved?
  * Why does the pain need to be solved?
  * How should the pain be solved?
  * Why would that be the best solution?

Without clear answers to these questions, point-counterpoint discussions
about any of them are not likely to produce useful results. As far as I
can see, whatever the pain may be, it is clear that the microformats
community can handle it at least as well for the moment, and the HTML WG
has many more important and better understood problems to solve.

>It is clearly stated in the introduction that the study used "a sample 
>of slightly over a billion documents".  Hixie has also done subsequent, 
>though unpublished, studies with much greater sample sizes.

The sample you pick needs to be useful for answering the questions of
interest; making the sample very large does not make it useful. If you
want to know how many Chinese people there are, asking a billion people
whether they are Chinese would only tell you that there are at least n
such people. A useful sample would be representative of the world
population, and then it would not matter much whether you ask a thousand
or a billion people.
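
To illustrate the point about representativeness, here is a minimal
Python sketch. Everything in it is made up for the illustration: the
population size, the 18% share, and the assumption that the skewed
sample picks people with the trait five times as often as everyone
else. It compares a small uniform random sample with a much larger
skewed one.

```python
import random

def estimate_share(sample):
    """Fraction of the sample that has the trait (True values)."""
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 people, exactly 18% have the trait.
population = [True] * 18_000 + [False] * 82_000
rng.shuffle(population)
true_share = estimate_share(population)

# Representative sample: 1,000 people drawn uniformly at random.
small_random = rng.sample(population, 1_000)

# Large but skewed sample: 50,000 draws (with replacement), where
# people with the trait are five times as likely to be picked.
weights = [5 if has_trait else 1 for has_trait in population]
large_biased = rng.choices(population, weights=weights, k=50_000)

print("true share:         ", true_share)
print("1,000 random:       ", estimate_share(small_random))
print("50,000 but skewed:  ", estimate_share(large_biased))
```

Under these assumptions the small random sample should land close to
the true share, while the skewed sample stays far off no matter how
many more people are added to it.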

We don't know, for example, how much search engine spam is included in
the Google sample. Should there be a lot because there is a lot of spam
on the web? Or should there be none because studying spam is not
interesting? Or how many Google search result pages should it contain?
Many, because very many of them end up in browser caches, or just one
because the markup is always the same? Either approach is reasonable,
but the choice has a significant impact on the result.
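
As a rough sketch of how much that choice can matter, consider a toy
crawl in Python. The pages, the counts, and the helper name share_using
are all hypothetical and say nothing about the actual Google sample.
The same question is answered once counting every fetched copy and once
after collapsing pages with identical markup.

```python
def share_using(pages, needle):
    """Fraction of pages whose markup contains the given substring."""
    return sum(needle in page for page in pages) / len(pages)

# Hypothetical crawl: three distinct hand-written pages ...
hand_written = [
    '<p class="copyright">(c) Site A</p>',
    '<p>No footer here</p>',
    '<div class="copyright">(c) Site B</div>',
]
# ... plus 97 copies of one generated page with identical markup.
generated = '<p class="result">generated result page</p>'
crawl = hand_written + [generated] * 97

# Choice 1: every fetched copy counts as its own document.
every_copy = share_using(crawl, 'class="copyright"')

# Choice 2: identical markup is counted only once.
distinct_pages = list(dict.fromkeys(crawl))
deduplicated = share_using(distinct_pages, 'class="copyright"')

print("counting every copy:", every_copy)
print("after deduplication:", deduplicated)
```

Both numbers come from the same crawl; only the decision about what
counts as one document differs.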
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Monday, 7 May 2007 09:41:00 UTC