Re: Cleaning House from Maciej Stachowiak on 2007-05-06 (www-html@w3.org from May 2007)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 6 May 2007 05:38:50 -0700
To: Tina Holmboe <tina@greytower.co.uk>
Cc: James Graham <jg307@cam.ac.uk>, Anne van Kesteren <annevk@opera.com>, Philip & Le Khanh <Philip-and-LeKhanh@royal-tunbridge-wells.org>, www-html@w3.org, public-html@w3.org
Message-Id: <ECBF585B-1DC2-498D-9CD3-91F14FC10437@apple.com>

On May 6, 2007, at 5:17 AM, Tina Holmboe wrote:

> On Sun, May 06, 2007 at 01:08:19PM +0100, James Graham wrote:
>
>> "A question commonly asked of every lexicographer is "How do you
>> choose which words go into the Dictionary?"
>
>   [snip]
>
>> So they don't include everything but do include anything with
>> significant usage. It all seems rather sensible to me.
>
>   Indeed it does. But the important point is that there exist a
>   considered "protocol", if you will, for when and what to
>   include.
>
>   In the case of the WA1, things are included, or excluded, or
>   changed without rhyme or reason and without basis in actual,
>   real-life, examples or "proof"*.

Here's a study done by the WA1 editor:

http://code.google.com/webstats/2005-12/classes.html

Notice how the most commonly used classes in the wild (with a few  
exceptions) are related to WA1 elements or predefined class names.

So the criteria used for many of the changes were based on frequency  
of use, similar to the OED's inclusion criteria.
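
As a rough illustration of what a frequency count of this kind involves,
here is a minimal sketch in Python. It is not the method used in the
study linked above; the names ClassCounter and count_classes are made up
for this example, and it assumes the pages are already available as
strings.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally every class name seen in class="..." attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute holds space-separated names; value is None
            # when the attribute is written without a value.
            if name == "class" and value:
                self.counts.update(value.split())


def count_classes(documents):
    """Return a Counter of class names across an iterable of HTML strings."""
    totals = Counter()
    for html in documents:
        # Use a fresh parser per document so leftover state from a
        # malformed page cannot leak into the next one.
        parser = ClassCounter()
        parser.feed(html)
        parser.close()
        totals.update(parser.counts)
    return totals
```

Calling count_classes(pages).most_common(20) would then list the twenty
most frequent class names in whatever sample is supplied. How that
sample is chosen matters far more than the counting itself.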

>   Add to it that markup languages /are/, by necessity, stricter
>   than natural languages, and the linguistic philosophies in
>   question does not apply.

The semantic meaning to be inferred from markup elements can't
possibly be that much stricter than natural languages, since whether
a semantic use is conforming is based on the meaning of contained and
surrounding natural language. Example:

"To detect that <em>the</em> is italicized for non-emphasis reasons,
you need a strong understanding of natural language. You would need to
understand that I'm using italics to resolve use-mention ambiguity
rather than to emphasize."

>  *
>   Without /apparent/ such, anyhow, since the WHATWG, oddly
>   enough for a group that consist of browser vendors, appear
>   not to read "wild" HTML ...

See the large-scale study of HTML in the wild cited above (although I'd
love to see a version that is even more quantitative and links to
some samples, or better yet a fully reproducible experiment).

Regards,
Maciej

Received on Sunday, 6 May 2007 12:39:51 UTC