Machine checkability (Was: Re: HTML Action Item 54 - ...draft text for HTML 5 spec to require producers/authors to include @alt on img elements.) from Philip Taylor on 2008-05-10 (public-html@w3.org from May 2008)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sat, 10 May 2008 20:15:52 +0100
To: Justin James <j_james@mindspring.com>
CC: public-html@w3.org
Message-ID: <4825F468.8090908@cam.ac.uk>

Justin James wrote:
> Ian Hickson wrote:
>> I respectfully disagree. I think it's very important that we define
>> elements to have clear semantics and that we require that those
>> semantics not be misused.
> 
> How do you propose requiring that the semantics not be misused, if
> the spec cannot be 100% machine verified?

A human can check the non-machine-checkable requirements. An author who
wants to write high-quality HTML can send their code to a friend, or
post it to a mailing list or a message board, and someone who knows the
spec well can reply saying "you mustn't use <b> for these headings" and
"you mustn't use an empty alt attribute on this button image" and point
to the parts of the spec that require those things.

Hopefully that person will also explain *why* that's considered bad HTML 
- it'd be great if we could provide that justification for all the 
authoring requirements in the spec, though that should probably be in a 
separate non-normative document. And those justifications should be 
based on the real negative effects that people will experience - using 
<b> for headings means document outline algorithms will work poorly, 
using alt="" for non-decorative images means some users won't know what 
the image is representing, etc - so that resolving those issues in a 
document will make that document work better for users.

The only unique feature of machine verification (compared to human 
verification) is that it's deterministic - otherwise the differences are 
just in the levels of speed, cost, thoroughness (how many things it 
checks) and accuracy (how few things it intends to check but misses).

The useful output of any validation process (either machine-based or 
human-based) is a list of ways to improve your document, and is not a 
boolean tick/cross. That means determinism isn't critical: if you 
validate your document twice, and get two different lists of ways to 
improve your document, that's okay since you're still being told how to 
improve your document.

Since determinism isn't critical, the choice between machine and human 
checking is simply a tradeoff of various properties: you can have a 
quick superficial machine check that guarantees finding all invalid 
syntax every time you run it, or you can have a slow resource-intensive 
human check that finds many more ways to improve your document but might 
miss some issues.
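
To make the tradeoff concrete, here is a minimal sketch of the "quick
superficial machine check" side, assuming a missing alt attribute is
treated as an error. The names AltChecker and check are hypothetical and
this is not any real validator. It reports a missing alt
deterministically, but for alt="" it can only flag the image for a human
to look at, since it cannot tell whether the image is decorative.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Hypothetical illustration only, not a real conformance checker."""

    def __init__(self):
        super().__init__()
        self.errors = []      # problems a machine can decide on its own
        self.for_review = []  # questions only a human can answer

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            # Machine-checkable: the attribute is simply absent.
            self.errors.append(f"{src}: missing alt attribute")
        elif not attributes["alt"]:
            # Not machine-checkable: is this image really decorative?
            self.for_review.append(f"{src}: empty alt, needs a human to judge")


def check(html):
    """Return (errors, for_review) for a fragment of HTML."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.errors, checker.for_review


if __name__ == "__main__":
    sample = '<img src="a.png"><img src="b.png" alt=""><img src="c.png" alt="Logo">'
    errors, for_review = check(sample)
    print("errors:", errors)
    print("for human review:", for_review)
```

Running it twice on the same input always gives the same two lists,
which is the determinism described above; everything in the second list
still needs the slower human check.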

Some people will value the quality of their document highly enough to 
justify the cost of having a human check it, and so the spec's 
non-machine-checkable requirements will benefit those people. Most 
people won't; but most people (seemingly around 98%) don't value quality 
enough to bother with machine checking either, so there is no difference 
except in scale.

We just need enough people to care about producing high-quality HTML so 
that the benefit to them (and their users) of the thorough non-machine 
validation is enough to justify the cost of us developing those 
non-machine-checkable requirements. Given the total number of HTML 
authors, the tiny fraction that cares about quality is still a large 
number of people, so this seems worthwhile. The requirements aren't 
useless just because they will be almost universally ignored.

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Saturday, 10 May 2008 19:16:42 UTC