RE: HTML Action Item 54 - ...draft text for HTML 5 spec to require producers/authors to include @alt on img elements. from Justin James on 2008-05-12 (public-html@w3.org from May 2008)

From: Justin James <j_james@mindspring.com>
Date: Mon, 12 May 2008 01:27:36 -0400
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: <public-html@w3.org>
Message-ID: <03ed01c8b3f0$e34545f0$a9cfd1d0$@com>
Excerpt from Ian Hickson:

> Some huge number of Web authors -- in the 90% range, according to some 
> very wide studies I've done in the past few years -- do not write even 
> syntactically valid documents. The machine-checkability of requirements 
> has absolutely no effect on these authors. On the other hand, there are 

Imagine if, say, SMTP MTAs had the same level of spec adherence as HTML
authors? Email would be hopelessly broken and never get to where it needed
to go. The simple truth is, the HTML spec is too difficult to adhere to,
both for hand authors and for tool creators, and even good tools are easily
misused to generate bad HTML. My proposal strips HTML down to a handful of
tags, and users learn roles as needed, as opposed to figuring out that a
browser's default rendering of a particular tag happens to produce the
desired style. Strip the default behavior from the tags, and now people have
to learn the spec, or use good tools.

Simply put, the existing tools (no offense intended to anyone on this list,
if they are involved in such projects) are pretty bad overall. Yet, the
majority of people who produce HTML have to use these tools because they are
not technical users. The problem isn't that the tool users are ignoring the
tool's warnings and writing bad code. The problem is garbage tools like FCK
Editor that still use <font>, and then these tools get used in every piece
of blog and CMS software on the planet. So one bad decision by one or two
developers becomes literally tens of millions of people generating bad HTML
code.

The solution has to address the following points to be a success:

* Make it impossible (or extraordinarily difficult) for bad tools to
generate HTML that is sensibly rendered by browsers adhering to the new spec
*if they claim to use the new spec*

* Make code that renders properly in older browsers

* Make it possible for the HTML code to be machine verified to fully (or
nearly fully) meet the spec; for example, the spec should say something
like, "any element with a role of 'paragraph' must contain textual content."
That would ensure that something marked as a paragraph is not simply being
used as a spacer. HTML tools would then be able to check for this, and in
some cases, automatically correct (for example, convert empty <div> to have
a role of "spacer" or "decorative"), and in other cases, request that the
user provide guidance ("is this image being used as a decoration or as a
piece of content?")

* Fully divorce content, style, and role, and make all three mandatory
(except when the role explicitly allows for no styling or no content)

* Make the accessibility pieces role-dependent and mandatory
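
To make the machine-verification point concrete, here is a minimal sketch of
the kind of check a tool could run for the "paragraph must contain textual
content" rule. It assumes Python's standard html.parser module; the
role="paragraph" value is the hypothetical one from the example above, and
the names RoleChecker and check are made up for illustration. It is not a
full validator.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not tracked as open elements.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class RoleChecker(HTMLParser):
    """Flags elements with the hypothetical role="paragraph" that hold no text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as [tag, role, has_text]
        self.problems = []  # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append([tag, dict(attrs).get("role"), False])

    def handle_data(self, data):
        if data.strip():
            # Non-whitespace text counts for every element that encloses it.
            for entry in self.stack:
                entry[2] = True

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag, tolerating sloppy nesting.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                closed = self.stack[i:]
                del self.stack[i:]
                for name, role, has_text in closed:
                    if role == "paragraph" and not has_text:
                        self.problems.append(
                            f'<{name} role="paragraph"> has no text')
                break


def check(html):
    """Return a list of findings; elements never closed are not reported."""
    checker = RoleChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

A tool could use the findings either to auto-correct (for instance, switching
the role to "spacer" or "decorative") or to prompt the user, as described in
the list above.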

> All of which is to say, whether something is machine-checkable or not 
> isn't a hugely relevant factor in whether we should require something. It 
> certainly isn't the only factor, or even the deciding factor, for how far 
> authors will go to make conforming documents (there are people on both 
> sides of the line).

I disagree; in fact, I disagree for the exact same reason. Unless
conformance can be checked by a machine, the vast majority of the people
generating HTML will never be able to achieve it. Furthermore, the more
difficult it is to write an authoring tool that complies with the spec, the
less likely it is that tools that generate valid HTML will reach the content
creators.

The fact that an overwhelming majority of people can't even get the DOCTYPE
right, for crying out loud, should be a giant red flag screaming that our
spec stinks.

> There isn't a way to machine-check whether what someone has written makes 
> any sense.

Someone generating valid but incomprehensible content is not our problem,
either. They need to take that up with the English (or whatever language
they're writing in) working group. :)

> What if the textual content isn't a paragraph? Say, because it's a 
> heading?
>
>    <div role="paragraph">Introduction to bee-keeping</div>
>
> Why is that machine-verifiable, when this isn't?:
>
>   <p>Introduction to bee-keeping</p>

In your example, they are both machine verifiable. But there are a bazillion
examples where the current system falls apart and my proposal is much more
robust.

The difference is, my proposal takes away the styling advantages of using
the wrong tag. It is quite common to see <h1> where the goal was really
<span style="font-size: large; font-weight: bold;">. Or <em>USS
Constitution</em> when semantically that is wholly incorrect. By taking away
the default stylings, and separating the role from the styling, you stop
seeing this.

J.Ja
Received on Monday, 12 May 2008 05:28:27 UTC