[text] updated draft, 20110430, of clarification on alt validation

Dear All,

(jump down to "UPDATED DRAFT 20110430" if you want to go straight to the draft)

This version of our draft clarification mail on 
alt validation incorporates updates on:
- meta name=generator, from Leif, also 
incorporating contributions from Benjamin;
- figcaption, from Judy, also adjusting for comments from Benjamin.

There are still questions pending within the 
draft below, for Rich, John, and Steve, though 
comments and replies are welcome from all.
Questions include whether we have fully responded 
to items where clarification was sought, or where 
it appears to be needed, in the co-chairs' 
decision that was provided by Maciej:
I encourage people to look through Maciej's mail as well as the draft below.

Note also that, per Sam's mail at
...in response to...
...Sam is advising us to add formal change
proposals to our clarification mails. At this
Monday's meeting, I would first like to see how
close we're getting to consensus on this
composite clarification mail. Let's also discuss
the status of any existing text that we can use
for change proposals, and/or what needs to be
developed and who is able to help with that.

[UPDATED DRAFT 20110430]

Dear HTML WG Co-Chairs,

With regard to your decisions, as described in Maciej's mail:

...which discussed the following issues with 
regard to validity requirements for alt:

>There is a basic disagreement in the group on the validity
>requirements for alt.  The result was two issues, six change
>proposals, and a straw poll for objections:

...and which arrived at the following six conclusions:

>    * The presence of aria-labelledby does not make missing alt conforming.
>    * The presence of role=presentation does not make missing alt conforming.
>    * The presence of <meta name=generator> makes missing alt conforming.
>    * Use of private communications does not, in 
> itself, make missing alt conforming.
>    * The presence of title makes missing alt conforming.
>    * The presence of figcaption makes missing alt conforming.

...we provide the following clarifications on the 
four decisions among these that we find problematic.

[@@We also provide change proposals for each of 
these items... (will add once available.)]

Please let us know of additional clarification 
questions on the following information as needed.

== On the Co-Chairs' decision on role=presentation ==

>* The presence of role=presentation does not make missing alt conforming.

The default semantics for an image with alt=""
is role="presentation". The accessibility API
mapping is such that, when alt="", the <img>
element is removed from the accessibility API
tree. This improves assistive technology performance
as well as browser performance, since the
browser does not maintain accessible objects for
these presentational elements. Applications, such
as those from IBM, have used role="presentation"
to remove these objects from the accessibility
tree, because HTML4 does not have the same mapping as
we have specified for HTML5. Not allowing
alt="" and role="presentation" to be used
interchangeably is therefore inconsistent both with the
default HTML5 semantics and with the ARIA specification.
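To make the claimed equivalence concrete, here is a minimal Python sketch of the mapping described above. It is a simplified model, not the normative HTML or ARIA algorithm: the Img type and both functions are hypothetical, an explicit role is assumed to win over the implicit one, and real browser rules (e.g. for focusable elements) are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Img:
    """Hypothetical stand-in for an <img> element; None means the attribute is absent."""
    alt: Optional[str] = None
    role: Optional[str] = None


def effective_role(img: Img) -> str:
    # Assumption for this sketch: an explicit role attribute wins over the implicit one.
    if img.role is not None:
        return img.role
    # Default semantics described above: alt="" means role="presentation".
    if img.alt == "":
        return "presentation"
    return "img"


def in_accessibility_tree(img: Img) -> bool:
    # Presentational images are pruned, so no accessible object is kept for them.
    return effective_role(img) != "presentation"


print(in_accessibility_tree(Img(alt="")))               # False
print(in_accessibility_tree(Img(role="presentation")))  # False
print(in_accessibility_tree(Img(alt="Company logo")))   # True
```

Under this model, an img with alt="" and an img with role="presentation" are pruned from the tree in exactly the same way.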

[the text above is from Rich and John as indicated in Rich's email
http://lists.w3.org/Archives/Public/public-html-a11y/2011Apr/0322.html ]

[note Leif's subsequent question, to which I 
can't find a reply; Rich or John can you 
elaborate your answer to address Leif's 
questions, or indicate your reply if I've missed it?
http://lists.w3.org/Archives/Public/public-html-a11y/2011Apr/0330.html ]

[Rich and John, can you look back at Maciej's 
decision on role=presentation and check that you 
have addressed all of the relevant arguments in 
that section, or add to your reply if needed?
http://lists.w3.org/Archives/Public/public-html/2011Apr/0451.html ]

== On the Co-Chairs' decision on meta name=generator ==

>* The presence of <meta name=generator> makes missing alt conforming.

[updated text from Leif, with some points from Benjamin]

In regard to the generator decision, we would first of all like to comment
on the 5 kinds of evidence which the chairs suggested as reasons to reopen
the issue:

    Link text is often visually hidden because it is [supposed to be]
located inside the @alt of an img element. [1] For link text, HTML5
thus creates the reasonable expectation that [1] "markup generators
should examine the link target to determine the title of the target, or
the URL of the target, and use information obtained in this manner as
the alternative text". If generator vendors stop expecting their tools
to provide proper link text merely because the link text is kept inside
an image, and instead point to their "bad users" to justify a generator
exception, then we believe there is evidence to suggest that the
exception does a disservice to accessibility. In that regard, since the
example images provided, as well as most other images on Wikipedia, are
image links, we interpret MediaWiki developer Aryeh Gregor's
presentation [3] (see also the follow-up thread) of MediaWiki bug 368
as a strong hint that the generator exception is likely to be used to
justify settling for less than the state of the art in image
accessibility.
     PS: If the point of asking for evidence that omission of @alt
increases because of the generator exception is that no one will bother
to make use of it, then why do we need this exception?

    [3] http://lists.w3.org/Archives/Public/public-html/2011Apr/0459

    Because the HTML5 validation performed by validator.nu and
validator.w3.org currently treats even pages without the generator
string as if they included it (namely: omitted @alt is always
considered conforming), it is for the time being difficult to look for
evidence that vendors have let the generator exception interfere with
proper inclusion of alt. However, according to the MAMA project, 27% of
web pages include the generator string. [7] Hence, both tool vendors and
tool users would be certain to run into many "gotcha" experiences if the
generator string were to trick them into thinking that their pages had
been fully validated when they had not. For example, authors who take
tool output (e.g. from HTML Tidy, which does add a generator string) as
the basis for hand coding would often forget to remove the generator
string. Copy-pasting code in general would lead to the same result,
especially for novice authors. (More documentation of generator strings
in tools: [5][6][7]) We know from www-validator@w3.org that nothing
frustrates authors more than validator messages that are based on
cryptic rules: in just the last 4 days there were 2 questions relating
to the HTML4 validator's pedantic, SGML-based need for end-tag escaping
inside the script element. [8][9] In the case of the script element,
the 'stricter than reality' rules do not affect users. But the effect
of the generator string will not only surprise and frustrate authors
and vendors; more importantly, it will also frustrate users.
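The following Python sketch illustrates the "gotcha" described above. It is a hypothetical, simplified model of a checker that honours the generator exception: it is not how validator.nu or validator.w3.org are implemented, it ignores the title and figcaption exceptions, and the sample page is made up. A generator string left over from tool output silences every missing-alt error on the page.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Records whether a <meta name=generator> is present and counts <img> without alt."""

    def __init__(self):
        super().__init__()
        self.has_generator = False
        self.imgs_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lower-cases tag and attribute names
        if tag == "meta" and (attributes.get("name") or "").lower() == "generator":
            self.has_generator = True
        elif tag == "img" and "alt" not in attributes:
            self.imgs_missing_alt += 1


def missing_alt_errors(markup: str, honour_generator_exception: bool) -> int:
    """Number of missing-alt errors this hypothetical checker would report."""
    audit = AltAudit()
    audit.feed(markup)
    audit.close()
    if honour_generator_exception and audit.has_generator:
        return 0  # the exception silences every missing-alt error on the page
    return audit.imgs_missing_alt


# Hand-edited tool output that still carries the generator string.
page = ('<meta name="generator" content="HTML Tidy">'
        '<img src="chart.png"><img src="logo.png" alt="Logo">')
print(missing_alt_errors(page, honour_generator_exception=True))   # 0
print(missing_alt_errors(page, honour_generator_exception=False))  # 1
```

The author of this page gets a clean report only because of a line they may not even know is there.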

    [4] http://lists.w3.org/Archives/Public/public-html/2011Apr/0711
    [5] The list in the original change proposal from Laura Carlson
    [6] http://lists.w3.org/Archives/Public/public-html/2011Apr/0474
    [7] http://lists.w3.org/Archives/Public/public-html/2011Apr/0580
    [8] http://www.w3.org/mid/2334982876.988140239@xn--mlform-iua.no

    (This section is really a sub-issue of 1.)
    A WHATwg blog post from 2007 (before ARIA was introduced in HTML5)
about why alt can be omitted testifies to the fact that @alt has been a
contested issue in the HTMLwg. [10] Four years later, validator.nu
(http://validator.nu/?doc=http%3A%2F%2Fmat-t.com) and validator.w3.org
(http://validator.w3.org/check?uri=http%3A%2F%2Fmat-t.com) still do not
check for omitted/included @alt before they issue their Valid stamps.
Thus there is a chance that "weak souls" have gotten the impression that
HTML5 has relaxed rules for alternative text.
    Has this led to less use of @alt in HTML5 pages compared with HTML4
pages? A manual check, on 28 April 2011, of the first 60 front pages
listed in the HTML5 gallery [11] found that 12 were lacking @alt.
[References 12 to 24] According to the MAMA project, 21% of img
elements in the wild have @alt omitted. [11b] That 12 of 60 pages (20%)
have instances of omitted @alt is probably roughly the same percentage.
Nevertheless, these example pages are not developed by "weak souls" or
by "generators" but by developers who want their work to be listed as
examples of HTML5. Some of the worst examples: one [12] has 20 img
elements but not a single @alt. Ref [19] has 9 img elements and not a
single alt. Ref [20] has 8 (of 9) images with omitted alt. Another one
[19] has @alt omitted in 8 of 23 img elements. Another: 9 of 9 lack
@alt. [23]
    A check of notable pages outside the HTML5 gallery: HTML5boilerplate.com
[25] has no alt on its HTML5 badge. Ditto for html5readiness.com [26].
Two HTML5 tutorials forget @alt. [27][28] And Google's
slides.html5rocks.com omits @alt on 9 out of 23 images (without using
either <figure> or ARIA features). [29]
    Conclusions? These are hand-authored pages, or at least pages made by
web professionals. According to the thinking behind the generator
exception, generator-made pages will show more lack of alt than the
hand-made ones. We think it is worrying that so many HTML5 example
pages show these deficits with regard to @alt.

[10] http://blog.whatwg.org/omit-alt
[11] http://html5gallery.com/page/2/
[11b] http://dev.opera.com/articles/view/mama-images-elements-and-formats/#img
[12] http://mat-t.com
[13] http://www.tweetoon.co.uk
[14] http://www.aviary.com/html5builder
[15] http://www.dielinke-europa.eu
[16] http://blackbull.in
[17] http://wantist.com
[18] http://ffpbooth.com
[19] http://www.kubimedia.com
[20] http://jeffaquino.com
[21] http://www.jswidget.com
[22] http://www.dooity.com
[23] http://www.pelanidea.com
[24] http://onenyne.com
[25] http://HTML5boilerplate.com
[26] http://html5readiness.com
[27] http://tutorialzine.com/2010/02/html5-css3-website-template
[29] http://slides.html5rocks.com

4. EVIDENCE: "A demonstrated trend towards more authoring tools fully
supporting ATAG2, including the requirement to prompt for textual
equivalents for images."
    We would like to point out that it is precisely generators
(including WYSIWYG editors) that prompt for textual equivalents for
images. Hand-authoring tools don't, as then they wouldn't be pure
hand-authoring tools, would they? The HTMLwg was recently presented with
a short, qualitative evaluation of some tools with regard to prompting,
which showed that many tools, even e-mail programs, do prompt for alt
text. [30]
    Interestingly, even plain text editors have started to help authors
by offering @alt text: if you drop an image into an HTML file edited
with the mode-sensitive TextMate.app for Mac OS X, it will auto-create
an img element for you, using the file name (minus suffix) as @alt
text. The @alt text is selected and highlighted so that you can change
it immediately if needed.
    Using file names as @alt text is a common pattern in conversion tools
such as hypermail 2.2.0+W3C-0.50 [30] and the TextEdit.app word
processor bundled with Mac OS X (whenever it saves a page as a
webarchive file). TextEdit.app uses the 'Cocoa HTML Writer' generator.
    Thus prompting is not the only important thing; it is also important
simply to make the author aware of how the program operates.
    We would turn the request for evidence on its head: where is the
trend that shows that pages with the generator string have especially
low alt text quality?
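The file-name-as-alt pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from TextMate, hypermail or TextEdit; the function name is made up, and the resulting alt text is only a default that the author still needs to review.

```python
import html
from pathlib import PurePosixPath


def img_from_dropped_file(path: str) -> str:
    """Build <img> markup whose default alt is the file name minus its suffix."""
    stem = PurePosixPath(path).stem  # "photos/red-bridge.jpg" -> "red-bridge"
    src = html.escape(path, quote=True)
    alt = html.escape(stem, quote=True)
    return f'<img src="{src}" alt="{alt}">'


print(img_from_dropped_file("photos/red-bridge.jpg"))
# <img src="photos/red-bridge.jpg" alt="red-bridge">
```

The result is only as good as the file name, which is why the author's awareness of this behaviour matters.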

    [30] http://lists.w3.org/Archives/Public/public-html/2011Apr/0580

5. EVIDENCE: HTML e-mail doesn't justify the generator exception
    The Decision placed the private e-mail exception under the generator
umbrella, thereby creating an exception that in theory is many times
broader than the private e-mail exception. We will therefore look at
whether HTML e-mail can justify the generator exception.
    ON THE ONE SIDE: W3C has held a workshop on HTML in e-mail where
accessibility in general, and alt text in particular, was one of the
issues dealt with. [31] Safari allows users to send web pages as e-mail.
[32] And Outlook 2007 allows you to install an HTML validator. The Email
Standards Project focuses (mainly) on CSS support in e-mail. [34] All
this suggests that standards-compliant HTML is *possible* and wanted.
(We would in particular like to note that Thunderbird's HTML editor,
which prompts for @alt text, makes use of an image's @alt text in the
plain-text version of the message that it can create and send in
parallel. Thus @alt text has practical value beyond HTML-only
accessibility.) According to some advertising experts, writing in April
2011, @alt text is "extremely important for HTML emails, because some
email clients block images". [35]

    BUT ON THE OTHER SIDE: We have studied a few HTML e-mail messages,
mainly those sent to public-html, and found the following: Apple Mail
does not include any DOCTYPE in its (native) HTML e-mail messages. (It
often seems not to include alt text on images either.) We don't know
whether this means that it renders such messages in quirks mode.
Outlook, in contrast, tends to send out very long, non-standard DOCTYPE
strings. We don't know whether this means that it renders such messages
in quirks mode. Lotus Notes 2010 does not include any DOCTYPE. We don't
know whether Lotus Notes therefore renders its own messages in quirks
mode. A message we sent from Thunderbird turned out to use the HTML4
Transitional DOCTYPE, and we did not spot any options for choosing
another DOCTYPE. We have not checked whether this causes quirks-mode
rendering in Thunderbird. A welcome letter sent from Google AdWords
contains no DOCTYPE; the only head content it has is a meta charset
element.
    OBSERVATIONS: E-mail messages in general are very rich in "generator
messages" in their MIME headers, either in the form of an X-Mailer:
header or a User-Agent: header, plus various other "stamps". In
contrast, we did not locate the meta@name=generator element in the HTML
messages we looked at. We also note that e-mail messages take their
title from the Subject: header and not from the <title> element.
Whether HTML standards fully matter for HTML e-mail is disputed. [36]
    CONCLUSIONS: (1) There are enough generator signals in e-mail, even
without the generator string. (2) The role of the DOCTYPE and the head
elements (HTML standards) in e-mail is somewhat unclear. But where
e-mail messages tend not to include any (no-quirks) DOCTYPE or any other
head elements, pointing to HTML e-mail is an irrelevant justification
for the generator exception. After all, there are many far more
important HTML features to implement in HTML e-mail before the generator
string. (3) At the same time, @alt text is so important in e-mail that
it is pointless to even create the impression that it can be omitted.
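As an illustration of conclusion (1), the Python sketch below reads the generator signals a message already carries. It uses only the standard library's email and html.parser modules; the sample message and the "ExampleMailer" client name are made up, the function names are hypothetical, and character-set handling is simplified.

```python
from email import message_from_string
from html.parser import HTMLParser


class _MetaGeneratorFinder(HTMLParser):
    """Detects <meta name=generator> in an HTML body."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and (dict(attrs).get("name") or "").lower() == "generator":
            self.found = True


def generator_signals(raw_message: str) -> dict:
    """Client named in the MIME headers, and whether any HTML part has a meta generator."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Mailer") or msg.get("User-Agent")
    meta_generator = False
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            body = part.get_payload(decode=True).decode(charset)
            finder = _MetaGeneratorFinder()
            finder.feed(body)
            finder.close()
            meta_generator = meta_generator or finder.found
    return {"header": header, "meta_generator": meta_generator}


# Made-up sample message; "ExampleMailer" is not a real client.
sample = (
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: Quarterly report\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    '<p>See the chart: <img src="cid:chart" alt="Sales chart"></p>\n'
)
print(generator_signals(sample))
# {'header': 'ExampleMailer 1.0', 'meta_generator': False}
```

The sending client is identifiable from the headers alone, with no generator string in the HTML.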

[31] http://www.w3.org/2007/05/html-mail/minutes
     (in particular http://www.w3.org/2007/05/html-mail/minutes#item05)
[32] http://lists.w3.org/Archives/Public/public-html-mail/2007Mar/0000
[33] http://msdn.microsoft.com/en-us/library/aa338200.aspx
[34] http://www.email-standards.org/clients/
[35] http://www.ctiadvertising.com/2011/04/08/alt-text-in-email/
[36] http://www.w3.org/2007/05/html-mail/html-rendering-email

    IN ADDITION to the evidence which the chairs asked for, we will
    also make the following points about why the generator
    exception should be reexamined:

    The meta@name=generator exception has a direct parallel in the
generator exception for WYSIWYG editors that was present in the January
2008 HTML5 draft [1], which Karl Dubost described as a form of
versioning. [2][3] 'Real' versioning, in the form of strict/transitional
modes etc., is banned from HTML5, most recently by a Decision in
December 2010. [4] One (so far hypothetical) problem is that the
generator exception could cause vendors to treat images differently
when they operate with the generator flag present.
    Another effect could be that, just as authors continue to be
confused by the effect of 'strict' and 'transitional', a generator
exception would create confusion as well. Authors expect validators to
act in a transparent manner. If the validator silently passes over
missing @alt in generated code, then it will mislead authors who are
used to seeing errors raised over their hand coding into thinking they
haven't missed an @alt. The HTML5 validator(s) may also be embedded in
authoring tools, resulting in confusing differences in error display
depending on the validation "mode". (Also see '2. EVIDENCE: LIST OF
...'.)
    As justification for forbidding the transitional/quirks DOCTYPEs,
HTML5 states that they "are known to cause especially subtle or serious
problems" which are "largely undocumented". [4a] We find that the
effect of @alt omission is likewise largely undocumented, with
unpredictable consequences: AT/UAs may try to repair missing @alt, but
in ways that are hard to predict and unhelpful.

[1] http://www.w3.org/TR/2008/WD-html5-20080122/#wysiwyg
[1] http://lists.w3.org/Archives/Public/public-html/2007Aug/0290
[2] http://www.w3.org/QA/2007/05/html_and_version_mechanisms
[4] http://lists.w3.org/Archives/Public/public-html/2010Dec/0135
[4a] http://dev.w3.org/html5/spec/introduction#syntax-errors

    In 2007, the HTML5 editor described the WYSIWYG generator exception,
which the HTML5 draft then contained, as a "solution for handling ...
two tiers of document quality". [5] The 2007 WYSIWYG generator stamp
permitted you to use <font> and @style. But if WYSIWYG editors are a
diverse group, where not all of them "need" to use <font>, the
encompassing group of markup generators is even more diverse, ranging
from simple converters to full-fledged authoring tools and CMS
applications, with very different options with regard to the
opportunities they provide authors for ensuring good @alt text. What is
important to consider, however, is that regardless of the authoring
tool's features, good markup *always* requires an author who is
familiar with the possibilities and shortcomings of the authoring tool.
The HTML5 editor has never presented evidence to suggest that tool
vendors *or* tool users *want* the generator string to "free" them from
@alt validation. Instead, we have evidence that at least one generator
vendor claims that precisely the fact that their tool is an HTML
generator leads to code which is "clean and standards compliant all the
time". [5a] Further, we lack evidence to suggest that generators which
output the generator string have a particular need for such an @alt
omission permission.

    [5] http://lists.w3.org/Archives/Public/public-html/2007Aug/0186
    [5a] http://www.softpress.com/kb/questions/87/

    Authoring tools use the generator string as a kind of signal to
those who view source, to identify pages made by their product.
Softpress documents in their KnowledgeBase that they use it for
debugging purposes. [6] The HTMLwg has repeatedly declined to
reinterpret legacy HTML features in incompatible ways; this is why
<figcaption> and <summary> were created, and <dt> and <dd> were rejected
as child elements of <figure> and <details>. A reinterpretation of the
meta@name=generator string appears very similar to the predefined class
names that the HTML5 draft had in 2007. Just as authors should be free
to use class names as they wish, the generator string should remain in
the generator vendors' free domain.

    [6] http://www.softpress.com/kb/questions/47/

    The generator exception drops on the floor all authoring requirements
for @alt text, even those which are easily machine-checkable MUST
requirements in HTML5; for example, the requirement that an img element
that is the sole content of a link MUST have alt text that is suitable
as link text. [7][8] (An argument could be made that the @alt of such
images should be checked as link text rather than as @alt text.)
Authors and tool vendors must be taught to make choices: accessibility
and code quality are not advanced by cultivating a spirit which says
"if you aren't absolutely 100% certain, then it is better you don't put
anything inside the @alt".
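As an example of how machine-checkable the link-text requirement is, here is a minimal Python sketch using the standard html.parser module. The class is hypothetical and deliberately simplified: "sole content" is approximated as "the only child element or non-whitespace text inside <a>", nested markup is not analysed further, and it only flags missing or empty alt rather than judging whether the alt is suitable as link text.

```python
from html.parser import HTMLParser


class SoleImageLinkCheck(HTMLParser):
    """Flags <a> elements whose only content is an <img> with missing or empty alt."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._content = []  # (tag or "#text", attributes or text) seen inside the current <a>
        self.problems = []  # src of each offending image

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link, self._content = True, []
        elif self._in_link:
            self._content.append((tag, dict(attrs)))

    def handle_data(self, data):
        if self._in_link and data.strip():
            self._content.append(("#text", data))

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            if len(self._content) == 1 and self._content[0][0] == "img":
                attrs = self._content[0][1]
                if not (attrs.get("alt") or "").strip():
                    self.problems.append(attrs.get("src"))


checker = SoleImageLinkCheck()
checker.feed('<a href="/"><img src="home.png"></a>'
             '<a href="/next"><img src="next.png" alt="Next page"></a>'
             '<a href="/z"><img src="z.png" alt=""></a>')
checker.close()
print(checker.problems)  # ['home.png', 'z.png']
```

A check like this needs no knowledge of who or what generated the page, which is the point of the paragraph above.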


    Leif has drafted a change proposal [9] which treats all img elements
without @alt as invalid while at the same time helping authors to fix
them. That CP also makes no distinction between generators and other
authoring tools. We are not bound to his unfinished CP, but we find it
a much better starting point than the generator exception.

[9] http://lists.w3.org/Archives/Public/public-html/2011Apr/0480

== On the Co-Chairs' decision on the presence of title making missing alt conforming ==

>* The presence of title makes missing alt conforming.

[ starting with the initial text from Rich]

Title has a completely different function from 
alt in HTML. Title is used to generate a tooltip, 
and is invisible when images are turned off. Alt 
does not generate a tooltip, and is visible when images are turned off.

If title is allowed as alternative text in place of
alt, it will break applications such as Yahoo! Mail;
it will also break a commonly used feature in
less powerful mobile phones, where images are
turned off to improve performance. If title were
used in place of alt, then when images are turned
off in the browser, nothing meaningful would be shown.

Furthermore, having title take precedence over 
alt will result in tooltips being generated on 
decorative images and spacers, which would do 
tremendous harm to the user experience.

It should be noted that title is used as a last 
resort when other measures cannot be employed to 
compute the label or "name" of an object in the 
accessibility API mapping for browsers.
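The "last resort" ordering can be sketched as follows. This is a simplified Python model, assuming the commonly documented order for img (aria-labelledby, then aria-label, then alt, then title); the function is hypothetical, and it treats empty values as absent, which does not match how browsers actually handle alt="".

```python
from typing import Dict, Optional


def img_accessible_name(attrs: Dict[str, str], text_by_id: Dict[str, str]) -> Optional[str]:
    """Simplified name computation for <img>: title is consulted only as a last resort."""
    labelledby = attrs.get("aria-labelledby", "")
    if labelledby:
        parts = (text_by_id.get(i, "").strip() for i in labelledby.split())
        text = " ".join(p for p in parts if p)
        if text:
            return text
    for attribute in ("aria-label", "alt", "title"):
        value = (attrs.get(attribute) or "").strip()
        if value:
            return value
    return None  # no name at all


print(img_accessible_name({"alt": "Sales chart", "title": "Click to enlarge"}, {}))  # Sales chart
print(img_accessible_name({"title": "Click to enlarge"}, {}))  # Click to enlarge
```

The second call is the case the decision would make conforming: the only name available is the tooltip text.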

Please note the following demonstrations of 
failures resulting from the proposed approach:

[and now adding the following from Steve 
http://lists.w3.org/Archives/Public/public-html-a11y/2011Apr/0321.html ]

figure/figcaption provides the opportunity to 
convey a clear semantic differention between a 
caption and a text alternative. Use of the title 
attribute does not. title maps to the accessible 
name property (in cases where no other accessible 
label is provided) in accessibility APIs while 
<figcaption> can be mapped to a caption role in accessibility APIs.

Accessibility APIs have caption roles:
"ROLE_CAPTION The object contains descriptive 
information, usually textual, about another user 
interface element such as a table, chart, or image."

It has also been suggested that a caption role be 
added to ARIA next. 

The accessibility support story for the title
attribute has always been poor and there is no
indication that this will change.
No graphical browser provides device-independent
support for display of tooltips, and the support
has not improved over the 6 years since I
detailed issues with the title attribute in 2005 [4].
So far no browser vendor representative has
given a positive response to Steve's query [1]
about whether this will change, and 2 have stated there are no plans to change it [2].
Support for the display of title attribute
content has decreased markedly over the last few
years; none (to my knowledge) of the mobile or
touch-screen browsers developed provide access to
it. I have recently published information and
guidance based on current known issues [3].

[2] http://lists.w3.org/Archives/Public/public-html/2011Apr/0490.html
[3] http://www.paciellogroup.com/blog/2010/11/using-the-html-title-attribute/

[Rich, Steve, does this text fully address any 
clarifications needed in response to Maciej's 
arguments in his discussion of title?
http://lists.w3.org/Archives/Public/public-html/2011Apr/0451.html ]

== On the Co-Chairs' decision on the presence of figcaption ==

>* The presence of figcaption makes missing alt conforming.

[additional edits from Judy, including in response to comments from Benjamin]


Depending on the type of image and the type of 
publication, figure captions may be either 
concise or verbose. When information contained in 
figcaption is detailed and complicated, it is 
more similar to that supplied by the current 
longdesc attribute than to that supplied by 
alt.  alt, on the other hand, is normally brief, 
and identifies the image rather than fully 
describing it, especially when the image is 
complex.  Permitting figcaption to take the place 
of alt will in some situations result in more 
information being delivered to the user than the 
user needs or wants. The user should be able to 
access the information in figcaption, but not be 
deprived of the type of information they would normally receive through alt.

More specifically, figcaption may not fulfill the 
needs of assistive technology users, particularly 
blind or visually impaired users/screen-reader 
users. Figure captions describe images that users 
_can_ see. In contrast, alt and longdesc 
attributes identify and/or describe images that 
users _cannot_ see. The two audiences are 
different, and as such may require different 
approaches for image description. For instance, 
in scientific publications, information presented 
in figure captions will often state the 
scientific principles being illustrated, but not 
describe the illustration nor necessarily even 
identify the image since many authors assume that 
their audience can identify and discern 
information that is presented visually. To 
adequately support the needs of blind or visually 
impaired users may require description of 
visually discernible information that sighted
users would object to as redundant if presented 
in a visible description via figcaption.


No screen reader in use today supports 
figcaption. Popular screen readers today do 
support the reading of the alt attribute and, in 
some cases, the longdesc attribute. Users are 
accustomed to locating image descriptions and 
identifiers via these two methods. If alt is 
omitted because a figcaption exists, current 
screen readers will not notify the user that a 
description exists in figcaption rather than alt. 
Updating screen readers is non-trivial not only 
for screen reader developers but especially for 
web users who may be on limited budgets, as these 
can be expensive assistive technologies, and some 
users may require the more comprehensive support 
and feature control in more expensive versions in 
order to maintain their livelihoods. Even where 
updated versions of screen readers are available 
in some languages at potentially low-cost or 
no-cost, the lag time for availability in other 
languages may be long. Also, even when updated 
versions of screen readers are available and 
affordable in local languages, newer versions may 
introduce compatibility problems with
screen-reader users' other applications. 
Therefore, failing to support established 
mechanisms for providing information about images 
via alternative text may break existing means of 
accessing accessibility information, and cause 
harm to people with disabilities including blind 
and visually impaired web users.

Delivering a brief description or identifier via 
the alt attribute is an established, successful 
mechanism in HTML. Making missing alt conforming 
in the presence of figcaption would, for many 
types of images, replace alternative text with a 
less appropriate substitute; and, for many screen 
reader users, be inaccessible to them because of 
lack of support in the assistive technology available to them.

Please let us know if additional clarifications 
are needed, and thanks in advance for your re-consideration.

[@@note, as per comment above, we will also incorporate change proposals]



Judy Brewer    +1.617.258.9741    http://www.w3.org/WAI
Director, Web Accessibility Initiative (WAI), World Wide Web Consortium (W3C)
MIT/CSAIL Building 32-G526
32 Vassar Street
Cambridge, MA,  02139,  USA

Received on Saturday, 30 April 2011 20:22:49 UTC