Re: [html4all] the alt attribute debate from Henri Sivonen on 2007-09-24 (www-archive@w3.org from September 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 24 Sep 2007 18:04:07 +0300
To: Steven Faulkner <faulkner.steve@gmail.com>
Cc: "advocate group" <list@html4all.org>, "John Foliot - WATS. ca" <foliot@wats.ca>, "Anne van Kesteren" <annevk@opera.com>, www-archive@w3.org
Message-Id: <32CB4D41-C3F6-493E-AF26-009950FF8283@iki.fi>
On Sep 24, 2007, at 15:54, Steven Faulkner wrote:

> As I have written in response to you on a number of occasions, it is
> not simply JAWS that reads out the filename (if the image is the
> sole content of a link) in response to no alt; it is also the other
> major screen reader, Window-Eyes, and for all I know quite a few others.

Yes, I have noticed that.

The point is that no one has so far explained why that behavior  
(which is awful in some situations as you have demonstrated) would be  
the permanent end state of development for AT and not just a  
transient usability bug that is fixable in a handful of  
implementations. For example, VoiceOver plus Safari on Mac OS X 10.4  
do not share this usability bug, which trivially demonstrates that it  
is possible to construct AT that doesn't have the usability bug.

On the other hand, there being images without human-authored alt text  
seems to be permanent badness instead of a transient problem. It is a  
problem that has been there from the beginning of the Web and shows  
no signs of getting solved. Moreover, it is easy to contemplate new
situations where some satisfaction (for sighted people) can be
derived from a workflow (for example, a keyboardless camera that
uploads to the Web immediately) that leaves no realistic opportunity
for writing alternative text. (When there's some satisfaction to be
had for a notable group of people, people will want to do it, which
is why I wouldn't expect people to refrain from such workflows for
accessibility considerations.)

> and these AT are following UAAG:
> http://www.w3.org/TR/UAAG10-TECHS/guidelines.html#tech-missing-alt

I'm aware of that guideline. I also posit that the guideline taken
as-is is bad, because there are a lot of real file names out there
that lead to awful usability when read out loud. At minimum, AT should
check that the filename to be read has some minimal traits of
readability in the language of the speech synthesizer.
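
A minimal sketch of what such a readability check might look like,
assuming English-language speech output. The function name
looks_speakable, the vowel set, and the thresholds are hypothetical
choices made for illustration; they are not taken from any real AT
product or from UAAG.

```python
import re
from pathlib import PurePosixPath

# Assumption: English output, so "has a vowel" is a crude proxy for
# "a speech synthesizer can pronounce this as a word".
VOWELS = set("aeiouy")


def looks_speakable(filename: str) -> bool:
    """Guess whether reading a file name aloud would tell the user anything."""
    # Drop any directory part and the extension: "a/b/DSC5413.jpg" -> "DSC5413".
    stem = PurePosixPath(filename).stem
    # Split into alphabetic runs; digits, "-", "_" and so on act as separators.
    words = [w for w in re.split(r"[^A-Za-z]+", stem) if w]
    letters = sum(len(w) for w in words)
    digits = sum(ch.isdigit() for ch in stem)
    # Names that are mostly digits (typical camera output) are rejected.
    if letters == 0 or digits > letters:
        return False
    # Require at least one word-like run: four or more letters with a vowel.
    return any(len(w) >= 4 and VOWELS & set(w.lower()) for w in words)
```

With these assumptions, looks_speakable("golden-gate-bridge.jpg") is
true, while "DSC5413.jpg", "IMG8329" and "img_01.png" are all rejected,
so AT could fall back to a generic announcement instead of reading them.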

(We shouldn't take stuff under /TR/ as holy writ set in stone when  
following it clearly leads to worse usability than doing something  
smarter.)

> This premise has not been sufficiently tested (actually to my
> knowledge, no testing that has been published has been done by
> proponents of the alt change, so this premise is based on
> supposition),

That's because it seems so clear to a software developer like me that  
AT software could relatively easily be written to surpass bogus alt  
text in quality. That's why I'd be interested in hearing why AT  
couldn't be improved as suggested below.

(Relative ease above is relative to the ease of making content  
providers abandon the kind of workflows where human-written alt text  
is not one of the products of the workflow.)

> but as my initial testing results revealed much of what you call  
> bogus alt text on images can actually provide some useful  
> information about the image.

Examples of what *I* call bogus alt text are:
""
"image"
"photo"
"DSC5413.jpg"
"IMG8329"

"" has the problem of colliding with the way of indicating that the  
presence of the image should be suppressed from non-visual rendering  
entirely.

"image" and "photo" as part of content are worse than "image" or
"photo" as AT-generated speech, because they take away the
opportunities for AT to improve and innovate on how to effectively
communicate the presence of an image without proper alternative text.
Examples of what AT could do above the baseline are saying "image" or
"photo" in the chrome voice instead of the content voice (when the AT
has different voices for chrome and content) and using an audio cue
that plays faster than those words in the content voice.

"DSC5413.jpg" and "IMG8329" are worse than e.g. "image" or "photo"
said in the chrome voice, because they take a relatively long time to
read and tell nothing about the image except that it probably came
from a digital camera. At best, *if* you are able to memorize the
numbers as you go (a big "if"), you may be able to distinguish images
from each other as you navigate around.
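
To make the distinction concrete, here is a rough sketch of how AT
might decide what to say for an image, given its alt value. Everything
in it is an assumption for illustration: the function name announce,
the "chrome"/"content" voice labels, the list of generic words, and
the camera file name pattern are hypothetical and do not describe any
existing screen reader.

```python
import re
from typing import Optional, Tuple

# Assumed examples of placeholder words and camera-generated names.
GENERIC_WORDS = {"image", "photo"}
CAMERA_NAME = re.compile(r"^(DSC|IMG)[_-]?\d+(\.\w+)?$", re.IGNORECASE)


def announce(alt: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (voice, text) to speak for an image, or None to stay silent.

    alt is None when the attribute is absent, and "" when it is
    present but empty.
    """
    if alt is None:
        # No alt at all: signal the image in the AT's own (chrome) voice.
        return ("chrome", "image")
    text = alt.strip()
    if text == "":
        # Empty alt is the established way to suppress the image entirely.
        return None
    if text.lower() in GENERIC_WORDS or CAMERA_NAME.match(text):
        # Placeholder text carries no information; treat it like missing alt.
        return ("chrome", "image")
    # Anything else is assumed to be human-authored alternative text.
    return ("content", text)
```

Under these assumptions, announce("DSC5413.jpg") and announce("photo")
both give ("chrome", "image"), the same short announcement as an image
with no alt at all, while a real description is read in the content
voice.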

What I'm talking about isn't rocket science, as it doesn't involve
AI-complete UAs describing images to people.

> What needs to be done is more research and hey lets do something  
> novel, lets go and ask the users what they find useful.

That's a good idea. However, I get a feeling that users may not have  
a good grasp of what kind of ideas would be implementable and,  
therefore, eligible to be suggested.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 24 September 2007 15:04:37 UTC