- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Wed, 2 May 2001 00:00:50 +0100 (BST)
- To: w3c-wai-ig@w3.org
Thinking about a recent article, I began to wonder whether there ought to be a mechanism to allow an actual voice recording to be used, instead of text-to-speech conversion, to provide audio equivalents for a web page. There is already an alt attribute for providing alternative text for purely visual elements, so I wondered whether there ought to be an altsound attribute to provide an audio equivalent for visual and textual elements.

Because the audio and visual channels can be used in parallel, this could be used to augment a basically visual model, which might benefit non-literate users. It would require browsers to couple some visual pointing convention with the screen-reading function. There are technical problems in providing digitised speech, chiefly the large bandwidth required; however, used in an accessible way, pure visual access or text-to-speech could be selected as alternatives by the user.

Of course, such a mechanism may well be used more for non-accessibility reasons, and such abuse could well cause browser capabilities to drift away from accessibility objectives. On the other hand, without the mass-market mis-applications, there may be no incentive to implement the features at all. The other technical issue is integration with visual pointing and the provision of cue and review capabilities.

Given the precedent of alt and longdesc (not that there is much precedent in the latter), I think this ought to be an HTML attribute rather than a style sheet property (whereas background music really ought to be in style sheets, and this mechanism would provide a way to violate that principle). One could argue that longdesc could be overloaded with this function, but I think the semantics are somewhat different: longdesc is an on-demand function. To do this overloading, the browser would have to send an Accept header with a higher priority for audio than for HTML.
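As a sketch of what I have in mind (altsound is purely hypothetical - it is not defined in any HTML DTD, and the file names are placeholders):

```html
<!-- Hypothetical markup: "altsound" is an invented attribute,
     not part of any HTML specification. The idea is that it
     references a recording to be played instead of synthesising
     the alt text or element content. -->
<img src="map.gif"
     alt="Map of the town centre"
     altsound="map-description.wav">

<p altsound="welcome.wav">Welcome to our club's home page.</p>
```

And the longdesc overloading above would amount to the browser sending a request header along these lines, preferring audio over HTML:

```
Accept: audio/x-wav;q=1.0, audio/basic;q=0.9, text/html;q=0.5
```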
Actually, if longdesc is implemented sensibly, one could already respond with audio (noting that browsers generally don't allow users to tune content negotiation quality values). The problem here is that very few web authors are aware of content negotiation (it requires configuring something other than HTML), and many people on small budgets (charities, clubs, etc.) don't feel they can afford proper access to a web server (remember the recent Hull site that appeared to use a cheap service).

CSS2 has style properties for providing background sounds and pre- and post-sounds for an element. The pre- and post-sounds imply linear progression, but: body {play-during: url(.....);} would seem to make sense as a standards-conforming alternative to bgsound, even though the properties are described in terms of being used only by aural browsers, and I doubt that any current graphical browser implements them (I can't get a string match in the Mozilla 0.8.1 sources). I suspect most mainstream browser designers skipped the aural style sheets section!

By coupling visual pointing to sound context, as suggested above, these properties could be applied to other elements. Note that I don't think this is a job for author control (except for proof of concept); the pointing convention should be standard for the browser, not learned for each page - in particular, mouseover triggering should be under user or browser control, not implemented by the author.

With such local use, such properties could be used as a non-verbal signal to back up text, but good accessibility would still require text or recorded speech, and I think that should be identified in the main document, not the style sheet. I seem to remember that phonetic spellings are included in later style sheet standards, and that might constitute a counter-argument to the argument that the spoken sound should be referenced from the main document.
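To make the content negotiation point concrete: with Apache's mod_negotiation one can serve a type-map instead of a single resource. This is a sketch only - the file names are invented, and it assumes mod_negotiation is enabled with "AddHandler type-map .var" configured:

```apache
# Hypothetical type-map file, e.g. "description.var", served as the
# longdesc target. A browser whose Accept header prefers audio over
# HTML would receive the recording; others get the HTML description.

URI: description.wav
Content-Type: audio/x-wav; qs=0.9

URI: description.html
Content-Type: text/html; qs=0.8
```

The catch, as noted above, is that this requires access to server configuration, which authors on cheap hosting typically don't have.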
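For completeness, the CSS2 aural properties I am referring to look like this (the sound URLs are placeholders, not real resources):

```css
/* CSS2 aural style sheet properties; the .wav URLs are placeholders. */
body {
  play-during: url(background.wav) repeat;  /* standards-conforming
                                               alternative to bgsound */
}

h1 {
  cue-before: url(ping.wav);   /* the "pre" sound */
  cue-after:  url(pong.wav);   /* the "post" sound */
}
```

The cue-before and cue-after properties are the pre- and post-sounds mentioned above, which assume linear progression through the document; play-during does not.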
Received on Tuesday, 1 May 2001 20:14:03 UTC