Pronunciation lexicon

The W3C Voice Browser Working Group aims to develop specifications to 
enable access to the Web using spoken interaction. It has published a 
requirements document, part of a set of requirements studies for voice 
browsers, that details the requirements for markup used to specify 
application-specific pronunciation lexicons.

Application-specific pronunciation lexicons are required in many 
situations where the default lexicon supplied with a speech recognition or 
speech synthesis processor does not cover the vocabulary of the 
application. A pronunciation lexicon is a collection of words or phrases 
together with their pronunciations specified using an appropriate 
pronunciation alphabet.

http://www.w3.org/TR/lexicon-reqs/

There are local pronunciation rules that text-to-speech (TTS) synthesizers 
will never get right all the time unless there is a pronunciation 
lexicon.  For example, 'UT' can be pronounced as U.T., Utah, or ut.  
If it is part of a U.S. mail address, UT is pronounced as the state of 
Utah, but if you're from Austin, Texas, it is the abbreviation for 
the University of Texas, pronounced U.T.  Although the lexicon spec is 
being worked on by the W3C Voice Browser Working Group, I know of no 
commitment from the screen reader vendors to support it, even if the 
author marks up the pronunciation in the HTML.
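
To make the 'UT' ambiguity concrete, here is a minimal sketch in Python of an application-specific lexicon keyed by context.  It is only an illustration, not the W3C lexicon markup and not anything a screen reader is known to support: the LEXICON table, the context labels, and the spoken_form function are all hypothetical names made up for this example.

```python
# Hypothetical application-specific lexicon: each written token maps to
# spoken forms keyed by an application-defined context label.
# The labels and spellings below are illustrative assumptions only.
LEXICON = {
    "UT": {
        "us_mail_address": "Utah",
        "austin_texas": "U. T.",
    },
}


def spoken_form(token, context=None):
    """Return the lexicon's spoken form for token in the given context.

    When no entry applies, the token is returned unchanged, which stands
    in for leaving the decision to the synthesizer's default rules.
    """
    return LEXICON.get(token, {}).get(context, token)


if __name__ == "__main__":
    print(spoken_form("UT", "us_mail_address"))
    print(spoken_form("UT", "austin_texas"))
    print(spoken_form("UT"))
```

The point of the sketch is that the same token needs more than one entry, and that something outside the synthesizer (the author or the application) has to supply the context.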

However, there is still a need for a best practices document for editors 
that would include guidelines about the best copy text for screen 
readers and TTS synthesizers.  For example, I rarely insert punctuation 
into the alt attribute, and I try not to leave a lone period following a 
URL, because the trailing period gets pronounced as 'dot' instead of 
being treated as punctuation.  There are other such practices.
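
The trailing-period guideline is the kind of thing an editor could check mechanically.  The following is a rough sketch in Python, assuming a deliberately simplified idea of what a URL looks like (anything starting with http:// or https:// up to the next whitespace); the pattern and the function name urls_with_trailing_period are illustrative assumptions, not part of any existing tool.

```python
import re

# Simplified assumption: a URL is "http://" or "https://" followed by
# non-space characters.  The pattern captures the URL when it is
# immediately followed by a single period that ends the sentence, that
# is, a period followed by whitespace or the end of the text.
TRAILING_DOT_URL = re.compile(r"(https?://\S*[^\s.])\.(?=\s|$)")


def urls_with_trailing_period(text):
    """Return the URLs in text that are directly followed by a period."""
    return TRAILING_DOT_URL.findall(text)


if __name__ == "__main__":
    sample = "More information is at http://www.ibm.com/able. Thanks."
    for url in urls_with_trailing_period(sample):
        print("Consider rewording so no period follows:", url)
```

A check like this only flags candidates; the editor still decides how to reword the sentence so the URL does not end it.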

I think a best practices document would be a good project for the WAI 
Education & Outreach Working Group to work on.

Regards,
Phill Jenkins
IBM Worldwide Accessibility Center
http://www.ibm.com/able

Received on Monday, 14 February 2005 15:56:47 UTC