- From: Harvey Bingham <hbingham@acm.org>
- Date: Thu, 10 Mar 2005 00:32:59 -0500
- To: "Jim Tobias" <tobias@inclusive.com>, "'Al Gilman'" <Alfred.S.Gilman@IEEE.org>,<www-voice@w3.org>
At 10:28 AM 3/9/2005, Jim Tobias wrote:
>
>Hi all,
>
>I hate to stretch this thread out so far, but I'm requesting references to
>some interesting research, or a confirmation that the research has not been
>done:

The Handbook of Phonetic Science Copyright © 1997 ISBN 0-631-21478-X
has many good suggestions. Chapter 26 is on Speech Synthesis, including
text-to-speech.

>It strikes me that Harvey's idea may not be the only one possible for
>creating "hyperintelligible synthetic speech". If we assume that most
>synthesis has had as its goal "naturalness", or an "audible Turing test",
>then there may be lots of uncharted territory regarding augmented
>intelligibility. In short, are there ways of improving the intelligibility
>of synthetic speech above that of human speech by exaggerating certain
>speech characteristics (strengthening the weakest links), adding new
>marker-sounds, or by other techniques?
>
>There is a clear potential benefit for people who are hard of hearing or in
>noisy environments, but this may be even more valuable when the speech rate
>is set high, such as by screen reader users.
>
>I'm sorry if I've overexposed my ignorance and wasted your time....
>
>***********
>Jim Tobias
>Inclusive Technologies
>tobias@inclusive.com
>+732.441.0831 v/tty
>www.inclusive.com
>

Thanks, Al for your succinct summary of my epenthesis thoughts.

Best Regards/Harvey

> > -----Original Message-----
> > From: www-voice-request@w3.org
> > [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
> > Sent: Tuesday, March 08, 2005 10:01 AM
> > To: www-voice@w3.org
> > Cc: Harvey Bingham
> > Subject: Re: Vowel Epenthesis and Audiograms
> >
> >
> > *summary
> >
> > a) the function (epending vowels for recognizability of the
> > sound) is [barring further knowledge] desirable from a WAI
> > perspective.
> >
> > b) the pronunciation lexicon seems a less likely place to
> > standardize terms to request this transform than, say, SSML
> > voice properties.
> >
> > *details
> >
> > At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
> > >Harvey Bingham <hbingham@acm.org> writes:
> > >
> > >> As an aid to aging ears that have lost high-frequency hearing, I
> > >> have found that vowel epenthesis can make pronunciation more understandable.
> > >
> >
> > Let's back up one level. Where does it show up in use cases?
> >
> > Vowel epenthesis as Harvey points out is a
> > phoneme-string-level technique that can contribute to a
> > "high-contrast mode" for speech production. So it is
> > potentially important in terms of making the Voice Browser
> > robust in the face of delivery context variability, whether
> > because the line is noisy, the end of the line is in a noisy
> > environment, or the subscriber's hearing is impaired.
> >
> > In the Voice Browser Framework it probably belongs in the
> > realm of Voice Properties which are based in SSML. But,
> > because it is dependent on the phone sequence specific to a
> > token, it is a different class of speech-production directive
> > than pitch or rate. It is more like voice-family, but it
> > would want to be available mix-and-match in combination with
> > voice (family) selection.
> >
> > In V3 are we allowed to open the voice properties of SSML up
> > for extension? Maybe it belongs in there. I don't see
> > putting the epenthesized versions in the lexicon any time
> > soon given the 'documentation by exception'
> > performance budgets in current Voice Browser use of TTS.
> >
> > This transform may also benefit ASR performance if a
> > speech-variability processor takes the standard pronunciation
> > and generates a raft of likely variations given the range of
> > speakers expected. People who are having difficulty hearing
> > themselves, and those with noisy audio for any reason, may
> > adopt this shift reflexively and allowing for it is likely to
> > correct missed catches more than it introduces false
> > positives. [But I'm guessing.]
> >
> > Al
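
For readers who want the phoneme-string-level transform made concrete, here is a minimal sketch. It is not taken from any message in this thread or from SSML. It assumes an ARPAbet-style phone inventory, a single illustrative rule (put a schwa between any two adjacent consonants), and hypothetical names (`VOWELS`, `epenthesize`, `pronunciation_variants`). A real synthesizer would need language-specific rules about which clusters to break and which vowel to use.

```python
# Illustrative only: the vowel set and the insertion rule are assumptions,
# not part of SSML, the pronunciation lexicon, or any TTS engine's API.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AX", "AY", "EH",
    "ER", "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def epenthesize(phones, filler="AX"):
    """Return a copy of `phones` with `filler` inserted between
    every pair of adjacent consonants (anything not in VOWELS)."""
    out = []
    for phone in phones:
        previous_is_consonant = bool(out) and out[-1] not in VOWELS
        if previous_is_consonant and phone not in VOWELS:
            out.append(filler)
        out.append(phone)
    return out


def pronunciation_variants(phones):
    """Return the standard pronunciation plus its epenthesized form,
    as a recognizer-side variation generator might (hypothetical)."""
    variants = [list(phones)]
    expanded = epenthesize(phones)
    if expanded != variants[0]:
        variants.append(expanded)
    return variants


if __name__ == "__main__":
    street = ["S", "T", "R", "IY", "T"]
    print(epenthesize(street))
    print(pronunciation_variants(street))
```

Because the rule depends on the phone sequence of each token, it operates on the output of lexicon lookup rather than on the text, which matches the point above that this is a different class of directive than pitch or rate.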
Received on Thursday, 10 March 2005 05:33:30 UTC