- From: Al Gilman <Alfred.S.Gilman@IEEE.org>
- Date: Wed, 9 Mar 2005 15:29:27 -0500
- To: "Jim Tobias" <tobias@inclusive.com>, <www-voice@w3.org>
- Cc: "'Harvey Bingham'" <hbingham@acm.org>
At 10:28 AM -0500 3/9/05, Jim Tobias wrote:
>Hi all,
>
>I hate to stretch this thread out so far, but I'm requesting references to
>some interesting research, or a confirmation that the research has not been
>done:

Step 1: coin a Google search that lists Harvey's page at the top of the list.

Done:
http://www.google.com/search?q=articulation+intelligibility+hearing+epenthesis+synthetic+speech

>It strikes me that Harvey's idea may not be the only one possible for
>creating "hyperintelligible synthetic speech". If we assume that most
>synthesis has had as its goal "naturalness", or an "audible Turing test",
>then there may be lots of uncharted territory regarding augmented
>intelligibility. In short, are there ways of improving the intelligibility
>of synthetic speech above that of human speech by exaggerating certain
>speech characteristics (strengthening the weakest links), adding new
>marker-sounds, or by other techniques?

This is into the domain of HCI and AT research. Such as pursued by the
RERC on Hearing Enhancement.

http://www.hearingresearch.org/Newsletter/RERC_moves.htm

The point of raising the vowel epenthesis technique *here* is that with
phonemic representations of the sounds of words, the PLS is tantalizingly
close to techniques that would perturb the phonology to achieve enhanced
intelligibility, without getting in to a full-court-press on all aspects
of speech production. Consider it potentially-low-hanging fruit.

>There is a clear potential benefit for people who are hard of hearing or in
>noisy environments, but this may be even more valuable when the speech rate
>is set high, such as by screen reader users.
>
>I'm sorry if I've overexposed my ignorance and wasted your time....

Not at all. The human-function-enhancing research is always interesting,
but not to first order on the agenda of the standardization process in,
for example, the Voice Browser WG.
Al

>***********
>Jim Tobias
>Inclusive Technologies
>tobias@inclusive.com
>+732.441.0831 v/tty
>www.inclusive.com
>
>
>> -----Original Message-----
>> From: www-voice-request@w3.org
>> [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
>> Sent: Tuesday, March 08, 2005 10:01 AM
>> To: www-voice@w3.org
>> Cc: Harvey Bingham
>> Subject: Re: Vowel Epenthesis and Audiograms
>>
>>
>> *summary
>>
>> a) the function (epending vowels for recognizability of the
>> sound) is [barring further knowledge] desirable from a WAI
>> perspective.
>>
>> b) the pronunciation lexicon seems a less likely place to
>> standardize terms to request this transform than, say, SSML
>> voice properties.
>>
>> *details
>>
>> At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
>> >Harvey Bingham <hbingham@acm.org> writes:
>> >
>> >> As an aid to aging ears that have lost high-frequency hearing, I
>> >> have found that vowel epenthesis can make pronunciation
>> >> more understandable.
>> >
>> >Hi Harvey,
>> >
>> >Sounds interesting, could you describe a bit more how you'd see that
>> >added to the PLS? Extra markup?
>>
>> Let's back up one level. Where does it show up in use cases?
>>
>> Vowel epenthesis as Harvey points out is a
>> phoneme-string-level technique that can contribute to a
>> "high-contrast mode" for speech production. So it is
>> potentially important in terms of making the Voice Browser
>> robust in the face of delivery context variability, whether
>> because the line is noisy, the end of the line is in a noisy
>> environment, or the subscriber's hearing is impaired.
>>
>> In the Voice Browser Framework it probably belongs in the
>> realm of Voice Properties which are based in SSML. But,
>> because it is dependent on the phone sequence specific to a
>> token, it is a different class of speech-production directive
>> than pitch or rate. It is more like voice-family, but it
>> would want to be available mix-and-match in combination with
>> voice (family) selection.
>>
>> In V3 are we allowed to open the voice properties of SSML up
>> for extension? Maybe it belongs in there. I don't see
>> putting the epenthesized versions in the lexicon any time
>> soon given the 'documentation by exception'
>> performance budgets in current Voice Browser use of TTS.
>>
>> This transform may also benefit ASR performance if a
>> speech-variability processor takes the standard pronunciation
>> and generates a raft of likely variations given the range of
>> speakers expected. People who are having difficulty hearing
>> themselves, and those with noisy audio for any reason, may
>> adopt this shift reflexively and allowing for it is likely to
>> correct missed catches more than it introduces false
>> positives. [But I'm guessing.]
>>
>> Al
>> >
>> >Max.
>>
>>
>>
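For readers who want to see what a phoneme-string-level transform of the kind discussed in this thread might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the ARPAbet-like symbols, the filler vowel "AX", the function names, and the rule itself (break every consonant cluster), which is not necessarily the rule Harvey has in mind and is not part of PLS or SSML.

```python
# Hypothetical vowel inventory (ARPAbet-like symbols); assumed for illustration only.
VOWELS = {"AA", "AE", "AH", "AO", "AX", "EH", "ER", "IH", "IY", "OW", "UH", "UW"}


def epenthesize(phonemes, filler="AX"):
    """Insert `filler` between any two adjacent consonants.

    This cluster-breaking rule is an assumed stand-in for whatever
    epenthesis rule a real "high-contrast" voice would apply.
    """
    out = []
    for p in phonemes:
        # A consonant directly following a consonant gets a filler vowel first.
        if out and out[-1] not in VOWELS and p not in VOWELS:
            out.append(filler)
        out.append(p)
    return out


def pronunciation_variants(phonemes, filler="AX"):
    """Return the standard pronunciation plus its epenthesized form.

    Sketches the ASR idea: accept the shifted pronunciation alongside
    the standard one. Duplicates are dropped when nothing changes.
    """
    variants = [list(phonemes)]
    shifted = epenthesize(phonemes, filler)
    if shifted != variants[0]:
        variants.append(shifted)
    return variants


if __name__ == "__main__":
    print(epenthesize(["S", "T", "R", "IY", "T"]))
    print(pronunciation_variants(["K", "AE", "T"]))
```

Because the transform works on the phone sequence of each token rather than on a global setting such as pitch or rate, it operates on the output of the lexicon lookup. That is consistent with treating it as a voice property applied after lookup rather than storing epenthesized entries in the lexicon.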
Received on Wednesday, 9 March 2005 22:27:27 UTC