RE: Vowel Epenthesis and Audiograms

From: Jim Tobias <tobias@inclusive.com>
Date: Wed, 9 Mar 2005 10:28:55 -0500
Message-Id: <200503091529.j29FSx2Q025904@cedant4.abac.com>
To: "'Al Gilman'" <Alfred.S.Gilman@IEEE.org>, <www-voice@w3.org>
Cc: "'Harvey Bingham'" <hbingham@acm.org>

Hi all,

I hate to stretch this thread out so far, but I'm requesting references to
some interesting research, or a confirmation that the research has not been
done.

It strikes me that Harvey's idea may not be the only one possible for
creating "hyperintelligible synthetic speech".  If we assume that most
synthesis has had as its goal "naturalness", or an "audible Turing test",
then there may be lots of uncharted territory regarding augmented
intelligibility.  In short, are there ways of improving the intelligibility
of synthetic speech above that of human speech by exaggerating certain
speech characteristics (strengthening the weakest links), adding new
marker-sounds, or by other techniques?

There is a clear potential benefit for people who are hard of hearing or in
noisy environments, but this may be even more valuable when the speech rate
is set high, such as by screen reader users.

I'm sorry if I've overexposed my ignorance and wasted your time....

Jim Tobias
Inclusive Technologies
+732.441.0831 v/tty

> -----Original Message-----
> From: www-voice-request@w3.org 
> [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
> Sent: Tuesday, March 08, 2005 10:01 AM
> To: www-voice@w3.org
> Cc: Harvey Bingham
> Subject: Re: Vowel Epenthesis and Audiograms
> *summary
> a) the function (epending vowels for recognizability of the 
> sound) is [barring further knowledge] desirable from a WAI 
> perspective.
> b) the pronunciation lexicon seems a less likely place to 
> standardize terms to request this transform than, say, SSML 
> voice properties.
> *details
> At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
> >Harvey Bingham <hbingham@acm.org> writes:
> >
> >> As an aid to aging ears that have lost high-frequency hearing, I
> >> have found that vowel epenthesis can make pronunciation more
> >> understandable.
> >
> >Hi Harvey,
> >
> >Sounds interesting, could you describe a bit more how you'd see that 
> >added to the PLS? Extra markup?
> Let's back up one level.  Where does it show up in use cases?
> Vowel epenthesis as Harvey points out is a 
> phoneme-string-level technique that can contribute to a 
> "high-contrast mode" for speech production.  So it is 
> potentially important in terms of making the Voice Browser 
> robust in the face of delivery context variability, whether 
> because the line is noisy, the end of the line is in a noisy 
> environment, or the subscriber's hearing is impaired.
> In the Voice Browser Framework it probably belongs in the 
> realm of Voice Properties which are based in SSML.  But, 
> because it is dependent on the phone sequence specific to a 
> token, it is a different class of speech-production directive 
> than pitch or rate.  It is more like voice-family, but it 
> would want to be available mix-and-match in combination with 
> voice (family) selection.
> In V3 are we allowed to open the voice properties of SSML up 
> for extension?  Maybe it belongs in there.  I don't see 
> putting the epenthesized versions in the lexicon any time 
> soon given the 'documentation by exception'
> performance budgets in current Voice Browser use of TTS.
> This transform may also benefit ASR performance if a 
> speech-variability processor takes the standard pronunciation 
> and generates a raft of likely variations given the range of 
> speakers expected.  People who are having difficulty hearing 
> themselves, and those with noisy audio for any reason, may 
> adopt this shift reflexively and allowing for it is likely to 
> correct missed catches more than it introduces false 
> positives.  [But I'm guessing.]
> Al
> >
> >Max.
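
For readers unfamiliar with the term: the phoneme-string-level transform Al describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything proposed in the thread or defined in SSML or the PLS. The ARPAbet-style phone symbols, the choice of a schwa ("AX") as the inserted vowel, the rule of breaking up every consonant cluster, and the function names `epenthesize` and `pronunciation_variants` are all hypothetical.

```python
# Assumed vowel inventory (ARPAbet-style symbols); not taken from the thread.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AX", "AY", "EH",
    "ER", "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def epenthesize(phones, filler="AX"):
    """Return a copy of `phones` with `filler` inserted between
    every pair of adjacent consonants (a deliberately crude rule)."""
    out = []
    for phone in phones:
        # If the previous phone and this one are both consonants,
        # separate them with the filler vowel.
        if out and out[-1] not in VOWELS and phone not in VOWELS:
            out.append(filler)
        out.append(phone)
    return out


def pronunciation_variants(phones):
    """Return the standard pronunciation plus its epenthesized form,
    as a recognizer-side variation generator might (hypothetical)."""
    variants = [list(phones)]
    expanded = epenthesize(phones)
    if expanded != variants[0]:
        variants.append(expanded)
    return variants


if __name__ == "__main__":
    street = ["S", "T", "R", "IY", "T"]
    print(epenthesize(street))
    print(pronunciation_variants(["AE", "T"]))
```

Because the rule depends on the phone sequence of each token, it behaves differently from a global setting such as pitch or rate, which is the distinction Al draws. The second function mirrors his ASR remark: the recognizer would accept both the standard form and the epenthesized one.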
Received on Wednesday, 9 March 2005 15:30:16 UTC
