RE: Vowel Epenthesis and Audiograms from Harvey Bingham on 2005-03-10 (www-voice@w3.org from January to March 2005)

From: Harvey Bingham <hbingham@acm.org>
Date: Thu, 10 Mar 2005 00:32:59 -0500
To: "Jim Tobias" <tobias@inclusive.com>, "'Al Gilman'" <Alfred.S.Gilman@IEEE.org>,<www-voice@w3.org>
Message-Id: <6.2.1.2.2.20050310000526.0280c980@pop.rcn.com>
At 10:28 AM 3/9/2005, Jim Tobias wrote:
>
>Hi all,
>
>I hate to stretch this thread out so far, but I'm requesting references to
>some interesting research, or a confirmation that the research has not been
>done:

The Handbook of Phonetic Sciences
Copyright © 1997
ISBN 0-631-21478-X

has many good suggestions.
Chapter 26 is on Speech Synthesis, including text-to-speech.

>It strikes me that Harvey's idea may not be the only one possible for
>creating "hyperintelligible synthetic speech".  If we assume that most
>synthesis has had as its goal "naturalness", or an "audible Turing test",
>then there may be lots of uncharted territory regarding augmented
>intelligibility.  In short, are there ways of improving the intelligibility
>of synthetic speech above that of human speech by exaggerating certain
>speech characteristics (strengthening the weakest links), adding new
>marker-sounds, or by other techniques?
>
>There is a clear potential benefit for people who are hard of hearing or in
>noisy environments, but this may be even more valuable when the speech rate
>is set high, such as by screen reader users.
>
>I'm sorry if I've overexposed my ignorance and wasted your time....
>
>***********
>Jim Tobias
>Inclusive Technologies
>tobias@inclusive.com
>+732.441.0831 v/tty
>www.inclusive.com
>

Thanks, Al, for your succinct summary of my epenthesis thoughts.

Best Regards/Harvey

> > -----Original Message-----
> > From: www-voice-request@w3.org
> > [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
> > Sent: Tuesday, March 08, 2005 10:01 AM
> > To: www-voice@w3.org
> > Cc: Harvey Bingham
> > Subject: Re: Vowel Epenthesis and Audiograms
> >
> >
> > *summary
> >
> > a) the function (epenthesizing vowels for recognizability of the
> > sound) is [barring further knowledge] desirable from a WAI
> > perspective.
> >
> > b) the pronunciation lexicon seems a less likely place to
> > standardize terms to request this transform than, say, SSML
> > voice properties.
> >
> > *details
> >
> > At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
> > >Harvey Bingham <hbingham@acm.org> writes:
> > >
> > >> As an aid to aging ears that have lost high-frequency hearing, I
> > >> have found that vowel epenthesis can make pronunciation more
> > >> understandable.
> > >
> >
> > Let's back up one level.  Where does it show up in use cases?
> >
> > Vowel epenthesis as Harvey points out is a
> > phoneme-string-level technique that can contribute to a
> > "high-contrast mode" for speech production.  So it is
> > potentially important in terms of making the Voice Browser
> > robust in the face of delivery context variability, whether
> > because the line is noisy, the end of the line is in a noisy
> > environment, or the subscriber's hearing is impaired.
> >
> > In the Voice Browser Framework it probably belongs in the
> > realm of Voice Properties which are based in SSML.  But,
> > because it is dependent on the phone sequence specific to a
> > token, it is a different class of speech-production directive
> > than pitch or rate.  It is more like voice-family, but it
> > would want to be available mix-and-match in combination with
> > voice (family) selection.
> >
> > In V3 are we allowed to open the voice properties of SSML up
> > for extension?  Maybe it belongs in there.  I don't see
> > putting the epenthesized versions in the lexicon any time
> > soon given the 'documentation by exception'
> > performance budgets in current Voice Browser use of TTS.
> >
> > This transform may also benefit ASR performance if a
> > speech-variability processor takes the standard pronunciation
> > and generates a raft of likely variations given the range of
> > speakers expected.  People who are having difficulty hearing
> > themselves, and those with noisy audio for any reason, may
> > adopt this shift reflexively, and allowing for it is likely to
> > correct missed catches more than it introduces false
> > positives.  [But I'm guessing.]
> >
> > Al
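
As a rough illustration of the two ideas in Al's message (epenthesis as a
phoneme-string-level transform for TTS, and generating likely pronunciation
variants for ASR), here is a minimal Python sketch. It is not taken from
SSML, the pronunciation lexicon work, or any TTS or ASR engine. The
ARPAbet-style phoneme symbols, the choice of schwa ("AX") as the inserted
vowel, and the function names epenthesize and variants are all assumptions
made for this example.

```python
from itertools import product

# Assumed vowel inventory (ARPAbet-style symbols); anything else is treated
# as a consonant. This set is illustrative, not a standard.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AX", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
SCHWA = "AX"  # assumed epenthetic vowel


def epenthesize(phones, vowel=SCHWA):
    """Insert `vowel` between every pair of adjacent consonants."""
    out = []
    for p in phones:
        # out[-1] is always the previous original phone, never an inserted vowel
        if out and out[-1] not in VOWELS and p not in VOWELS:
            out.append(vowel)
        out.append(p)
    return out


def variants(phones, vowel=SCHWA):
    """Return the standard pronunciation plus every combination of
    epenthesized consonant-cluster boundaries (a candidate list for ASR)."""
    gaps = [i for i in range(1, len(phones))
            if phones[i - 1] not in VOWELS and phones[i] not in VOWELS]
    result = []
    for choice in product([False, True], repeat=len(gaps)):
        chosen = {g for g, on in zip(gaps, choice) if on}
        out = []
        for i, p in enumerate(phones):
            if i in chosen:
                out.append(vowel)
            out.append(p)
        result.append(out)
    return result


if __name__ == "__main__":
    film = ["F", "IH", "L", "M"]
    print(epenthesize(film))
    for v in variants(["S", "T", "R", "IY", "T"]):
        print(" ".join(v))
```

The variant list grows as 2^n in the number of consonant-cluster
boundaries, so a real recognizer would presumably prune or weight the
candidates rather than enumerate them all.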
Received on Thursday, 10 March 2005 05:33:30 UTC