
RE: Vowel Epenthesis and Audiograms

From: Al Gilman <Alfred.S.Gilman@IEEE.org>
Date: Wed, 9 Mar 2005 15:29:27 -0500
Message-Id: <p06110404be550b077fee@[]>
To: "Jim Tobias" <tobias@inclusive.com>, <www-voice@w3.org>
Cc: "'Harvey Bingham'" <hbingham@acm.org>

At 10:28 AM -0500 3/9/05, Jim Tobias wrote:
>Hi all,
>I hate to stretch this thread out so far, but I'm requesting references to
>some interesting research, or a confirmation that the research has not been

Step 1: coin a Google search that lists Harvey's page at the top of the list.

>It strikes me that Harvey's idea may not be the only one possible for
>creating "hyperintelligible synthetic speech".  If we assume that most
>synthesis has had as its goal "naturalness", or an "audible Turing test",
>then there may be lots of uncharted territory regarding augmented
>intelligibility.  In short, are there ways of improving the intelligibility
>of synthetic speech above that of human speech by exaggerating certain
>speech characteristics (strengthening the weakest links), adding new
>marker-sounds, or by other techniques?

This gets into the domain of HCI and AT research, such as that pursued by
the RERC on Hearing Enhancement.

The point of raising the vowel epenthesis technique *here* is that
with phonemic representations of the sounds of words, the PLS is
tantalizingly close to techniques that would perturb the phonology to
achieve enhanced intelligibility, without getting into a
full-court-press on all aspects of speech production. Consider it
potentially-low-hanging fruit.
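
To make the phoneme-string idea concrete, here is a minimal sketch,
not a proposal for PLS or SSML syntax.  It assumes a pronunciation is
available as a list of ARPAbet-style phone symbols, and it uses one
made-up rule: break every consonant cluster with a filler vowel.  The
vowel inventory, the function name `epenthesize`, and the choice of
"AH" as the filler are all illustrative assumptions; where and what to
insert in practice is exactly the open research question.

```python
# Assumed vowel inventory (ARPAbet-style symbols), for illustration only.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def epenthesize(phones, filler="AH"):
    """Return a new phone list with `filler` inserted between any two
    adjacent consonants.  Hypothetical rule, not a standard algorithm."""
    out = []
    for phone in phones:
        # A cluster exists when the previous phone and this one are both consonants.
        if out and out[-1] not in VOWELS and phone not in VOWELS:
            out.append(filler)
        out.append(phone)
    return out


# "street" as S T R IY T: the S-T-R cluster gets two filler vowels.
print(epenthesize(["S", "T", "R", "IY", "T"]))
```

Because the transform only needs the phone sequence, it could in
principle run on whatever the lexicon lookup returns, leaving the
lexicon entries themselves untouched.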

>There is a clear potential benefit for people who are hard of hearing or in
>noisy environments, but this may be even more valuable when the speech rate
>is set high, such as by screen reader users.
>I'm sorry if I've overexposed my ignorance and wasted your time....

Not at all.  Human-function-enhancing research is always interesting,
but it is not, to first order, on the agenda of the standardization process
in, for example, the Voice Browser WG.


>Jim Tobias
>Inclusive Technologies
>+732.441.0831 v/tty
>>  -----Original Message-----
>>  From: www-voice-request@w3.org
>>  [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
>>  Sent: Tuesday, March 08, 2005 10:01 AM
>>  To: www-voice@w3.org
>>  Cc: Harvey Bingham
>>  Subject: Re: Vowel Epenthesis and Audiograms
>>  *summary
>>  a) the function (epenthesizing vowels for recognizability of the
>>  sound) is [barring further knowledge] desirable from a WAI
>>  perspective.
>>  b) the pronunciation lexicon seems a less likely place to
>>  standardize terms to request this transform than, say, SSML
>>  voice properties.
>>  *details
>>  At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
>>  >Harvey Bingham <hbingham@acm.org> writes:
>>  >
>>  >> As an aid to aging ears that have lost high-frequency hearing, I
>>  >> have found that vowel epenthesis can make pronunciation more
>>  >> understandable.
>>  >
>>  >Hi Harvey,
>>  >
>>  >Sounds interesting, could you describe a bit more how you'd see that
>>  >added to the PLS? Extra markup?
>>  Let's back up one level.  Where does it show up in use cases?
>>  Vowel epenthesis as Harvey points out is a
>>  phoneme-string-level technique that can contribute to a
>>  "high-contrast mode" for speech production.  So it is
>>  potentially important in terms of making the Voice Browser
>>  robust in the face of delivery context variability, whether
>>  because the line is noisy, the end of the line is in a noisy
>>  environment, or the subscriber's hearing is impaired.
>>  In the Voice Browser Framework it probably belongs in the
>>  realm of Voice Properties which are based in SSML.  But,
>>  because it is dependent on the phone sequence specific to a
>>  token, it is a different class of speech-production directive
>>  than pitch or rate.  It is more like voice-family, but it
>>  would want to be available mix-and-match in combination with
>>  voice (family) selection.
>>  In V3 are we allowed to open the voice properties of SSML up
>>  for extension?  Maybe it belongs in there.  I don't see
>>  putting the epenthesized versions in the lexicon any time
>>  soon given the 'documentation by exception'
>>  performance budgets in current Voice Browser use of TTS.
>>  This transform may also benefit ASR performance if a
>>  speech-variability processor takes the standard pronunciation
>>  and generates a raft of likely variations given the range of
>>  speakers expected.  People who are having difficulty hearing
>>  themselves, and those with noisy audio for any reason, may
>>  adopt this shift reflexively and allowing for it is likely to
>>  correct missed catches more than it introduces false
>>  positives.  [But I'm guessing.]
>>  Al
>>  >
>>  >Max.
Received on Wednesday, 9 March 2005 22:27:27 UTC
