Re: Vowel Epenthesis and Audiograms

From: Al Gilman <Alfred.S.Gilman@IEEE.org>
Date: Tue, 8 Mar 2005 10:01:15 -0500
Message-Id: <p06110403be536a5a5a83@[10.0.1.2]>
To: www-voice@w3.org
Cc: Harvey Bingham <hbingham@acm.org>

*summary

a) the function (inserting vowels to make the sound more recognizable) is
[barring further knowledge] desirable from a WAI perspective.

b) the pronunciation lexicon seems a less likely place to standardize
terms to request this transform than, say, SSML voice properties.

*details

At 12:48 PM +0000 3/8/05, Max Froumentin wrote:
>Harvey Bingham <hbingham@acm.org> writes:
>
>>  As an aid to aging ears that have lost high-frequency hearing, I have
>>  found that vowel epenthesis can make pronunciation more understandable.
>
>Hi Harvey,
>
>Sounds interesting, could you describe a bit more how you'd see that
>added to the PLS? Extra markup?

Let's back up one level.  Where does it show up in use cases?

Vowel epenthesis, as Harvey points out, is a phoneme-string-level technique
that can contribute to a "high-contrast mode" for speech production.  So it
is potentially important for making the Voice Browser robust in the face of
delivery-context variability, whether because the line is noisy, the end of
the line is in a noisy environment, or the subscriber's hearing is impaired.

In the Voice Browser Framework it probably belongs in the realm of Voice
Properties, which are based in SSML.  But, because it depends on the phone
sequence specific to a token, it is a different class of speech-production
directive than pitch or rate.  It is more like voice-family, except that it
should be available mix-and-match in combination with voice (family) selection.
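To make that concrete, here is a purely hypothetical illustration; the
`x:epenthesis` element and its namespace are invented for this sketch and are
not part of SSML 1.0:

```xml
<?xml version="1.0"?>
<!-- Hypothetical only: x:epenthesis is an invented extension element,
     not part of SSML 1.0.  It shows the transform being requested
     independently of, and in combination with, voice selection. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:x="http://example.org/hypothetical-epenthesis"
       xml:lang="en-US">
  <voice gender="female">
    <x:epenthesis mode="high-contrast">
      sixth street
    </x:epenthesis>
  </voice>
</speak>
```

The point of the sketch is only that such a property would compose with
voice (family) selection rather than replace it.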

In V3 are we allowed to open the voice properties of SSML up for
extension?  Maybe it belongs there.  I don't see putting the epenthesized
versions in the lexicon any time soon, given the 'documentation by exception'
performance budgets in current Voice Browser use of TTS.

This transform may also benefit ASR performance, if a speech-variability
processor takes the standard pronunciation and generates a raft of likely
variations given the range of speakers expected.  People who have difficulty
hearing themselves, and those with noisy audio for any reason, may adopt this
shift reflexively; allowing for it is likely to correct more missed catches
than it introduces false positives.  [But I'm guessing.]
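As a sketch of what such a speech-variability processor might do (everything
here is hypothetical: the phone set is a simplified ARPAbet, and "AH" stands
in for the epenthetic schwa), generating one-insertion variants of a
pronunciation could look like:

```python
# Hypothetical sketch: generate epenthesized pronunciation variants
# for an ASR lexicon.  Phones are ARPAbet-style strings; VOWELS is a
# simplified vowel set, and "AH" stands in for the epenthetic schwa.

VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def is_vowel(phone):
    return phone in VOWELS

def epenthesize(phones):
    """Return variants of a phone sequence, each with one schwa
    inserted inside a consonant cluster."""
    variants = []
    for i in range(1, len(phones)):
        if not is_vowel(phones[i - 1]) and not is_vowel(phones[i]):
            variants.append(phones[:i] + ["AH"] + phones[i:])
    return variants

# "split" -> S P L IH T has the clusters S-P and P-L
print(epenthesize(["S", "P", "L", "IH", "T"]))
# -> [['S', 'AH', 'P', 'L', 'IH', 'T'], ['S', 'P', 'AH', 'L', 'IH', 'T']]
```

Each variant would then be added to the recognition lexicon alongside the
standard pronunciation, so the reflexive shift described above still catches.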

Al
>
>Max.
Received on Tuesday, 8 March 2005 15:16:52 GMT