Speech pitch rate speedup

For the record, I mentioned audio speedup at the face-to-face and now
recommend:

1. Add "narrated or " before "synthesized speech" in checkpoints 4.9,
4.10, and 4.11.

2. Allow speedup of speech in 4.5. [Ian feels this is a change of scope,
too big a change for this version.]


Synthesized Speech

Consider extending the three checkpoints there to include narrated speech
when we are not trying to synchronize it with video.

4.9 Allow the user to configure [narrated or ]synthesized speech playback 
rate [Priority 1].

One difference to the user between a naturally recorded (narrated) audio
track and synthesized speech is that the former, as it includes more
articulation distinctions, may tolerate a larger change in playback rate.
On the other hand, synthesized speech has more liberty to alter where
it stretches, where it shrinks, and how strongly it uses pitch shift to
indicate punctuation.

4.10 Allow the user to configure [narrated or ]synthesized speech volume.

That certainly can be generalized to include natural speech.

4.11 Allow the user to configure [narrated or ]synthesized speech pitch, 
gender, and other articulation characteristics. [Priority 2]

This too works, with a caution that pitch and gender are related. [Doubling
the frequency of a male tenor makes a reasonable approximation of a
female soprano.] Also, halving the frequency reduces the amount of
potential speedup, as it takes longer to express the intelligible parts of
phonemes. Increasing pitch adversely affects those with high-frequency
hearing loss.
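
For reference, a frequency ratio converts to a musical interval as
12 * log2(ratio) semitones, so doubling is one octave up and halving is one
octave down. A small Python sketch (the function name is hypothetical, not
from any speech product):

```python
import math


def ratio_to_semitones(frequency_ratio: float) -> float:
    """Convert a pitch (frequency) ratio to a shift in semitones."""
    if frequency_ratio <= 0:
        raise ValueError("frequency ratio must be positive")
    # An octave is a factor of 2 and is divided into 12 semitones.
    return 12 * math.log2(frequency_ratio)
```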
--------
In the next revision, I suggest we expand checkpoint 4.5 (slowing down
audio streams [P1]) to also permit speeding them up, and to allow the user
to change the rate while listening. This may be problematic when
synchronizing with video or animations. [It may be harder to decide what
to chop/stretch for arbitrary rate changes.]

4.5 Allow the user to slow the presentation rate of audio, video, and 
animations. [Priority 1]

I consider that speedup needn't be Priority 1, as normal or slower speeds
are generally comprehensible. Speedup is just a time-saver while passing
linearly through what the eye would skim past.

I checked with George Kerscher. He believes that the five existing players
he mentioned that support digital talking books allow the playback rate to
be adjusted from 80% to 300% (which he said was 600 words per minute).
Below the lower rate, words recorded at a normal speech rate break up when
"stretched" and become unnatural.

As I understand it, rate change without pitch shift works by altering the
dead time between words, and the durations of the vowel sounds and some
consonant sounds within phonemes.

There are some minimal durations below which intelligibility is lost,
particularly for those with deteriorated high-frequency hearing [as I
am becoming more aware!].

The plosive consonants b, hard c, d, f, hard g, j, hard k, p, q, t cannot 
be altered much.

Some of the other consonants may have their durations altered: soft c,
soft g, h, l, m, n, r, s, v, w, x, y, z.
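
The duration-only approach described above might be sketched as follows.
This is only an illustration under stated assumptions: the segment kinds,
the minimum durations, and the function name are hypothetical, and a real
player would operate on the audio signal rather than on labelled segments.

```python
from dataclasses import dataclass

# Hypothetical minimum durations in milliseconds, below which a segment
# is assumed to lose intelligibility. The values are illustrative only.
MIN_MS = {"pause": 20.0, "vowel": 40.0, "soft_consonant": 30.0}


@dataclass
class Segment:
    kind: str          # "pause", "vowel", "soft_consonant", or "plosive"
    duration_ms: float


def change_rate(segments, rate):
    """Return new segments rescaled for playback at `rate` (1.0 = normal).

    Pauses, vowels, and soft consonants are stretched or shrunk.
    Plosives keep their recorded duration. Pitch is never touched.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    result = []
    for seg in segments:
        if seg.kind == "plosive":
            new_ms = seg.duration_ms  # cannot be altered much
        else:
            # Never shrink below the floor, but never lengthen a segment
            # that was already shorter than the floor when recorded.
            floor = min(MIN_MS.get(seg.kind, 0.0), seg.duration_ms)
            new_ms = max(floor, seg.duration_ms / rate)
        result.append(Segment(seg.kind, new_ms))
    return result
```

Because plosives and the minimum durations do not shrink, the overall
speedup achieved in this sketch is somewhat less than the requested rate.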


Regards/Harvey Bingham

Received on Friday, 14 April 2000 02:02:38 UTC