- From: Harvey Bingham <hbingham@acm.org>
- Date: Fri, 14 Apr 2000 01:54:40 -0400
- To: w3c-wai-ua@w3.org
For the record, I mentioned audio speedup at the face-to-face and now recommend: 1. Add "narrated or " before "synthesized speech" in checkpoints 4.9, 4.10, and 4.11. 2. Allow speedup of speech in 4.5. [Ian feels this is change of scope, too big a change for this version.] Synthesized Speech Consider extending the three checkpoints there to include narrated speech when we are not trying to synchronize it with video. 4.9 Allow the user to configure [narrated or ]synthesized speech playback rate [Priority 1]. One difference to the user between a naturally recorded (narrated) audio track and synthesized speech is that the former, as it includes more articulation distinctions, may support larger playback rate change. On the other hand, synthesized speech has more liberty to alter where it stretches, where it shrinks, and how extremely it uses pitch shift to indicate punctuation. 4.10 Allow the user to configure [narrated or ]synthesized speech volume. That certainly can be generalized to include natural speech. 4.11 Allow the user to configure [narrated or ]synthesized speech pitch, gender, and other articulation characteristics. [Priority 2] This too works, with a caution that pitch and gender are related. [Doubling the frequency of a male tenor makes a reasonable approximation of a female soprano.] Also, halving the frequency reduces the amount of potential speedup, as it takes longer to express the intelligible parts of phonemes. Increasing pitch adversely effects those with high frequency hearing loss. -------- In the next revision we expand the consideration of 4.5 slowing down audio streams [P1] to also permit speeding them up, and to allow the user to change the rate while listening. This may be problematic when synchronizing with video or animations. [It may be harder to decide what to chop/stretch for arbitrary rate changes.] 4.5 Allow the user to slow the presentation rate of audio, video, and animations. [Priority 1] I consider speedup needn't be Priority 1 as generally normal or slower speeds are comprehensible, the speedup just is a time-saver while passing linearly through what the eye would skim past. Checking with George Kerscher, he believes the five existing players he mentioned that are supporting digital talking books, allow the playback rate range to be adjusted from 80% to 300% (which he said was 600 words per minute). Below the lower rate, the words recorded at normal speech rate when "stretched" breakup and become unnatural. As I understand the rate change done without pitch shift method is to alter the dead time between words, and the vowel and some consonant sound durations of phonemes. There are some minimal durations below which intelligibility is lost, particularly for those with deteriorated high frequency hearing [as I am becoming more aware!] The plosive consonants b, hard c, d, f, hard g, j, hard k, p, q, t cannot be altered much. Some of the other consonants may be duration altered: soft c, soft g, h, l, m, n, r, s, v, w, x, y, z. Regards/Harvey Bingham
Received on Friday, 14 April 2000 02:02:38 UTC