Re: Speech pitch rate speedup

From: Jon Gunderson <jongund@ux1.cso.uiuc.edu>
Date: Fri, 14 Apr 2000 08:45:09 -0500
Message-Id: <4.1.20000414083636.00b30d40@staff.uiuc.edu>
To: Harvey Bingham <hbingham@ACM.org>
Cc: w3c-wai-ua@w3.org
I think the ideas are good, but I have a few concerns about adding these
requirements at this time:

1. The suggestions expand the scope of current checkpoints and so there are
new requirements on user agents.  We are already wrestling with a number of
potential changes to scope of the different checkpoints in the document.
If these changes turn out to be significant, we may need to take a step back
in the process.  To avoid taking a step back, we want to minimize the number
of changes in scope and keep the document moving forward.

2. I don't think the group has the time or the resources at this time to
carefully understand the impact of the proposed changes.  We already have
enough on our plate and we need to stop somewhere.


At 01:54 AM 4/14/00 -0400, you wrote:
>For the record, I mentioned audio speedup at the face-to-face and now
>1. Add "narrated or " before "synthesized speech" in checkpoints 4.9,
>4.10, and 4.11.
>2. Allow speedup of speech in 4.5. [Ian feels this is change of scope,
>too big a change for this version.]
>Synthesized Speech
>Consider extending the three checkpoints there to include narrated speech
>when we are not trying to synchronize it with video.
>4.9 Allow the user to configure [narrated or ]synthesized speech playback 
>rate [Priority 1].
>One difference to the user between a naturally recorded (narrated) audio
>track and synthesized speech is that the former, as it includes more 
>articulation distinctions, may support larger playback rate change. On the
>other hand, synthesized speech has more liberty to alter where
>it stretches, where it shrinks, and how extremely it uses pitch shift to
>indicate punctuation.
>4.10 Allow the user to configure [narrated or ]synthesized speech volume.
>That certainly can be generalized to include natural speech.
>4.11 Allow the user to configure [narrated or ]synthesized speech pitch, 
>gender, and other articulation characteristics. [Priority 2]
>This too works, with a caution that pitch and gender are related. [Doubling
>the frequency of a male tenor makes a reasonable approximation of a
>female soprano.] Also, halving the frequency reduces the amount of 
>potential speedup, as it takes longer to express the intelligible parts of 
>phonemes. Increasing pitch adversely affects those with high frequency 
>hearing loss.
>In the next revision we expand the consideration of 4.5 slowing down audio 
>streams [P1] to also permit speeding them up, and to allow the user to
>change the rate while listening. This may be problematic when synchronizing 
>with video or animations. [It may be harder to decide what to chop/stretch
>for arbitrary rate changes.]
>4.5 Allow the user to slow the presentation rate of audio, video, and 
>animations. [Priority 1]
>I consider that speedup needn't be Priority 1, as normal or slower speeds 
>are generally comprehensible; the speedup is just a time-saver while passing 
>linearly through what the eye would skim past.
>Checking with George Kerscher, he believes the five existing players he
>mentioned that support digital talking books allow the playback
>rate range to be adjusted from 80% to 300% (which he said was 600 words
>per minute). Below the lower rate, words recorded at a normal speech rate
>break up when "stretched" and become unnatural.
>As I understand it, the method for changing the rate without pitch shift is
>to alter the dead time between words, and the durations of the vowel and
>some consonant sounds of phonemes.
>There are some minimal durations below which intelligibility is lost, 
>particularly for those with deteriorated high frequency hearing [as I
>am becoming more aware!]
>The plosive consonants b, hard c, d, f, hard g, j, hard k, p, q, t cannot 
>be altered much.
>Some of the other consonants may be duration altered: soft c, soft g,
>h, l, m, n, r, s, v, w, x, y, z.
>Regards/Harvey Bingham
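
The playback-rate range quoted above can be illustrated with a little arithmetic. This is only a sketch, not code from any of the players mentioned: it assumes that 300% corresponding to 600 words per minute implies a baseline of about 200 words per minute at 100%, and the names BASE_WPM, clamp_rate, and words_per_minute are hypothetical.

```python
# Assumption: 300% was quoted as 600 words per minute, so 100% is taken as 200 wpm.
BASE_WPM = 600 / 3.0

# The adjustable range quoted in the message: 80% to 300%.
MIN_RATE, MAX_RATE = 0.8, 3.0


def clamp_rate(rate):
    """Clamp a requested playback rate multiplier into the quoted range."""
    return max(MIN_RATE, min(MAX_RATE, rate))


def words_per_minute(rate):
    """Approximate speaking rate in wpm at the given (clamped) playback rate."""
    return BASE_WPM * clamp_rate(rate)
```

Under that assumption, the 80% lower limit works out to roughly 160 words per minute, and a request outside the range is simply held at the nearest limit.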

Jon Gunderson, Ph.D., ATP
Coordinator of Assistive Communication and Information Technology
Chair, W3C WAI User Agent Working Group
Division of Rehabilitation - Education Services
College of Applied Life Studies
University of Illinois at Urbana/Champaign
1207 S. Oak Street, Champaign, IL  61820

Voice: (217) 244-5870
Fax: (217) 333-0248

E-mail: jongund@uiuc.edu

WWW: http://www.staff.uiuc.edu/~jongund
WWW: http://www.w3.org/wai/ua
Received on Friday, 14 April 2000 09:45:25 UTC
