- From: Jim Allan <allanj@tsbvi.edu>
- Date: Thu, 19 Nov 2009 11:05:19 -0600
- To: "'Markku Hakkinen'" <markku.hakkinen@gmail.com>, <kim@redstartsystems.com>
- Cc: "'UAWG list'" <w3c-wai-ua@w3.org>
http://scholar.google.com/scholar?hl=en&q=playback%20rate%20comprehension&sourceid=navclient-ff&rlz=1B3GGGL_enUS261US261&um=1&ie=UTF-8&sa=N&tab=ws

50%-200% seems reasonable. Though much below 75% my comprehension
suffers. Slowing also suffers from pitch change. Are there pitch
algorithms for slowing down?

> -----Original Message-----
> From: w3c-wai-ua-request@w3.org [mailto:w3c-wai-ua-request@w3.org] On Behalf Of Markku Hakkinen
> Sent: Thursday, November 19, 2009 10:03 AM
> To: kim@redstartsystems.com
> Cc: UAWG list
> Subject: Re: ACTION-238 Revise guidelines 4.9.6
>
> The technical downside is that at higher rates the quality will most
> likely be unintelligible to most listeners. Assuming a typical
> (English) reading rate of approximately 140 words per minute, 400% is
> 560 words per minute. Speech synthesizer users are reported to at
> times use reading rates as high as 600 words per minute, but the
> intelligibility depends much on the synthesizer technology, and at
> that point it is really speech skimming in any case. Sure, experienced
> users may push the technology, but is it right to set a high top end
> that creates an expectation that the rate will be intelligible or
> useful to more than just experts? And, from my experience, it is
> easier to push synthesizers to higher rates than prerecorded speech.
> I don't have (or recall) data that says we can push prerecorded speech
> to 400% with any decent quality.
>
> With the pitch maintenance approach, the key point is to allow the
> user to select intelligible speech playback rates, and for that
> reason, that's why I think we see the 33%-300% range, which is what
> most algorithms appear to handle in my experience. Outside of that
> range, we can expect breakdown in terms of intelligibility.
>
> The use cases for speech slowdown are individuals with cognitive or
> learning disabilities, older adults, and also new language learners.
> Degradation in speech intelligibility at low rates would probably
> outweigh the benefits of very slow speech for these groups. Can
> speech be slowed to 25% and still be intelligible? Depends on the
> encoding of the source audio and the quality of the TSM algorithm.
> Again, I just don't see the data which tells me that 25% (or even 33%)
> is an empirically based number. 50% may be adequate. I'm doing some
> further digging on this.
>
> Perhaps we need to have 50% as A and 33% as AA.
>
> One more point. As an international standard, I haven't looked at
> data that tells me how effective TSM algorithms are across languages.
> Do these presentation rates hold true across languages? I'm looking
> into this, also.
>
> If anyone has pointers to specific research findings, please share.
>
> br,
> mark
>
> On Wed, Nov 18, 2009 at 5:16 PM, Kim Patch <kim@redstartsystems.com> wrote:
> > A couple of things to maybe think about. Is there a technical
> > downside to wider versus narrower percentages, for instance 33-300%
> > versus 25-400%?
> >
> > Second, if there's no downside, should guidelines like these be wide
> > enough to make sure to cover any possible situations, maybe even
> > with a discernible margin because sometimes when people have a tool
> > that goes further they use it? It's not uncommon for people who
> > create technology to be surprised at how far experienced users can
> > push it when the technology allows them to. The speed of some folks
> > who use a single switch to control the computer using scanning
> > software comes to mind.
> >
> > Cheers,
> > Kim
> >
> > Markku Hakkinen wrote:
> >
> > Hi Jan,
> >
> > Yes, they do need to be normalized (and then resolve precision,
> > 33.3%, if going with percentages).
> >
> > I put this out as a first pass at redrafting 4.9.6 to address both
> > media slow down and speed up. Discussion needed.
> >
> > What I am lacking is empirical data to back the upper and lower
> > limits for both speech and visual, and I am hesitant to cast these
> > numbers in stone (even when drawn from other standards, e.g.,
> > talking books) without being able to point to specific data saying
> > why these numbers are optimal (de facto recommendations?).
> >
> > There is empirical data suggesting that speed up and slow down have
> > benefit, but what I don't have is data saying what rates are ideal.
> >
> > I'll add that the software algorithms for time scale modification
> > used by both Windows Media Player and Quicktime player currently
> > support the desired range (though the quality of the original
> > encoding may affect perceived quality at the high and low ends of
> > the range). The question remains, what should that range really be?
> >
> > br,
> > mark
> >
> > On Wed, Nov 18, 2009 at 2:22 PM, Jan Richards <jan.richards@utoronto.ca> wrote:
> >
> > Hi Mark,
> >
> > I think we could be more consistent in the way these are stated. In
> > one case we say "1/3 to 3 times" implying 33%-300% and in another we
> > say at least one setting between 40% and 60% which implies a lowest
> > setting of 60% would be ok.
> >
> > Cheers,
> > Jan
> >
> > Markku Hakkinen wrote:
> >
> > 4.9.6 Playback Rate Adjustment for Multimedia Content.
> >
> > The user can adjust playback rate of prerecorded content containing
> > speech audio tracks such that all of the following are true
> > (Level A):
> >
> > - The playback rate should be user adjustable between 1/3 and 3
> >   times real time of the recorded content.
> >
> > - Recorded speech, whose playback rate has been adjusted by the
> >   user, should utilize pitch maintenance in order to avoid
> >   degradation of the speech quality.
> >
> > If only a visual track is present, provide at least one setting
> > between 40% and 60% of the original speed. (Level A)
> >
> > When audio and video tracks are expected to be synchronized,
> > synchronization is maintained as long as they are played at 75% of
> > the original speed or higher. (Level A)
> >
> > The UA should provide a function that resets the playback rate to
> > normal (1x). (Level A)
> >
> > --
> > Jan Richards, M.Sc.
> > User Interface Design Lead
> > Adaptive Technology Resource Centre (ATRC)
> > Faculty of Information
> > University of Toronto
> >
> > Email: jan.richards@utoronto.ca
> > Web: http://jan.atrc.utoronto.ca
> > Phone: 416-946-7060
> > Fax: 416-971-2896
> >
> > --
> > ___________________________________________________
> >
> > Kimberly Patch
> > President
> > Redstart Systems, Inc., makers of Utter Command
> > (617) 325-3966
> > kim@redstartsystems.com
> >
> > www.redstartsystems.com
> > - making speech fly
> >
> > Patch on Speech blog
> > Redstart Systems on Twitter
> > ___________________________________________________
Received on Thursday, 19 November 2009 17:06:05 UTC