- From: Markku Hakkinen <markku.hakkinen@gmail.com>
- Date: Thu, 19 Nov 2009 12:51:09 -0500
- To: allanj@tsbvi.edu
- Cc: kim@redstartsystems.com, UAWG list <w3c-wai-ua@w3.org>
The better TSM algorithms handle slowing quite well, but it depends
again upon the quality of the source input. Most TSM algorithms I have
seen are designed to handle the full range of slow to fast. My guess is
that for Web-delivered audio, the level of compression will play a
negative role in the quality of slow speech.

mark

On Thu, Nov 19, 2009 at 12:05 PM, Jim Allan <allanj@tsbvi.edu> wrote:
> http://scholar.google.com/scholar?hl=en&q=playback%20rate%20comprehension&sourceid=navclient-ff&rlz=1B3GGGL_enUS261US261&um=1&ie=UTF-8&sa=N&tab=ws
>
> 50%-200% seems reasonable. Tho much below 75% my comprehension suffers.
> Slowing also suffers from pitch change. Are there pitch algorithms for
> slowing down?
>
>
>> -----Original Message-----
>> From: w3c-wai-ua-request@w3.org [mailto:w3c-wai-ua-request@w3.org] On
>> Behalf Of Markku Hakkinen
>> Sent: Thursday, November 19, 2009 10:03 AM
>> To: kim@redstartsystems.com
>> Cc: UAWG list
>> Subject: Re: ACTION-238 Revise guidelines 4.9.6
>>
>> The technical downside is that at higher rates the quality will most
>> likely be unintelligible to most listeners. Assuming a typical
>> (English) reading rate of approximately 140 words per minute, 400% is
>> 560 words per minute. Speech synthesizer users are reported to at
>> times use reading rates as high as 600 words per minute, but the
>> intelligibility depends much on the synthesizer technology, and at
>> that point it is really speech skimming in any case. Sure, experienced
>> users may push the technology, but is it right to set a high top end
>> that creates an expectation that the rate will be intelligible or
>> useful to more than just experts? And, from my experience, it is
>> easier to push synthesizers to higher rates than prerecorded speech.
>> I don't have (or recall) data that says we can push prerecorded speech
>> to 400% with any decent quality.
>>
>> With the pitch maintenance approach, the key point is to allow the
>> user to select intelligible speech playback rates, and for that
>> reason, that's why I think we see the 33%-300% range, which is what
>> most algorithms appear to handle in my experience. Outside of that
>> range, we can expect breakdown in terms of intelligibility.
>>
>> The use cases for speech slowdown are individuals with cognitive or
>> learning disabilities, older adults, and also new language learners.
>> Degradation in speech intelligibility at low rates would probably
>> outweigh the benefits of very slow speech for these groups. Can
>> speech be slowed to 25% and still be intelligible? Depends on the
>> encoding of the source audio and the quality of the TSM algorithm.
>> Again, I just don't see the data which tells me that 25% (or even 33%)
>> is an empirically based number. 50% may be adequate. I'm doing some
>> further digging on this.
>>
>> Perhaps we need to have 50% as A and 33% as AA.
>>
>> One more point. As an international standard, I haven't looked at
>> data that tells me how effective TSM algorithms are across languages.
>> Do these presentation rates hold true across languages? I'm looking
>> into this, also.
>>
>> If anyone has pointers to specific research findings, please share.
>>
>> br,
>> mark
>>
>> On Wed, Nov 18, 2009 at 5:16 PM, Kim Patch <kim@redstartsystems.com>
>> wrote:
>> > A couple of things to maybe think about. Is there a technical downside to
>> > wider versus narrower percentages, for instance 33-300% versus 25-400%?
>> >
>> > Second, if there's no downside, should guidelines like these be wide enough
>> > to make sure to cover any possible situations, maybe even with a discernible
>> > margin because sometimes when people have a tool that goes further they use
>> > it? It's not uncommon for people who create technology to be surprised at
>> > how far experienced users can push it when the technology allows them to.
>> > The speed of some folks who use a single switch to control the computer
>> > using scanning software comes to mind.
>> >
>> > Cheers,
>> > Kim
>> >
>> > Markku Hakkinen wrote:
>> >
>> > Hi Jan,
>> >
>> > Yes, they do need to be normalized (and then resolve precision, 33.3%,
>> > if going with percentages).
>> >
>> > I put this out as a first pass at redrafting 4.9.6 to address both
>> > media slow down and speed up. Discussion needed.
>> >
>> > What I am lacking is empirical data to back the upper and lower limits
>> > for both speech and visual, and I am hesitant to cast these numbers
>> > in stone (even when drawn from other standards, e.g., talking books)
>> > without being able to point to specific data saying why these numbers
>> > are optimal (de facto recommendations?).
>> >
>> > There is empirical data suggesting that speed up and slow down have
>> > benefit, but what I don't have is data saying what rates are ideal.
>> >
>> > I'll add that the software algorithms for time scale modification used
>> > by both Windows Media Player and Quicktime player currently support
>> > the desired range (though the quality of the original encoding may
>> > affect perceived quality at the high and low ends of the range). The
>> > question remains, what should that range really be?
>> >
>> > br,
>> > mark
>> >
>> > On Wed, Nov 18, 2009 at 2:22 PM, Jan Richards <jan.richards@utoronto.ca>
>> > wrote:
>> >
>> >
>> > Hi Mark,
>> >
>> > I think we could be more consistent in the way these are stated. In one case
>> > we say "1/3 to 3 times" implying 33%-300% and in another we say at least one
>> > setting between 40% and 60% which implies a lowest setting of 60% would be
>> > ok.
>> >
>> > Cheers,
>> > Jan
>> >
>> >
>> > Markku Hakkinen wrote:
>> >
>> >
>> > 4.9.6 Playback Rate Adjustment for Multimedia Content.
>> >
>> > The user can adjust playback rate of prerecorded content containing
>> > speech audio tracks such that all of the following are true (Level A):
>> >
>> > - The playback rate should be user adjustable between 1/3 and 3 times
>> > real time of the recorded content.
>> >
>> > - Recorded speech, whose playback rate has been adjusted by the user,
>> > should utilize pitch maintenance in order to avoid degradation of the
>> > speech quality.
>> >
>> > If only a visual track is present, provide at least one setting
>> > between 40% and 60% of the original speed. (Level A)
>> >
>> > When audio and video tracks are expected to be synchronized,
>> > synchronization is maintained as long as they are played at 75% of the
>> > original speed or higher. (Level A)
>> >
>> > The UA should provide a function that resets the playback rate to
>> > normal (1x). (Level A)
>> >
>> >
>> >
>> > --
>> > Jan Richards, M.Sc.
>> > User Interface Design Lead
>> > Adaptive Technology Resource Centre (ATRC)
>> > Faculty of Information
>> > University of Toronto
>> >
>> > Email: jan.richards@utoronto.ca
>> > Web: http://jan.atrc.utoronto.ca
>> > Phone: 416-946-7060
>> > Fax: 416-971-2896
>> >
>> > ________________________________
>> >
>> > No virus found in this incoming message.
>> > Checked by AVG - www.avg.com
>> > Version: 8.5.425 / Virus Database: 270.14.72/2511 - Release Date: 11/18/09
>> > 07:50:00
>> >
>> > --
>> > ___________________________________________________
>> >
>> > Kimberly Patch
>> > President
>> > Redstart Systems, Inc., makers of Utter Command
>> > (617) 325-3966
>> > kim@redstartsystems.com
>> >
>> > www.redstartsystems.com
>> > - making speech fly
>> >
>> > Patch on Speech blog
>> > Redstart Systems on Twitter
>> > ___________________________________________________
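
For reference, the words-per-minute arithmetic used in the thread (140 wpm at 400% is 560 wpm) can be worked out for each candidate range. This is only a rough sketch: the 140 wpm baseline and the three ranges come from the messages above, while the names `BASE_WPM`, `RANGES`, `effective_wpm`, and `clamp_rate` are hypothetical and chosen for illustration. It says nothing about intelligibility at those rates, which is the open question in the discussion.

```python
from fractions import Fraction

# Typical English reading rate cited in the thread (an approximation).
BASE_WPM = 140

# Candidate ranges discussed in the thread, as multiples of real time.
RANGES = {
    "33%-300%": (Fraction(1, 3), Fraction(3)),
    "25%-400%": (Fraction(1, 4), Fraction(4)),
    "50%-200%": (Fraction(1, 2), Fraction(2)),
}


def effective_wpm(rate, base_wpm=BASE_WPM):
    """Words per minute heard when base_wpm speech plays at `rate` x real time."""
    return float(rate * base_wpm)


def clamp_rate(requested, low, high):
    """Limit a requested playback rate to the supported range [low, high]."""
    return max(low, min(high, requested))


for name, (low, high) in RANGES.items():
    print(f"{name}: {effective_wpm(low):.1f} to {effective_wpm(high):.1f} wpm")
```

Under these assumptions, a player restricted to the drafted 1/3 to 3 times range would turn a request for 5x into `clamp_rate(Fraction(5), Fraction(1, 3), Fraction(3))`, that is, 3x.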
Received on Thursday, 19 November 2009 17:52:03 UTC