Re: ACTION-238 Revise guidelines 4.9.6

The better TSM algorithms handle slowing quite well, but it depends
again upon the quality of the source input. Most TSM algorithms I have
seen are designed to handle the full range of slow to fast. My guess
is that for Web-delivered audio, the level of compression will play a
negative role in the quality of slow speech.
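
To put the percentages discussed in the quoted thread below into words
per minute, here is a rough Python sketch. It assumes the 140 words per
minute English baseline mentioned in the thread; the function names are
hypothetical and purely illustrative.

```python
# Assumption: 140 wpm is the "typical (English) reading rate" cited in
# the quoted message; real recordings will vary.
BASELINE_WPM = 140


def effective_wpm(rate_percent, baseline_wpm=BASELINE_WPM):
    """Words per minute heard when playback runs at rate_percent of real time."""
    return baseline_wpm * rate_percent / 100


def playback_minutes(duration_minutes, rate_percent):
    """Wall-clock minutes needed to play a recording at rate_percent."""
    return duration_minutes * 100 / rate_percent


if __name__ == "__main__":
    # Candidate limits raised in the thread: 25%, 33%, 50% ... 300%, 400%.
    for pct in (25, 33, 50, 75, 100, 200, 300, 400):
        print(f"{pct:>3}% -> {effective_wpm(pct):6.1f} wpm")
```

At 400% this gives 560 wpm, matching the figure in the thread, and at
50% it gives 70 wpm.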

mark

On Thu, Nov 19, 2009 at 12:05 PM, Jim Allan <allanj@tsbvi.edu> wrote:
> http://scholar.google.com/scholar?hl=en&q=playback%20rate%20comprehension&sourceid=navclient-ff&rlz=1B3GGGL_enUS261US261&um=1&ie=UTF-8&sa=N&tab=ws
>
> 50%-200% seems reasonable, though much below 75% my comprehension suffers.
> Slowing also suffers from pitch change. Are there pitch-maintenance
> algorithms for slowing down?
>
>
>> -----Original Message-----
>> From: w3c-wai-ua-request@w3.org [mailto:w3c-wai-ua-request@w3.org] On
>> Behalf Of Markku Hakkinen
>> Sent: Thursday, November 19, 2009 10:03 AM
>> To: kim@redstartsystems.com
>> Cc: UAWG list
>> Subject: Re: ACTION-238 Revise guidelines 4.9.6
>>
>> The technical downside is that at higher rates the speech will most
>> likely be unintelligible to most listeners.  Assuming a typical
>> (English) reading rate of approximately 140 words per minute, 400% is
>> 560 words per minute.  Speech synthesizer users are reported to at
>> times use reading rates as high as 600 words per minute, but the
>> intelligibility depends much on the synthesizer technology, and at
>> that point it is really speech skimming in any case. Sure, experienced
>> users may push the technology, but is it right to set a high top end
>> that creates an expectation that the rate will be intelligible or
>> useful to more than just experts? And, from my experience, it is
>> easier to push synthesizers to higher rates than prerecorded speech.
>> I don't have (or recall) data that says we can push prerecorded speech
>> to 400% with any decent quality.
>>
>> With the pitch maintenance approach, the key point is to allow the
>> user to select intelligible speech playback rates, and that is why I
>> think we see the 33%-300% range, which is what most algorithms appear
>> to handle in my experience. Outside of that range, we can expect
>> breakdown in terms of intelligibility.
>>
>> The use cases for speech slowdown are individuals with cognitive or
>> learning disabilities, older adults, and also new language learners.
>> Degradation in speech intelligibility at low rates would probably
>> outweigh the benefits of very slow speech for these groups.  Can
>> speech be slowed to 25% and still be intelligible?  Depends on the
>> encoding of the source audio and the quality of the TSM algorithm.
>> Again, I just don't see the data which tells me that 25% (or even 33%)
>> is an empirically based number.  50% may be adequate. I'm doing some
>> further digging on this.
>>
>> Perhaps we need to have 50% as A and 33% as AA.
>>
>> One more point.  Since this is an international standard: I haven't
>> looked at data that tells me how effective TSM algorithms are across
>> languages.  Do these presentation rates hold true across languages?
>> I'm looking into this, also.
>>
>> If anyone has pointers to specific research findings, please share.
>>
>> br,
>> mark
>>
>> On Wed, Nov 18, 2009 at 5:16 PM, Kim Patch <kim@redstartsystems.com>
>> wrote:
>> > A couple of things to maybe think about. Is there a technical downside
>> > to wider versus narrower percentages, for instance 33-300% versus
>> > 25-400%?
>> >
>> > Second, if there's no downside, should guidelines like these be wide
>> > enough to make sure to cover any possible situations, maybe even with a
>> > discernible margin because sometimes when people have a tool that goes
>> > further they use it? It's not uncommon for people who create technology
>> > to be surprised at how far experienced users can push it when the
>> > technology allows them to. The speed of some folks who use a single
>> > switch to control the computer using scanning software comes to mind.
>> >
>> > Cheers,
>> > Kim
>> >
>> > Markku Hakkinen wrote:
>> >
>> > Hi Jan,
>> >
>> > Yes, they do need to be normalized (and then resolve precision, 33.3%,
>> > if going with percentages).
>> >
>> > I put this out as a first pass at redrafting 4.9.6 to address both
>> > media slow down and speed up.  Discussion needed.
>> >
>> > What I am lacking is empirical data to back the upper and lower limits
>> > for both speech and visual, and I am hesitant to cast these numbers
>> > in stone (even when drawn from other standards, e.g., talking books)
>> > without being able to point to specific data saying why these numbers
>> > are optimal (de facto recommendations?).
>> >
>> > There is empirical data suggesting that speed up and slow down have
>> > benefit, but what I don't have is data saying what rates are ideal.
>> >
>> > I'll add that the software algorithms for time scale modification used
>> > by both Windows Media Player and QuickTime Player currently support
>> > the desired range (though the quality of the original encoding may
>> > affect perceived quality at the high and low ends of the range).  The
>> > question remains, what should that range really be?
>> >
>> > br,
>> > mark
>> >
>> > On Wed, Nov 18, 2009 at 2:22 PM, Jan Richards <jan.richards@utoronto.ca>
>> > wrote:
>> >
>> >
>> > Hi Mark,
>> >
>> > I think we could be more consistent in the way these are stated. In one
>> > case we say "1/3 to 3 times" implying 33%-300% and in another we say at
>> > least one setting between 40% and 60% which implies a lowest setting of
>> > 60% would be ok.
>> >
>> > Cheers,
>> > Jan
>> >
>> >
>> > Markku Hakkinen wrote:
>> >
>> >
>> > 4.9.6 Playback Rate Adjustment for Multimedia Content.
>> >
>> > The user can adjust playback rate of prerecorded content containing
>> > speech audio tracks such that all of the following are true (Level A):
>> >
>> > -  The playback rate should be user adjustable between 1/3 and 3 times
>> > real time of the recorded content.
>> >
>> > -  Recorded speech, whose playback rate has been adjusted by the user,
>> > should utilize pitch maintenance in order to avoid degradation of the
>> > speech quality.
>> >
>> > If only a visual track is present, provide at least one setting
>> > between 40% and 60% of the original speed. (Level A)
>> >
>> > When audio and video tracks are expected to be synchronized,
>> > synchronization is maintained as long as they are played at 75% of the
>> > original speed or higher. (Level A)
>> >
>> > The UA should provide a function that resets the playback rate to
>> > normal (1x). (Level A)
>> >
>> >
>> >
>> > --
>> > Jan Richards, M.Sc.
>> > User Interface Design Lead
>> > Adaptive Technology Resource Centre (ATRC)
>> > Faculty of Information
>> > University of Toronto
>> >
>> >  Email: jan.richards@utoronto.ca
>> >  Web:   http://jan.atrc.utoronto.ca
>> >  Phone: 416-946-7060
>> >  Fax:   416-971-2896
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > ___________________________________________________
>> >
>> > Kimberly Patch
>> > President
>> > Redstart Systems, Inc., makers of Utter Command
>> > (617) 325-3966
>> > kim@redstartsystems.com
>> >
>> > www.redstartsystems.com
>> > - making speech fly
>> >
>> > Patch on Speech blog
>> > Redstart Systems on Twitter
>> > ___________________________________________________
>
>
>
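
As a rough illustration of the draft 4.9.6 wording quoted above, the
rate limits could be sketched as follows. This is only a sketch under
stated assumptions: the class and method names are hypothetical and do
not correspond to any real player's API, and the limits are the draft
values from the thread (1/3 to 3 times real time, synchronization kept
at 75% or higher, reset to 1x). Using `Fraction` sidesteps the 33.3%
precision question raised in the thread by keeping 1/3 exact.

```python
from fractions import Fraction

# Draft values from the quoted 4.9.6 text; still under discussion.
MIN_RATE = Fraction(1, 3)    # slowest user-selectable rate
MAX_RATE = Fraction(3)       # fastest user-selectable rate
SYNC_FLOOR = Fraction(3, 4)  # audio/video sync kept at 75% or higher
NORMAL_RATE = Fraction(1)    # "normal (1x)"


class PlaybackRateControl:
    """Hypothetical rate control; not a real user agent interface."""

    def __init__(self):
        self.rate = NORMAL_RATE

    def set_rate(self, requested):
        """Clamp the requested rate into the draft range and store it."""
        requested = Fraction(requested)
        self.rate = min(MAX_RATE, max(MIN_RATE, requested))
        return self.rate

    def reset(self):
        """Return to normal (1x) playback."""
        self.rate = NORMAL_RATE
        return self.rate

    def sync_required(self):
        """True when the draft expects audio and video to stay in sync."""
        return self.rate >= SYNC_FLOOR
```

For example, `PlaybackRateControl().set_rate(4)` would be clamped to 3,
and a request for `"1/4"` would be clamped to 1/3.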

Received on Thursday, 19 November 2009 17:52:03 UTC