RE: ACTION-238 Revise guidelines 4.9.6 from Jim Allan on 2009-11-19 (w3c-wai-ua@w3.org from October to December 2009)

From: Jim Allan <allanj@tsbvi.edu>
Date: Thu, 19 Nov 2009 11:05:19 -0600
To: "'Markku Hakkinen'" <markku.hakkinen@gmail.com>, <kim@redstartsystems.com>
Cc: "'UAWG list'" <w3c-wai-ua@w3.org>
Message-ID: <008c01ca693a$7a65ec00$6f31c400$@edu>
http://scholar.google.com/scholar?hl=en&q=playback%20rate%20comprehension&sourceid=navclient-ff&rlz=1B3GGGL_enUS261US261&um=1&ie=UTF-8&sa=N&tab=ws

50%-200% seems reasonable, though much below 75% my comprehension suffers.
Slowed playback also suffers from pitch change. Are there pitch-maintenance
algorithms for slowing down?
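
To make the question concrete: one family of time-scale modification (TSM)
methods is overlap-add (OLA), which copies short windowed frames unchanged and
only changes how far apart they are placed, so the local pitch is kept. Below
is a minimal pure-Python sketch under that assumption. The name `ola_stretch`
and the frame size are illustrative choices, not taken from any player, and
plain OLA is known to produce audible artifacts on voiced speech, which
refinements such as WSOLA or a phase vocoder try to reduce.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def ola_stretch(samples, rate, frame=512):
    """Time-stretch by plain overlap-add (hypothetical helper, illustration only).

    rate < 1 slows playback down, rate > 1 speeds it up. Frames are copied
    unchanged, so local pitch is kept; only the frame spacing changes.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    synth_hop = frame // 2               # output frames overlap by 50%
    analysis_hop = synth_hop * rate      # step size through the input
    window = hann(frame)
    n_frames = max(1, int((len(samples) - frame) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len               # summed window weights per output sample
    for k in range(n_frames):
        src = int(round(k * analysis_hop))
        dst = k * synth_hop
        for i in range(frame):
            if src + i >= len(samples):
                break
            out[dst + i] += samples[src + i] * window[i]
            norm[dst + i] += window[i]
    # Divide by the summed weights so overlapping frames do not change loudness.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

# One second of a 440 Hz tone at an assumed 8000 Hz sample rate.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
half_speed = ola_stretch(tone, 0.5)
print(len(tone), len(half_speed))
```

At a rate of 0.5 the output is roughly twice as long as the input while each
frame still contains the original 440 Hz waveform.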


> -----Original Message-----
> From: w3c-wai-ua-request@w3.org [mailto:w3c-wai-ua-request@w3.org] On
> Behalf Of Markku Hakkinen
> Sent: Thursday, November 19, 2009 10:03 AM
> To: kim@redstartsystems.com
> Cc: UAWG list
> Subject: Re: ACTION-238 Revise guidelines 4.9.6
> 
> The technical downside is that at higher rates the quality will most
> likely be unintelligible to most listeners.  Assuming a typical
> (English) reading rate of approximately 140 words per minute, 400% is
> 560 words per minute.  Speech synthesizer users are reported to at
> times use reading rates as high as 600 words per minute, but the
> intelligibility depends much on the synthesizer technology, and at
> that point it is really speech skimming in any case. Sure, experienced
> users may push the technology, but is it right to set a high top end
> that creates an expectation that the rate will be intelligible or
> useful to more than just experts? And, from my experience, it is
> easier to push synthesizers to higher rates than prerecorded speech.
> I don't have (or recall) data that says we can push prerecorded speech
> to 400% with any decent quality.
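
The arithmetic in the paragraph above can be checked with a short sketch. The
140 words-per-minute baseline is the figure assumed in the message; the helper
name `effective_wpm` is made up for illustration and is not part of any player
API.

```python
# Illustrative only: effective_wpm is a hypothetical helper.
BASELINE_WPM = 140  # typical English reading rate assumed in the message above

def effective_wpm(rate_percent, baseline_wpm=BASELINE_WPM):
    """Return the words per minute heard when playback runs at rate_percent of real time."""
    return baseline_wpm * rate_percent / 100

# Rates discussed in the thread, from the proposed low end to the high end.
for rate in (25, 33, 50, 100, 200, 300, 400):
    print(f"{rate:>3}% -> {effective_wpm(rate):.0f} wpm")
```

At 400% this gives the 560 words per minute quoted above, close to the 600
words per minute reported for expert synthesizer users.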
> 
> With the pitch maintenance approach, the key point is to allow the
> user to select intelligible speech playback rates, and that is why I
> think we see the 33%-300% range, which is what most algorithms appear
> to handle in my experience. Outside of that range, we can expect
> breakdown in terms of intelligibility.
> 
> The use cases for speech slowdown are individuals with cognitive or
> learning disabilities, older adults, and also new language learners.
> Degradation in speech intelligibility at low rates would probably
> outweigh the benefits of very slow speech for these groups.  Can
> speech be slowed to 25% and still be intelligible?  Depends on the
> encoding of the source audio and the quality of the TSM algorithm.
> Again, I just don't see the data which tells me that 25% (or even 33%)
> is an empirically based number.  50% may be adequate. I'm doing some
> further digging on this.
> 
> Perhaps we need to have 50% as A and 33% as AA.
> 
> One more point.  As an international standard, I haven't looked at
> data that tells me how effective TSM algorithms are across languages.
> Do these presentation rates hold true across languages?  I'm looking
> into this, also.
> 
> If anyone has pointers to specific research findings, please share.
> 
> br,
> mark
> 
> On Wed, Nov 18, 2009 at 5:16 PM, Kim Patch <kim@redstartsystems.com> wrote:
> > A couple of things to maybe think about. Is there a technical downside
> > to wider versus narrower percentages, for instance 33-300% versus
> > 25-400%?
> >
> > Second, if there's no downside, should guidelines like these be wide
> > enough to make sure to cover any possible situations, maybe even with a
> > discernible margin because sometimes when people have a tool that goes
> > further they use it? It's not uncommon for people who create technology
> > to be surprised at how far experienced users can push it when the
> > technology allows them to. The speed of some folks who use a single
> > switch to control the computer using scanning software comes to mind.
> >
> > Cheers,
> > Kim
> >
> > Markku Hakkinen wrote:
> >
> > Hi Jan,
> >
> > Yes, they do need to be normalized (and then resolve precision, 33.3%,
> > if going with percentages).
> >
> > I put this out as a first pass at redrafting 4.9.6 to address both
> > media slow down and speed up.  Discussion needed.
> >
> > What I am lacking is empirical data to back the upper and lower limits
> > for both speech and visual, and I am hesitant to cast these numbers
> > in stone (even when drawn from other standards, e.g., talking books)
> > without being able to point to specific data saying why these numbers
> > are optimal (de facto recommendations?).
> >
> > There is empirical data suggesting that speed up and slow down have
> > benefit, but what I don't have is data saying what rates are ideal.
> >
> > I'll add that the software algorithms for time scale modification used
> > by both Windows Media Player and Quicktime player currently support
> > the desired range (though the quality of the original encoding may
> > affect perceived quality at the high and low ends of the range).  The
> > question remains, what should that range really be?
> >
> > br,
> > mark
> >
> > On Wed, Nov 18, 2009 at 2:22 PM, Jan Richards <jan.richards@utoronto.ca>
> > wrote:
> >
> >
> > Hi Mark,
> >
> > I think we could be more consistent in the way these are stated. In one
> > case we say "1/3 to 3 times" implying 33%-300% and in another we say at
> > least one setting between 40% and 60% which implies a lowest setting of
> > 60% would be ok.
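
The inconsistency described above can be illustrated with a small sketch: it
converts the "1/3 to 3 times" wording into percentages (showing the 33.3%
precision question) and checks the "at least one setting between 40% and 60%"
wording against a list of offered settings. The function names and the example
settings list are hypothetical.

```python
from fractions import Fraction

def as_percent(multiplier, places=1):
    """Express a playback-rate multiplier (e.g. Fraction(1, 3)) as a rounded percentage."""
    return round(float(multiplier * 100), places)

def has_setting_in_band(settings_percent, low=40, high=60):
    """True if at least one offered setting lies within [low, high] percent."""
    return any(low <= s <= high for s in settings_percent)

# "1/3 to 3 times" expressed as percentages; 1/3 cannot be written exactly.
print(as_percent(Fraction(1, 3)), as_percent(Fraction(3)))

# A hypothetical player whose slowest setting is 60% still satisfies the
# "between 40% and 60%" wording, although it never gets near 1/3 speed.
print(has_setting_in_band([60, 100, 150]))
```

Stating every limit in one unit, with an agreed rounding, would avoid the two
readings.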
> >
> > Cheers,
> > Jan
> >
> >
> > Markku Hakkinen wrote:
> >
> >
> > 4.9.6 Playback Rate Adjustment for Multimedia Content.
> >
> > The user can adjust playback rate of prerecorded content containing
> > speech audio tracks such that all of the following are true (Level A):
> >
> > -  The playback rate should be user adjustable between 1/3 and 3 times
> > real time of the recorded content.
> >
> > -  Recorded speech, whose playback rate has been adjusted by the user,
> > should utilize pitch maintenance in order to avoid degradation of the
> > speech quality.
> >
> > If only a visual track is present, provide at least one setting
> > between 40% and 60% of the original speed. (Level A)
> >
> > When audio and video tracks are expected to be synchronized,
> > synchronization is maintained as long as they are played at 75% of the
> > original speed or higher. (Level A)
> >
> > The UA should provide a function that resets the playback rate to
> > normal (1x). (Level A)
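
As a rough illustration of how a user agent might honor the draft wording
above, here is a minimal sketch of a rate control that clamps requests to the
1/3 to 3 times range, reports whether the 75% synchronization floor applies,
and resets to 1x. The class and method names are hypothetical and the limits
are simply the draft's numbers, which this thread is still debating.

```python
MIN_RATE = 1 / 3     # lower bound from the draft wording
MAX_RATE = 3.0       # upper bound from the draft wording
NORMAL_RATE = 1.0    # "normal (1x)"
SYNC_FLOOR = 0.75    # audio/video sync is required at or above this rate

class PlaybackRateControl:
    """Hypothetical rate control following the draft 4.9.6 text (illustration only)."""

    def __init__(self):
        self.rate = NORMAL_RATE

    def set_rate(self, requested):
        """Clamp the requested multiplier into the supported range and store it."""
        self.rate = min(MAX_RATE, max(MIN_RATE, requested))
        return self.rate

    def reset(self):
        """Return to normal (1x) playback."""
        self.rate = NORMAL_RATE
        return self.rate

    def must_stay_synchronized(self):
        """True when the draft requires audio and video to remain in sync."""
        return self.rate >= SYNC_FLOOR

control = PlaybackRateControl()
print(control.set_rate(4.0))             # request above the range is clamped
print(control.set_rate(0.5), control.must_stay_synchronized())
print(control.reset())
```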
> >
> >
> >
> > --
> > Jan Richards, M.Sc.
> > User Interface Design Lead
> > Adaptive Technology Resource Centre (ATRC)
> > Faculty of Information
> > University of Toronto
> >
> >  Email: jan.richards@utoronto.ca
> >  Web:   http://jan.atrc.utoronto.ca
> >  Phone: 416-946-7060
> >  Fax:   416-971-2896
> >
> >
> > --
> > ___________________________________________________
> >
> > Kimberly Patch
> > President
> > Redstart Systems, Inc., makers of Utter Command
> > (617) 325-3966
> > kim@redstartsystems.com
> >
> > www.redstartsystems.com
> > - making speech fly
> >
> > Patch on Speech blog
> > Redstart Systems on Twitter
> > ___________________________________________________
Received on Thursday, 19 November 2009 17:06:05 UTC