Re: ACTION-238 Revise guidelines 4.9.6 from Markku Hakkinen on 2009-11-19 (w3c-wai-ua@w3.org from October to December 2009)

From: Markku Hakkinen <markku.hakkinen@gmail.com>
Date: Thu, 19 Nov 2009 11:02:35 -0500
To: kim@redstartsystems.com
Cc: UAWG list <w3c-wai-ua@w3.org>
Message-ID: <3dcaabaf0911190802t65037927la41eff3a1241010d@mail.gmail.com>
The technical downside is that at higher rates the speech will most
likely be unintelligible to most listeners.  Assuming a typical
(English) reading rate of approximately 140 words per minute, 400% is
560 words per minute.  Speech synthesizer users are reported to use
reading rates as high as 600 words per minute at times, but
intelligibility depends heavily on the synthesizer technology, and at
that point it is really speech skimming in any case. Sure, experienced
users may push the technology, but is it right to set a high top end
that creates an expectation that the rate will be intelligible or
useful to more than just experts? And, from my experience, it is
easier to push synthesizers to higher rates than prerecorded speech.
I don't have (or recall) data that says we can push prerecorded speech
to 400% with any decent quality.
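
To make the arithmetic above concrete, here is a minimal Python sketch,
assuming the 140 words-per-minute baseline quoted in this message. The
function name effective_wpm and the list of sampled rates are
illustrative assumptions, not part of any guideline text.

```python
# Approximate English reading rate cited in the message (an assumption,
# not a measured value for any particular recording).
BASELINE_WPM = 140

def effective_wpm(rate_percent, baseline_wpm=BASELINE_WPM):
    """Return the words-per-minute rate heard at a given playback percentage."""
    return baseline_wpm * rate_percent / 100

# Sample the range endpoints discussed in the thread.
for rate in (25, 33, 50, 100, 300, 400):
    print(f"{rate:>3}% -> {effective_wpm(rate):.0f} wpm")
```

This only scales the nominal rate; it says nothing about whether speech
at that rate remains intelligible, which is the open question here.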

With the pitch maintenance approach, the key point is to allow the
user to select intelligible speech playback rates, and that is why I
think we see the 33%-300% range, which is what most algorithms appear
to handle in my experience. Outside of that range, we can expect
intelligibility to break down.

The use cases for speech slowdown are individuals with cognitive or
learning disabilities, older adults, and also new language learners.
Degradation in speech intelligibility at low rates would probably
outweigh the benefits of very slow speech for these groups.  Can
speech be slowed to 25% and still be intelligible?  That depends on the
encoding of the source audio and the quality of the TSM (time scale
modification) algorithm.
Again, I just don't see the data which tells me that 25% (or even 33%)
is an empirically based number.  50% may be adequate. I'm doing some
further digging on this.

Perhaps we need to have 50% as A and 33% as AA.

One more point.  This is an international standard, and I haven't
looked at data that tells me how effective TSM algorithms are across
languages.  Do these presentation rates hold true across languages?
I'm looking into this, also.

If anyone has pointers to specific research findings, please share.

br,
mark

On Wed, Nov 18, 2009 at 5:16 PM, Kim Patch <kim@redstartsystems.com> wrote:
> A couple of things to maybe think about. Is there a technical downside to
> wider versus narrower percentages, for instance 33-300% versus 25-400%?
>
> Second, if there's no downside, should guidelines like these be wide enough
> to make sure to cover any possible situations, maybe even with a discernible
> margin because sometimes when people have a tool that goes further they use
> it? It's not uncommon for people who create technology to be surprised at
> how far experienced users can push it when the technology allows them to.
> The speed of some folks who use a single switch to control the computer
> using scanning software comes to mind.
>
> Cheers,
> Kim
>
> Markku Hakkinen wrote:
>
> Hi Jan,
>
> Yes, they do need to be normalized (and then resolve precision, 33.3%,
> if going with percentages).
>
> I put this out as a first pass at redrafting 4.9.6 to address both
> media slow down and speed up.  Discussion needed.
>
> What I am lacking is empirical data to back the upper and lower limits
> for both speech and visual, and I am hesitant to cast these numbers
> in stone (even when drawn from other standards, e.g., talking books)
> without being able to point to specific data saying why these numbers
> are optimal (de facto recommendations?).
>
> There is empirical data suggesting that speed up and slow down have
> benefit, but what I don't have is data saying what rates are ideal.
>
> I'll add that the software algorithms for time scale modification used
> by both Windows Media Player and QuickTime Player currently support
> the desired range (though the quality of the original encoding may
> affect perceived quality at the high and low ends of the range).   The
> question remains, what should that range really be?
>
> br,
> mark
>
> On Wed, Nov 18, 2009 at 2:22 PM, Jan Richards <jan.richards@utoronto.ca>
> wrote:
>
>
> Hi Mark,
>
> I think we could be more consistent in the way these are stated. In one case
> we say "1/3 to 3 times" implying 33%-300% and in another we say at least one
> setting between 40% and 60% which implies a lowest setting of 60% would be
> ok.
>
> Cheers,
> Jan
>
>
> Markku Hakkinen wrote:
>
>
> 4.9.6 Playback Rate Adjustment for Multimedia Content.
>
> The user can adjust playback rate of prerecorded content containing
> speech audio tracks such that all of the following are true (Level A):
>
> -  The playback rate should be user adjustable between 1/3 and 3 times
> real time of the recorded content.
>
> -  Recorded speech, whose playback rate has been adjusted by the user,
> should utilize pitch maintenance in order to avoid degradation of the
> speech quality.
>
> If only a visual track is present, provide at least one setting
> between 40% and 60% of the original speed. (Level A)
>
> When audio and video tracks are expected to be synchronized,
> synchronization is maintained as long as they are played at 75% of the
> original speed or higher. (Level A)
>
> The UA should provide a function that resets the playback rate to
> normal (1x). (Level A)
>
>
>
> --
> Jan Richards, M.Sc.
> User Interface Design Lead
> Adaptive Technology Resource Centre (ATRC)
> Faculty of Information
> University of Toronto
>
>  Email: jan.richards@utoronto.ca
>  Web:   http://jan.atrc.utoronto.ca
>  Phone: 416-946-7060
>  Fax:   416-971-2896
>
>
> --
> ___________________________________________________
>
> Kimberly Patch
> President
> Redstart Systems, Inc., makers of Utter Command
> (617) 325-3966
> kim@redstartsystems.com
>
> www.redstartsystems.com
> - making speech fly
>
> Patch on Speech blog
> Redstart Systems on Twitter
> ___________________________________________________
Received on Thursday, 19 November 2009 16:03:35 UTC