
Petition to include real time adjustment of playback speed for audio files in VoiceXML Version 3

From: Daniel O'Sullivan <dan@voicexl.com>
Date: Wed, 21 Jul 2004 15:04:12 -0500
To: <www-voice@w3.org>
Message-ID: <BD24366C.15F6%dan@voicexl.com>

Dear Committee Members:

I understand from Jim Larsen that work is to begin in the near future on a
specification for VoiceXML Version 3, and that the Committee is interested
in incorporating new features into the VoiceXML standard in that
specification.

I strongly urge the Committee to select audio playback speed adjustment as
one of those new features for the VoiceXML Version 3 specification, for the
following reasons:

1. While the SSML and SALT specifications both support dynamic playback
speed adjustment of TTS output, there is currently no equivalent in
VoiceXML for pre-recorded audio. Further, not all TTS engines produce smooth
output at variable playback rates. Dynamic speed adjustment of existing
pre-recorded audio files, on the other hand, provides very smooth,
high-quality, pitch-corrected output.
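To make the contrast in point 1 concrete, here is a minimal sketch of changing the speed of a recorded segment without shifting its pitch, using plain overlap-add (OLA) time-scale modification: windowed frames are read from the input at one hop size and written to the output at another, so the duration changes while each frame keeps its original waveform. This is an illustrative assumption, not a description of how Intel-Dialogic, NMSS, or any VoiceXML platform implements the feature. Plain OLA introduces audible phase artifacts, and refinements such as SOLA/WSOLA or a phase vocoder are the usual choices in practice. Samples are assumed to be floats, and `ola_time_stretch` and its parameters are hypothetical names.

```python
import math


def hann(n):
    # Periodic Hann window; copies shifted by n/2 sum to 1 for even n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(samples, speed, frame_len=512):
    """Change playback speed with plain overlap-add (OLA).

    Frames are read every `analysis_hop` samples and written every
    `synthesis_hop` samples, so the duration scales roughly by 1/speed
    (speed > 1 plays faster) while each frame keeps its local pitch.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    synthesis_hop = frame_len // 2                  # 50% overlap on output
    analysis_hop = max(1, round(synthesis_hop * speed))
    window = hann(frame_len)
    n_frames = max(1, (len(samples) - frame_len) // analysis_hop + 1)
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len                          # summed window weight
    for f in range(n_frames):
        a = f * analysis_hop                        # read position
        s = f * synthesis_hop                       # write position
        for i in range(frame_len):
            x = samples[a + i] if a + i < len(samples) else 0.0
            out[s + i] += x * window[i]
            norm[s + i] += window[i]
    # Divide by the summed window so overlapping frames keep unit gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

For example, `ola_time_stretch(samples, 1.25)` on one second of 8 kHz audio returns 6,400 samples (0.8 s), matching the 1/speed scaling.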

2. The feature is already supported at the hardware and API level by the
dominant voice board manufacturers, including Intel-Dialogic and NMSS.

3. The feature allows an application's voice files to be conveniently tuned
in real time, even after the application has been in production for many
years.

4. The feature would make it easier for the VoiceXML developer community to
implement our real-time adaptive algorithm, which has been field tested and
proven to enhance the caller interface, shorten call duration, and encourage
the use of IVR resources. A white paper on our technology, as currently
implemented for VoiceXML platforms, is attached to this email.

The new feature could be added as a simple tag for audio play events. The
tag would specify whether a message segment is to be played at its normal
speed or at some positive (faster) or negative (slower) offset relative to
the static, recorded playback speed of the segment.
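A minimal sketch of how such a tag's value could be interpreted, assuming a hypothetical `speed` attribute on the VoiceXML `<audio>` element that takes either `normal` or a signed percentage relative to the recorded speed. Neither the attribute name nor the percentage syntax comes from any VoiceXML specification; they only illustrate the semantics described above.

```python
import re

# Hypothetical attribute syntax, e.g. <audio src="menu.wav" speed="+25%"/>:
# either "normal" or a signed percentage offset from the recorded speed.
_OFFSET = re.compile(r"([+-])(\d+(?:\.\d+)?)%")


def speed_factor(value: str) -> float:
    """Return the playback-rate multiplier for a relative speed value.

    1.0 means "as recorded"; 1.25 means 25% faster; 0.8 means 20% slower.
    """
    value = value.strip()
    if value == "normal":
        return 1.0
    m = _OFFSET.fullmatch(value)
    if m is None:
        raise ValueError(f"unrecognized speed value: {value!r}")
    sign = 1.0 if m.group(1) == "+" else -1.0
    factor = 1.0 + sign * float(m.group(2)) / 100.0
    if factor <= 0.0:
        raise ValueError("a decrease of 100% or more leaves nothing to play")
    return factor


def played_duration(recorded_seconds: float, value: str) -> float:
    """Seconds a segment takes to play at the adjusted speed."""
    return recorded_seconds / speed_factor(value)
```

Under these assumptions, a 10-second prompt tagged `+25%` would play in 8 seconds, and one tagged `-20%` in 12.5 seconds.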


There are a variety of alternative techniques that pursue the same goals,
all of which are available royalty-free in the public domain. These include:

a) Adding "power user" menus to allow callers to select the level of
instruction they receive from the IVR application.

b) Editing voice files off-line to reduce silence at the beginning and end
of each message segment.

c) Redesigning the application call script to maximize efficiency and
reduce ambiguities.
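Alternative (b) can be automated with a simple amplitude threshold. The sketch below assumes samples normalized to the range -1.0 to 1.0 and uses a hypothetical `trim_silence` helper with illustrative default values. Real audio editors usually measure energy over short frames rather than individual samples, so treat this as a starting point only.

```python
def trim_silence(samples, threshold=0.02, pad=80):
    """Drop leading and trailing samples whose magnitude stays below
    `threshold`, keeping up to `pad` samples of margin on each side."""
    loud = [i for i, x in enumerate(samples) if abs(x) >= threshold]
    if not loud:
        return []                                   # the segment is all silence
    start = max(0, loud[0] - pad)
    end = min(len(samples), loud[-1] + 1 + pad)
    return samples[start:end]
```

At a typical 8 kHz telephony sample rate, every 1,000 samples trimmed shortens a prompt by 125 ms.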


I would be happy to answer any questions the committee members may have
regarding this feature and how it would benefit the VoiceXML community as a
whole.

Thank you for considering this proposal.

Sincerely,

Daniel O'Sullivan
President/CEO
Interactive Digital, Inc.
dan@voicexl.com 
www.voicexl.com 
(631) 724-2323 direct
(631) 680-4307 mobile





Received on Wednesday, 21 July 2004 15:05:21 UTC
