Re: Voice Recognition Profiles

From: Al Gilman <Alfred.S.Gilman@IEEE.org>
Date: Fri, 11 Nov 2005 15:34:30 -0500
Message-Id: <p06110405bf9aaace3487@[]>
To: "B.K. DeLong" <bkdelong@pobox.com>
Cc: www-voice@w3.org

What Paolo says is what I hear from others:

Speech recognition performance is
- hotly competitive, because it is
- marginally acceptable

This also means that training against a 'vanilla' corpus of training
texts would probably be more tedious and less effective, to a degree
that matters competitively, than training on a corpus attuned to the
technology that you are training.

For basic controls, your GPS can use 'telephony ASR' without
training. That is to say, to turn it on, change views, and do simple
things like "pan North." So long as the domain of discourse is
compact enough, the recognizer is selecting among a small set of
valid catches and it doesn't need to be trained.
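
A minimal sketch of why a compact domain gets by without training: the
system only has to pick the closest entry in a short list of valid
commands. The command list, the name `match_command`, and the 0.6 cutoff
are illustrative assumptions, and plain string similarity from Python's
`difflib` stands in for real acoustic scoring.

```python
import difflib

# Hypothetical command set for an in-car GPS (illustrative only).
COMMANDS = [
    "power on", "power off", "change view",
    "pan north", "pan south", "pan east", "pan west",
    "zoom in", "zoom out",
]

def match_command(hypothesis, commands=COMMANDS, cutoff=0.6):
    """Return the closest valid command, or None if nothing is close enough."""
    cleaned = hypothesis.lower().strip()
    # get_close_matches ranks candidates by similarity, best first.
    matches = difflib.get_close_matches(cleaned, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With so few valid catches, even a noisy hypothesis such as "Pan Norf"
still lands on a single command, while input outside the domain is
rejected rather than guessed at.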

General route planning from voice catches could get hairy.  I don't
know how near or far off that is.  We'll have to watch what comes
on the market.

Put another way: it's not that hard to make untrained ASR competitive
with the level of fussiness in the 'destination input' function of
contemporary free Web map services. These often take two or three
tries to reduce my input to a form they can recognize. That is not
natural conversation, but it is voice input competitive with other
input modes, other than perhaps an inked 'X' or lasso on the graphic map.

To enter by free speech a destination that you want to go to, your
in-the-car GPS might access a network-hosted GIS reference
service behind it. But it would need its own resident map because
when you're lost, of course that's when your network connection
fades out.

Another network-collaboration scenario we have discussed,
inspired by the needs of speakers with atypical speech, is that
the [internet-connected] Voice Browser could, when required,
outsource the speech recognition to a Web Service hosting
a speech recognition technology that you have trained.  The
MRCP technology is a candidate to handle the outsourcing.
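
A rough sketch of that outsourcing decision, under stated assumptions:
the function name `recognize` and the `local` and `remote` callables are
hypothetical, and plain Python callables stand in for whatever protocol
(MRCP or otherwise) would actually carry the audio. It only illustrates
preferring a speaker-trained remote recognizer and falling back to the
resident one when the network is unavailable.

```python
from typing import Callable, Optional

# A recognizer takes raw audio bytes and returns a text hypothesis.
Recognizer = Callable[[bytes], str]

def recognize(audio: bytes,
              local: Recognizer,
              remote: Optional[Recognizer] = None) -> str:
    """Prefer the user's trained remote recognizer; fall back to the local one."""
    if remote is not None:
        try:
            return remote(audio)
        except ConnectionError:
            # Network connection faded out: use the resident recognizer.
            pass
    return local(audio)
```

The fallback path is the same reason the in-car unit needs its own
resident map: the remote service cannot be assumed to be reachable.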



At 11:52 AM +0100 11/11/05, Baggia Paolo wrote:
>Dear ..,
>I'd like to give you some more information on the background
>of your proposals.
>There are at least two broad classes of ASR:
>- telephony ASR
>- dictation ASR
>The former does not require any kind of training, because it is
>designed to be used by all possible speakers of a given language,
>so the ASR is using a general acoustic model trained on a large
>population of speakers.
>Conversely, the latter is for personal use, so the training
>is used to improve performance for a given speaker. Even in
>this field, which used to require a very long training session (reading
>predefined sentences), current versions of dictation ASR use general
>acoustic models as a baseline, so the training needed is reduced.
>For telephony ASR there are approaches that adapt the acoustic
>models online to improve performance for the actual speaker. This is
>done during the course of the speech interaction, without the need for
>an explicit training phase.
>A second aspect is that it is very premature to speak of a
>Voice Recognition Profile today. All the technologies are different
>so it is almost impossible to have a standard profile, but your
>idea is in principle good.
>This is my personal opinion,
>Paolo Baggia, Loquendo.
>Voice Recognition Profiles
>From: B.K. DeLong
>Date: Fri, 28 Oct 2005 08:26:32 -0400
>To: www-voice@w3.org
>I'm not sure if this is the right place to discuss this - I looked
>through the archives of this list and several TRs from the Voice
>activity and didn't really find anything to answer my question.
>Have any efforts been made to make a standard for voice recognition
>training profiles? Is "training" even necessary any more for voice
>recognition systems?
>So when I load up a voice recognition program, I am told to read
>several lines or paragraphs of text so it can match the text content
>with my voice. For every program I try, I have to retrain it all over
>again. In theory, if I move from my computer to my car and try to
>activate my GPS system by voice, it needs to be trained. If I go to
>an ATM or drive-thru where one can automatically order by voice, I
>need to spend several minutes correcting the system until I'm
>connected with a human operator.
>Why not create a standard profile for voice recognition that all
>voice-recognition applications can use? That way, when I come to a
>new system I need to "train", I just type in my SSN or some other UID
>which tells the system to pull my VRP (Voice Recognition Profile),
>out of a centralized directory service, allowing me to immediately
>use the system.
>In theory, each time I access a new service, whatever actions I take
>and corrections I make in the process, would be noted in the file for
>the next time I access a service - a live, constantly-growing,
>learning profile.
>Does such a standard or technology effort exist?
>B.K. DeLong
>+1.617.797.8471 (Note new number)
>http://www.brain-stream.com Play.
>http://www.bostonredcross.org Volunteer.
>http://www.the-leaky-cauldron.org Potter.
>http://www.hackerfoundation.org Future.
>http://www.wkdelong.org Son.
>PGP Fingerprint:
>38D4 D4D4 5819 8667 DFD5 A62D AF61 15FF 297D 67FE
Received on Friday, 11 November 2005 22:05:21 UTC