- From: Shires, Glen <glen.shires@intel.com>
- Date: Fri, 11 Nov 2005 14:34:03 -0500
- To: <www-voice@w3.org>
A simple way to address the different technologies used for profiles is to store the voice samples as plain audio. For example, if we standardize on a common training text (e.g. a few paragraphs in the public domain) and ask users to make a high-quality recording of it in a standardized way, this audio could be used as training input to virtually any speech recognition system.

As an example, the recording could be standardized at a 24 kHz sampling rate and 16 bits per sample, stored in a specific lossless format, and recorded through a specified near-field microphone. Speech recognition systems could then process this audio to match the input characteristics of their own system; for example, by mimicking the properties of their microphone and environment and by re-sampling to a different sampling rate.

Thus, the same audio samples could be used for training virtually any speech recognition system. For example, they could be recorded on a PC using a standardized application, then uploaded to a central web site and downloaded by the other devices that you use.

This is my personal opinion,
Glen Shires

________________________________________
From: www-voice-request@w3.org [mailto:www-voice-request@w3.org] On Behalf Of Baggia Paolo
Sent: Friday, November 11, 2005 2:53 AM
To: B.K. DeLong
Cc: Baggia Paolo; www-voice@w3.org
Subject: Re: Voice Recognition Profiles

Dear ..,

I'd like to give you some more information on the background of your proposals.

There are at least two broad classes of ASR:
- telephony ASR
- dictation ASR

The former does not require any kind of training: it is designed to be used by all possible speakers of a given language, so the ASR uses a general acoustic model trained on a large population of speakers. Conversely, the latter is for personal use, so training is used to improve performance for a given speaker.
Even in this field, which started from very long training sessions (reading predefined sentences), current versions of dictation ASR use general acoustic models as a baseline, so the training needed is reduced.

For telephony ASR there are approaches that adapt the acoustic models online to improve performance for the actual speaker. This is done during the course of the speech interaction, without the need for an explicit training phase.

A second aspect is that it is very premature to speak of a Voice Recognition Profile today. All the technologies are different, so it is almost impossible to have a standard profile, but your idea is in principle good.

This is my personal opinion,
Paolo Baggia, Loquendo.

====================================================================
Voice Recognition Profiles

From: B.K. DeLong
Date: Fri, 28 Oct 2005 08:26:32 -0400
To: www-voice@w3.org

I'm not sure if this is the right place to discuss this - I looked through the archives of this list and several TRs from the Voice activity and didn't really find anything to answer my question.

Have any efforts been made to make a standard for voice recognition training profiles? Is "training" even necessary any more for voice recognition systems?

When I load up a voice recognition program, I am told to read several lines or paragraphs of text so it can match the text content with my voice. For every program I try, I have to retrain it all over again. In theory, if I move from my computer to my car and try to activate my GPS system by voice, it needs to be trained. If I go to an ATM or drive-thru where one can automatically order by voice, I need to spend several minutes correcting the system until I'm connected with a human operator.

Why not create a standard profile for voice recognition that all voice-recognition applications can use?
That way, when I come to a new system I need to "train", I just type in my SSN or some other UID, which tells the system to pull my VRP (Voice Recognition Profile) out of a centralized directory service, allowing me to immediately use the system. In theory, each time I access a new service, whatever actions I take and corrections I make in the process would be noted in the file for the next time I access a service - a live, constantly-growing, learning profile.

Does such a standard or technology effort exist?

--
B.K. DeLong
bkdelong@pobox.com
+1.617.797.8471 (Note new number)

http://www.brain-stream.com Play.
http://www.bostonredcross.org Volunteer.
http://www.the-leaky-cauldron.org Potter.
http://www.hackerfoundation.org Future.
http://www.wkdelong.org Son.

PGP Fingerprint: 38D4 D4D4 5819 8667 DFD5 A62D AF61 15FF 297D 67FE
FOAF: http://foaf.brain-stream.org
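The re-sampling step suggested in the reply at the top of this thread could be sketched roughly as follows. This is only an illustration under assumptions: the 24 kHz and 16-bit figures are the example values from that reply, while the mono requirement, the 8 kHz target rate, and the helper names `read_profile_recording`, `downsample` and `make_test_tone` are hypothetical. Block averaging is a crude stand-in for a proper anti-aliasing filter, and mimicking a target microphone or environment is not attempted.

```python
import array
import io
import math
import wave

# Assumed profile format, using the example figures from the message.
PROFILE_RATE_HZ = 24_000
PROFILE_SAMPLE_WIDTH = 2  # bytes, i.e. 16 bits per sample


def read_profile_recording(wav_file):
    """Return mono 16-bit samples, rejecting files that miss the assumed format."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getframerate() != PROFILE_RATE_HZ:
            raise ValueError("expected a 24 kHz recording")
        if wav.getsampwidth() != PROFILE_SAMPLE_WIDTH:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected a mono recording")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # native-endian 16-bit, as the wave module returns
    samples.frombytes(raw)
    return samples.tolist()


def downsample(samples, factor):
    """Reduce the rate by an integer factor by averaging each block of samples.

    Averaging is only a crude low-pass filter; a real recognizer would apply
    a proper anti-aliasing filter before decimating.
    """
    usable = len(samples) - len(samples) % factor  # drop a trailing partial block
    return [
        round(sum(samples[i:i + factor]) / factor)
        for i in range(0, usable, factor)
    ]


def make_test_tone(count=240, freq_hz=440.0):
    """Build an in-memory 24 kHz, 16-bit mono WAV file for demonstration."""
    values = array.array(
        "h",
        (
            int(10_000 * math.sin(2 * math.pi * freq_hz * n / PROFILE_RATE_HZ))
            for n in range(count)
        ),
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(PROFILE_SAMPLE_WIDTH)
        wav.setframerate(PROFILE_RATE_HZ)
        wav.writeframes(values.tobytes())
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    samples = read_profile_recording(make_test_tone())
    telephony = downsample(samples, 3)  # 24 kHz -> 8 kHz
    print(len(samples), "samples at 24 kHz ->", len(telephony), "samples at 8 kHz")
```

Choosing a source rate that is an integer multiple of common target rates (24 kHz is three times 8 kHz) keeps this conversion simple; other ratios would need interpolation.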
Received on Friday, 11 November 2005 19:34:24 UTC