- From: <Jesper.Olsen@nokia.com>
- Date: Fri, 4 Jan 2002 22:00:13 +0200
- To: ranjansharma@lucent.com, www-voice@w3.org
The noise/speech decision cannot be perfect, but many types of noise nevertheless look a lot different from speech, even when filtered to telephone bandwidth.

The noise/speech decision is made by a voice activity detector, and not by the ASR engine itself (except of course for the recognition based barge-in decision). Any decent voice activity detector today will be doing some kind of spectrum analysis. The raw energy of the signal gives poor results in the presence of noise.

So yes, these ASR engines are available.

Jesper

> -----Original Message-----
> From: ext Sharma, Ranjan (Ranjan) [mailto:ranjansharma@lucent.com]
> Sent: 04 January, 2002 21:50
> To: Olsen Jesper (NRC/Helsinki); www-voice@w3.org
> Subject: RE: Barge-in types in VoiceXML
>
> Given that the telephone communication restricts the bandwidth to
> roughly 3 kHz and the full spectrum is therefore not present for
> analysis, would it not be hard for the DSP involved to do a spectrum
> analysis and determine if the energy is from noise or speech?
>
> I am sure noise frequencies would overlap with speech and to a certain
> extent, there is speaker dependence involved when doing the analysis -
> people have different voices.
>
> Is there an ASR engine available today that makes the distinction
> between the "energy" and "speech"?
>
> The recognition based barge-in could comparatively be slower and consume
> more processing power, but would avoid the barge-in due to background
> noise. Anyway, that was not my focus.
>
> Thanks,
> Ranjan
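
To illustrate the point in the reply that raw energy alone gives poor results while a spectral measure can separate noise from speech, here is a minimal Python sketch. It is not the method of any particular voice activity detector or ASR engine. The feature used (spectral flatness), the thresholds, the 8 kHz sampling rate, the 125 Hz pitch, and all function names are assumptions chosen only for illustration.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000   # assumed telephone-band sampling rate (Hz)
FRAME_LEN = 256      # 32 ms frame at 8 kHz (assumption)


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def power_spectrum(frame):
    """Naive DFT power spectrum over positive-frequency bins, DC excluded."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 + 1e-12)  # small floor avoids log(0)
    return spectrum


def spectral_flatness(frame):
    """Geometric mean / arithmetic mean of the power spectrum.

    Close to 1 for a flat (noise-like) spectrum, close to 0 when the
    power sits in a few harmonic peaks, as in voiced speech.
    """
    p = power_spectrum(frame)
    log_mean = sum(math.log(v) for v in p) / len(p)
    return math.exp(log_mean) / (sum(p) / len(p))


def energy_vad(frame, energy_threshold=0.01):
    """Energy-only decision: fires on anything loud enough."""
    return frame_energy(frame) > energy_threshold


def spectral_vad(frame, energy_threshold=0.01, flatness_threshold=0.25):
    """Energy gate plus a flatness check (hypothetical thresholds)."""
    return (frame_energy(frame) > energy_threshold
            and spectral_flatness(frame) < flatness_threshold)


rng = random.Random(0)

# Loud broadband noise: white Gaussian samples.
noise = [rng.gauss(0.0, 0.3) for _ in range(FRAME_LEN)]

# Crude stand-in for a voiced sound: five harmonics of a 125 Hz pitch.
voiced = [
    sum(0.3 / h * math.sin(2 * math.pi * h * 125 * t / SAMPLE_RATE)
        for h in range(1, 6))
    for t in range(FRAME_LEN)
]

for name, frame in (("noise", noise), ("voiced", voiced)):
    print(name, energy_vad(frame), spectral_vad(frame))
```

In this toy setup both frames pass the energy test, so an energy-only detector would trigger barge-in on the noise frame, while the flatness check accepts only the harmonic frame. Real detectors combine more features and adapt their thresholds to the background level.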
Received on Friday, 4 January 2002 15:00:21 UTC