RE: Barge-in types in VoiceXML from Jesper.Olsen@nokia.com on 2002-01-04 (www-voice@w3.org from January to March 2002)

From: <Jesper.Olsen@nokia.com>
Date: Fri, 4 Jan 2002 22:00:13 +0200
To: ranjansharma@lucent.com, www-voice@w3.org
Message-ID: <58E9549287153543B1D95B91C332BD73680608@esebe016.NOE.Nokia.com>

The noise/speech decision cannot be perfect, but many types of noise
nevertheless look quite different from speech, even when filtered to
telephone bandwidth.

The noise/speech decision is made by a voice activity detector, and not
by the ASR engine itself (except of course for the recognition based
barge-in decision).

Any decent voice activity detector today will be doing some kind of
spectrum analysis, because the raw energy of the signal gives poor
results in the presence of noise.
So yes, such ASR engines are available.
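
As a rough illustration of why raw energy alone is a poor cue, here is a
minimal Python sketch that combines frame energy with one simple spectral
feature, spectral flatness. This is only an assumed example of "some kind
of spectrum analysis", not a description of any particular engine: the
function names, the choice of feature, and both thresholds are
hypothetical, and a real detector would use more robust features and
adapt to the noise level.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at the positive-frequency DFT bins (DC and Nyquist excluded)."""
    n = len(frame)
    bins = []
    for k in range(1, n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        bins.append(abs(acc) ** 2 / n)
    return bins


def frame_energy(frame):
    """Mean squared amplitude: the 'raw energy' cue."""
    return sum(x * x for x in frame) / len(frame)


def spectral_flatness(frame):
    """Geometric mean over arithmetic mean of the power spectrum.

    Close to 1 for a flat, noise-like spectrum; close to 0 when the
    power is concentrated in a few peaks, as in voiced speech.
    """
    power = [p + 1e-12 for p in power_spectrum(frame)]  # avoid log(0)
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def looks_like_speech(frame, energy_floor=1e-3, flatness_ceiling=0.2):
    """Hypothetical decision rule; both thresholds are illustrative only."""
    if frame_energy(frame) <= energy_floor:
        return False  # too quiet to be anything
    return spectral_flatness(frame) < flatness_ceiling
```

A frame of broadband noise and a frame of harmonically structured sound
can have similar energy, so an energy-only detector would trigger on
both. The flatness term is what separates them in this sketch.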

Jesper


> -----Original Message-----
> From: ext Sharma, Ranjan (Ranjan) [mailto:ranjansharma@lucent.com]
> Sent: 04 January, 2002 21:50
> To: Olsen Jesper (NRC/Helsinki); www-voice@w3.org
> Subject: RE: Barge-in types in VoiceXML
> 
> 
> Given that the telephone communication restricts the bandwidth to
> roughly 3 kHz and the full spectrum is therefore not present for
> analysis, would it not be hard for the DSP involved to do a spectrum
> analysis and determine if the energy is from noise or speech?
> 
> I am sure noise frequencies would overlap with speech and to a certain
> extent, there is speaker dependence involved when doing the analysis -
> people have different voices.
> 
> Is there an ASR engine available today that makes the distinction 
> between the "energy" and "speech"?
> 
> The recognition based barge-in could comparatively be slower and
> consume more processing power, but would avoid the barge-in due to
> background noise. Anyway, that was not my focus.
> 
> Thanks,
> Ranjan
>

Received on Friday, 4 January 2002 15:00:21 UTC