RE: VoiceXML 2.0: Official Response #1 to Candidate Recommendation Issues


Thanks for your detailed response concerning recording. This clarification
certainly helps in understanding the specification on this point.

> Guillaume, please let us know whether you accept this disposition. If
> you do not explicitly require the clarification concerning the throwing of
> <noinput> and <nomatch> events by recognition during recording, the
> group will use its discretion in whether the clarification needs to be
> applied.

I am indeed satisfied with this disposition, and do not explicitly require the
clarification concerning the throwing of <noinput> and <nomatch> events by
recognition during recording, given that this feature is optional and seldom
supported. Moreover, your explanation is quite clear to me and publicly

Thanks again and best regards,


> Guillaume,
> Thank you again for your timely response and your acceptance of our
> disposition on these issues.
> On your one remaining issue, CR5-13, we propose the following revised
> resolution.
> CR5-13 accepted with modifications
> We believe that when recording begins is clearly defined: in Section
> 2.3.6, it states:
> "A recording begins at the earliest after the playback of any prompts
> (including the 'beep' tone if defined). As an optimization, a platform
> may begin recording when the user starts speaking."
> i.e., the recording may include initial silence, etc., if the platform does
> not use the optimization (e.g. voice activity detection). With the
> optimization, the recording can begin with the user's speech. Whether
> music or other audio triggers voice activity detection is
> platform-specific. Note that this behavior applies independent of
> whether speech recognition is supported (while the recording and
> recognition processes use the same audio data stream, these processes
> are independent and therefore their voice activity detection mechanisms
> may be different).
> The timeout interval is clearly defined: "A timeout interval is defined
> to begin immediately after prompt playback (including the 'beep' tone if
> defined) and its duration is determined by the 'timeout' property."
> The timeout interval has an effect on both recording and recognition
> (which are logically independent).
> For recording, the impact is specified by the sentence "If the timeout interval is
> exceeded before recording begins, then a <noinput> event is thrown." In
> the case of non-optimized recording, recording always begins after
> prompt playback, so <noinput> would never be thrown. With optimized
> recording, however, <noinput> may be thrown if no voice activity is
> detected before the timeout interval elapses.
> For recognition, the situation is more complex. We are modifying the
> specification (due to implementation report feedback) so that if
> recognition is supported during recording (this is an optional feature),
> then only non-local speech grammars are active. If a non-local speech
> grammar is matched by audio input, then execution is immediately
> transferred to its enclosing element. This raises the issue of whether a
> <noinput> or <nomatch> could be thrown by the recognition process. A
> <noinput> could be generated if the timeout interval has elapsed. A
> <nomatch> could be generated if the audio triggers recognition but does
> not match the active grammar. Our belief is that throwing these events
> by the recognition process during recording is undesirable and not what
> VoiceXML authors expect. Consequently, we are considering clarifying the
> specification to make it clear that <noinput> and <nomatch> events are
> never thrown from the recognition process during recording.
> Guillaume, please let us know whether you accept this disposition. If
> you do not explicitly require the clarification concerning the throwing of
> <noinput> and <nomatch> events by recognition during recording, the
> group will use its discretion in whether the clarification needs to be
> applied.
> Thanks
> Scott
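As a rough illustration of the timing rules discussed in this thread, the following Python sketch models when the recording timeout yields a <noinput> event, and how optional recognition during recording would behave under the proposed clarification. It is hypothetical: RecordAttempt, recording_start, record_outcome, recognition_outcome and their string results are illustrative names, not part of VoiceXML or any platform API, and voice activity detection is reduced to a single assumed onset time measured from the end of prompt playback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordAttempt:
    """Hypothetical model of one <record> attempt. Times are in seconds,
    measured from the end of prompt playback (including any beep)."""
    timeout: float                # value of the 'timeout' property
    optimized: bool               # True if the platform starts recording on voice activity
    voice_onset: Optional[float]  # when voice activity is detected; None if never


def recording_start(attempt: RecordAttempt) -> Optional[float]:
    """Return when recording begins, or None if it never begins."""
    if not attempt.optimized:
        # Non-optimized: recording begins right after prompt playback,
        # so it may include initial silence.
        return 0.0
    # Optimized: recording begins when voice activity is detected, if ever.
    return attempt.voice_onset


def record_outcome(attempt: RecordAttempt) -> str:
    """Return 'noinput' if the timeout interval is exceeded before
    recording begins, otherwise 'recording'."""
    start = recording_start(attempt)
    if start is None or start > attempt.timeout:
        return "noinput"
    return "recording"


def recognition_outcome(matched_nonlocal_grammar: bool) -> Optional[str]:
    """Optional recognition during recording, per the proposed clarification:
    only non-local speech grammars are active, a match transfers execution,
    and the recognizer itself never throws <noinput> or <nomatch>."""
    return "transfer" if matched_nonlocal_grammar else None


if __name__ == "__main__":
    print(record_outcome(RecordAttempt(timeout=5.0, optimized=False, voice_onset=None)))  # recording
    print(record_outcome(RecordAttempt(timeout=5.0, optimized=True, voice_onset=None)))   # noinput
    print(record_outcome(RecordAttempt(timeout=5.0, optimized=True, voice_onset=2.0)))    # recording
```

In this sketch non-optimized recording can never produce 'noinput', which mirrors the point that such recording always begins after prompt playback; only the optimized path can time out. What actually triggers a platform's voice activity detection (music, noise, and so on) is platform-specific and is not modeled.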

