- From: McGlashan, Scott <scott.mcglashan@hp.com>
- Date: Sun, 14 Dec 2003 17:43:02 +0100
- To: "Guillaume Berche" <guillaume.berche@eloquant.com>
- Cc: <www-voice@w3.org>
Guillaume,

Thank you again for your timely response and your acceptance of our disposition on these issues. On your one remaining issue, CR5-13, we propose the following revised resolution.

CR5-13: accepted with modifications

We believe that the point at which recording begins is clearly defined. Section 2.3.6 states: "A recording begins at the earliest after the playback of any prompts (including the 'beep' tone if defined). As an optimization, a platform may begin recording when the user starts speaking." That is, the recording may include initial silence, etc., if the platform does not use the optimization (e.g. voice activity detection). With the optimization, the recording can begin with the user's speech. Whether music or other audio triggers voice activity detection is platform-specific. Note that this behavior applies independently of whether speech recognition is supported: while the recording and recognition processes use the same audio data stream, these processes are independent, and therefore their voice activity detection mechanisms may be different.

The timeout interval is also clearly defined: "A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property." The timeout interval has an effect on both recording and recognition (which are logically independent).

For recording, the impact is specified in "If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown." In the case of non-optimized recording, recording always begins after prompt playback, so <noinput> would never be thrown. With optimized recording, however, <noinput> may be thrown if no voice activity is detected before the timeout interval elapses.

For recognition, the situation is more complex.
We are modifying the specification (due to implementation report feedback) so that if recognition is supported during recording (this is an optional feature), then only non-local speech grammars are active. If a non-local speech grammar is matched by audio input, then execution is immediately transferred to its enclosing element.

This raises the issue of whether a <noinput> or <nomatch> could be thrown by the recognition process. A <noinput> could be generated if the timeout interval has elapsed. A <nomatch> could be generated if the audio triggers recognition but does not match the active grammar. Our belief is that having the recognition process throw these events during recording is undesirable and not what VoiceXML authors expect. Consequently, we are considering clarifying the specification to make it clear that <noinput> and <nomatch> events are never thrown from the recognition process during recording.

Guillaume, please let us know whether you accept this disposition. If you do not explicitly require the clarification concerning the throwing of <noinput> and <nomatch> events by recognition during recording, the group will use its discretion as to whether the clarification needs to be applied.

Thanks

Scott
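
For illustration only, the two rules discussed in this message can be sketched in Python. This is a minimal model, not platform code: the function and class names, the idea of measuring times in seconds from the end of prompt playback, and the treatment of a voice onset exactly at the timeout as "in time" are all assumptions made for the sketch. The second function also assumes that the clarification under consideration (no <noinput> or <nomatch> from recognition during recording) is adopted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def recording_event(optimized: bool, timeout_s: float,
                    voice_onset_s: Optional[float]) -> Optional[str]:
    """Return "noinput" if the timeout elapses before recording begins, else None.

    Times are in seconds from the end of prompt playback (beep included).
    voice_onset_s is None when no voice activity is ever detected.
    """
    if not optimized:
        # Non-optimized: recording begins right after prompt playback,
        # so the timeout interval can never expire first.
        return None
    # Optimized: recording begins only once voice activity is detected.
    # Assumption: an onset exactly at the timeout still counts as in time.
    if voice_onset_s is None or voice_onset_s > timeout_s:
        return "noinput"
    return None


@dataclass(frozen=True)
class Grammar:
    enclosing_element: str  # hypothetical label for the element owning the grammar
    is_local: bool          # True if the grammar is local to the <record>
    phrases: frozenset      # utterances the grammar accepts (toy stand-in)


def recognition_transfer(grammars: Sequence[Grammar],
                         utterance: Optional[str]) -> Optional[str]:
    """Return the element to transfer execution to, or None to keep recording."""
    # Only non-local speech grammars are active during recording.
    active = [g for g in grammars if not g.is_local]
    for g in active:
        if utterance is not None and utterance in g.phrases:
            return g.enclosing_element
    # No match, or no input at all: under the clarification being considered,
    # recognition throws neither <nomatch> nor <noinput> during recording.
    return None
```

In this model, `recording_event(True, 5.0, None)` yields `"noinput"`, while any call with `optimized=False` yields `None`. An utterance that matches only a local grammar, or nothing at all, leaves recording undisturbed.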
Received on Sunday, 14 December 2003 11:43:04 UTC