- From: McGlashan, Scott <scott.mcglashan@hp.com>
- Date: Mon, 15 Dec 2003 12:32:53 +0100
- To: "Dean Sturtevant" <deansturtevant@comcast.net>
- Cc: <www-voice@w3.org>
DTMF local grammars can still be specified - so you can terminate the recording with specific DTMF sequences, etc. Local speech grammars were not supported by any implementations during the implementation report phase, hence the decision to withdraw this (optional) feature.

Scott McGlashan
Co-chair, W3C VBWG

-----Original Message-----
From: Dean Sturtevant [mailto:deansturtevant@comcast.net]
Sent: Sunday, December 14, 2003 18:16
To: McGlashan, Scott; Guillaume Berche
Cc: www-voice@w3.org
Subject: Re: VoiceXML 2.0: Official Response #1 to Candidate Recommendation Issues

Scott,

I apologize if I'm out of line in responding to this, but I am puzzled by the proposed change to the specification. If only non-local speech grammars are active during a recording, what is the purpose of specifying a local grammar? In fact, why restrict the set of active grammars at all in this case? (why no DTMF? why no local grammars?)

- Dean

----- Original Message -----
From: "McGlashan, Scott" <scott.mcglashan@hp.com>
To: "Guillaume Berche" <guillaume.berche@eloquant.com>
Cc: <www-voice@w3.org>
Sent: Sunday, December 14, 2003 11:43 AM
Subject: RE: VoiceXML 2.0: Official Response #1 to Candidate Recommendation Issues

Guillaume,

Thank you again for your timely response and your acceptance of our disposition on these issues.

On your one remaining issue, CR5-13. We propose the following revised resolution.

CR5-13 accepted with modifications

We believe that when recording begins is clearly defined: in Section 2.3.6, it states:

"A recording begins at the earliest after the playback of any prompts (including the 'beep' tone if defined). As an optimization, a platform may begin recording when the user starts speaking."

i.e. the recording may include initial silence, etc if the platform does not use the optimization (e.g. voice activity detection). With the optimization, the recording can begin with the user's speech.
Whether music or other audio triggers voice activity detection is platform-specific. Note that this behavior applies independent of whether speech recognition is supported (while the recording and recognition processes use the same audio data stream, these processes are independent and therefore their voice activity detection mechanisms may be different).

The timeout interval is clearly defined:

"A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property."

The timeout interval has an effect on both recording and recognition (which are logically independent). For recording, the impact is specified in "If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown." In the case of non-optimized recording, recording always begins after prompt playback, so <noinput> would never be thrown. With optimized recording, however, <noinput> may be thrown if no voice activity is detected before the timeout interval elapses.

For recognition, the situation is more complex. We are modifying the specification (due to implementation report feedback) so that if recognition is supported during recording (this is an optional feature), then only non-local speech grammars are active. If a non-local speech grammar is matched by audio input, then execution is immediately transferred to its enclosing element.

This raises the issue of whether a <noinput> or <nomatch> could be thrown by the recognition process. A <noinput> could be generated if the timeout interval has elapsed. A <nomatch> could be generated if the audio triggers recognition but does not match the active grammar. Our belief is that throwing these events by the recognition process during recording is undesirable and not what VoiceXML authors expect.
Consequently, we are considering clarifying the specification to make it clear that <noinput> and <nomatch> events are never thrown from the recognition process during recording.

Guillaume, please let us know whether you accept this disposition. If you do not explicitly require the clarification concerning the throwing of <noinput> and <nomatch> events by recognition during recording, the group will use its discretion in whether the clarification needs to be applied.

Thanks

Scott
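
The event rules described in the message above can be sketched as a small model. This is only an illustrative reading of the thread, not text from the specification or any platform's implementation: the function names, parameters, and the "transfer" return value are hypothetical, and treating voice detected exactly at the timeout as "not exceeded" is an assumption.

```python
from typing import Optional


def recording_event(timeout_s: float,
                    uses_vad: bool,
                    first_voice_s: Optional[float]) -> Optional[str]:
    """Event thrown by the recording process, or None (hypothetical model).

    timeout_s     -- value of the 'timeout' property, in seconds
    uses_vad      -- True if the platform starts recording only when the
                     user starts speaking (the optional optimization)
    first_voice_s -- seconds after prompt playback at which voice activity
                     is first detected, or None if none is detected
    """
    if not uses_vad:
        # Non-optimized: recording begins right after prompt playback,
        # so the timeout can never be exceeded before recording begins.
        return None
    if first_voice_s is None or first_voice_s > timeout_s:
        # Optimized: no voice activity before the timeout interval elapsed.
        return "noinput"
    return None


def recognition_outcome(matched_nonlocal_grammar: bool) -> Optional[str]:
    """Outcome of the recognition process during recording (hypothetical).

    Under the clarification being considered, recognition never throws
    noinput or nomatch during recording; a match on an active non-local
    speech grammar transfers execution to its enclosing element.
    """
    return "transfer" if matched_nonlocal_grammar else None
```

For example, with a 5 second timeout, a platform without the optimization never reports noinput from recording, while one with the optimization reports it when the caller stays silent.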
Received on Monday, 15 December 2003 06:33:08 UTC