- From: McGlashan, Scott <scott.mcglashan@hp.com>
- Date: Wed, 19 Nov 2003 20:33:27 +0100
- To: <kayseri@phonoclick.com>
- Cc: <www-voice@w3.org>
The Voice Browser Working Group (VBWG) is now completing its resolution of issues raised during the review of the Candidate Recommendation version of VoiceXML 2.0 [1]. Our apologies that it has taken so long to respond.

Following the process described in [2] for advancement to Proposed Recommendation, this is the VBWG's formal response to the issues you raised.

Please indicate before 26 November 2003 whether you are satisfied with the VBWG's resolutions, whether you think there has been a misunderstanding, or whether you wish to register an objection. If you do not think you can respond before 26 November, please let me know. The Director will appreciate a response whether or not you agree with the resolutions. However, if we do not hear from you at all by 26 November 2003, we will assume that you accept our resolutions.

Below you will find a summary of the VBWG's responses to each of your issues. Please use the issue identifiers when responding.

Thank you,
Scott McGlashan
Co-chair, Voice Browser Working Group

[1] http://www.w3.org/TR/2003/CR-voicexml20-20030220/
[2] http://www.w3.org/2003/06/Process-20030618/

-----------------------------------------------
Issues you raised and VBWG responses
-----------------------------------------------

Issues: CR15-1
http://lists.w3.org/Archives/Public/www-voice/2003AprJun/0030.html

Issue CR15-1

We are trying to implement the <record> tag in our Voice Browser in a conformant way; however, we cannot clearly understand what the requirements on a browser are for this tag. Our points can be summed up as follows:

I) The main confusion arises from the behaviour of bargein="true" prompts in <record>. According to Fig 7 in section 2.3.6 (lower left corner), bargein controls apply to audio queued within <record>. On the other hand, a few lines below, it is stated: "A /recording begins/ at the earliest after the playback of any prompts (including the 'beep' tone if defined).
As an optimization, a platform may begin recording when the user starts speaking." Now, if recording does not begin DURING prompt playback, how can the user barge in on those prompts? Or should we understand that if the user barges in with voice during prompt playback, THEN recording should start? In our opinion, a clarification of how <record> interacts with barge-in on the audio queued within it is badly needed.

II) The second point in the spec that baffles us is: "If no audio is collected during execution of <record>, then the record variable remains unfilled (note <http://www.w3.org/TR/voicexml20/#unfilled_record>). This can occur, for example, when DTMF or speech input is received during prompt playback or the timeout interval (if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout)." (Section 2.3.6)

This comment is weird in two ways:

1) How can the record variable be unfilled "when DTMF or speech input is received during ... the timeout interval"? This seems to be the primary method of filling a record variable.

2) We cannot see any way to achieve what the spec author has stated within the parentheses. If there is a preceding <field> with a zero timeout, then:
   i) if the user starts speaking while the prompts in the <field> are playing, the input goes to the processing of the field and is either matched against whatever grammar is specified for it or causes a "nomatch" to be thrown;
   ii) if the user instead waits for the prompts to finish, a "noinput" event is thrown.
In neither case does the input go to the <record> that follows the <field>. If the spec is trying to say something else, it should be explained clearly.

CR15-1 Resolution: rejected with modifications

I) Prompts within <record> can be barged in on if active DTMF grammars are defined. Active speech grammars also allow barge-in, but the ability to combine recognition and recording may be removed from the specification due to a lack of implementation support.

II.1) This case arises with DTMF input when recording is triggered by voice activity detection: as a platform optimization, instead of recording starting immediately after prompt playback, recording begins only when voice activity is detected. DTMF received during the timeout interval, before any voice activity, therefore leaves no audio collected and the record variable unfilled.

II.2) We agree this is confusing (it was intended to cover another use case), so we will remove the text in parentheses: "(if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout)"
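The distinction drawn in I) and II.1) can be illustrated with a small timeline model. The Python sketch below is hypothetical and is not taken from the specification: `simulate_record`, the `DTMF`/`VOICE` event kinds and the millisecond timeline are illustrative names. It assumes dtmfterm="true", an active DTMF grammar so that DTMF barges in on the prompts, ignores voice during prompt playback, and omits noinput handling. It shows why DTMF received during the timeout interval leaves the record variable unfilled only when recording is triggered by voice activity detection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical event kinds for this sketch (not VoiceXML names).
DTMF = "dtmf"
VOICE = "voice"


@dataclass
class RecordOutcome:
    filled: bool      # True if the record variable receives audio
    recorded_ms: int  # length of audio collected (0 when unfilled)
    reason: str


def simulate_record(
    events: List[Tuple[int, str]],
    prompt_end_ms: int,
    end_ms: int,
    vad_triggered: bool,
) -> RecordOutcome:
    """Simplified, hypothetical model of when a <record> collects audio.

    events: (time_ms, kind) pairs, where kind is DTMF or VOICE.
    prompt_end_ms: when queued prompts (including any beep) finish.
    end_ms: when recording would otherwise stop (e.g. maxtime reached).
    vad_triggered: if True, recording starts at the first voice activity
        after the prompts instead of immediately when they end.
    """
    # Without the optimization, recording starts as soon as prompts end.
    recording_start: Optional[int] = None if vad_triggered else prompt_end_ms

    for t, kind in sorted(events):
        if t >= end_ms:
            break
        if kind == VOICE:
            # Voice during prompts is ignored in this sketch; after the
            # prompts, it starts a VAD-triggered recording.
            if t >= prompt_end_ms and recording_start is None:
                recording_start = t
        elif kind == DTMF:
            if t < prompt_end_ms:
                # Barge-in on the prompts (assumes an active DTMF grammar).
                return RecordOutcome(False, 0, "DTMF barge-in during prompts")
            if recording_start is None or t <= recording_start:
                # II.1: recording never started, so no audio was collected.
                return RecordOutcome(False, 0, "DTMF before any audio recorded")
            # DTMF terminates the recording (assumes dtmfterm="true").
            return RecordOutcome(True, t - recording_start, "terminated by DTMF")

    if recording_start is None or recording_start >= end_ms:
        return RecordOutcome(False, 0, "no audio collected")
    return RecordOutcome(True, end_ms - recording_start, "recording ended")
```

For example, with prompts ending at 1000 ms, DTMF at 1500 ms yields a filled 500 ms recording when `vad_triggered=False`, but an unfilled result when `vad_triggered=True` and no voice activity precedes it, which is the case described in II.1.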
Received on Wednesday, 19 November 2003 14:33:34 UTC