RE: VoiceXML 2.0: Official Response #9 to Candidate Recommendation Issues

Ufuk,

Thanks for your response, and apologies again for the delay (the Voice
Browser group has been working on a number of other specifications on
which VoiceXML depends before it can be finalized as a Recommendation).

I've noted your point about the difference in barge-in behaviour between
record and non-record input items, and I will put it before the group to
determine whether we should add some clarification to the specification
in this regard.

I've also copied this email to the public list for administrative
tracking.

Thanks again for your timely response.

Scott
 

-----Original Message-----
From: Ufuk Kayserilioglu [mailto:kayseri@phonoclick.com] 
Sent: 19 November 2003 21:52
To: McGlashan, Scott
Subject: Re: VoiceXML 2.0: Official Response #9 to Candidate
Recommendation Issues


Thanks for your response, though it comes a little bit delayed. :)

I am mostly satisfied with the resolution below; it successfully
answers all my questions and objections.

However, I still feel obliged at least to note that, when speaking
about "recording", barge-in is naturally thought of as barging in on
the prompt before the record and thereby starting the recording. I
realize from your explanations below that this is not the case: what
you mean by "bargein" in the context of a recording is not the ability
of the user to start speaking during the prompts to fill the recording.

This still leaves a voice dialog with strange behaviour: if you have
three recognition fields with barge-in-enabled prompts followed by a
recording, the user is surprised (and perhaps annoyed) that the first
three voice input fields can be barged in on while the last one (which,
as far as the user is concerned, is just another voice input field)
cannot.

I hope you will take note of these concerns. Apart from this, I should
repeat that I am satisfied with the response.

Thank you,

Ufuk Kayserilioglu

McGlashan, Scott wrote:

>The Voice Browser Working Group (VBWG) is now completing its resolution
>of issues raised during the review of the Candidate Recommendation
>version of VoiceXML 2.0 [1]. Our apologies that it has taken so long to
>respond.
>
>Following the process described in [2] for advancement to Proposed 
>Recommendation, this is the VBWG's formal response to the issues you 
>raised.
>
>Please indicate before 26 November 2003 whether you are satisfied with 
>the VBWG's resolutions, whether you think there has been a 
>misunderstanding, or whether you wish to register an objection.
>
>If you do not think you can respond before 26 November, please let me 
>know. The Director will appreciate a response whether you agree with 
>the resolutions or not. However, if we do not hear from you at all by 
>26 November 2003, we will assume that you accept our resolutions.
>
>Below you will find a summary of the VBWG's responses to each of your 
>issues. Please use the issue identifiers when responding.
>
>Thank you,
>
>Scott McGlashan
>Co-chair, Voice Browser Working Group
>
>[1] http://www.w3.org/TR/2003/CR-voicexml20-20030220/
>[2] http://www.w3.org/2003/06/Process-20030618/ 
>
>
>-----------------------------------------------
>Issues you raised and VBWG responses
>-----------------------------------------------
>
>Issues: CR15-1 
>http://lists.w3.org/Archives/Public/www-voice/2003AprJun/0030.html
>
>Issue CR15-1
>We are trying to implement the <record> tag in our Voice Browser in a
>conformant way; however, we cannot clearly understand what the
>requirements on a browser are for this tag. My points can be summed up
>as follows:
>
>I) The main confusion arises from the behaviour of bargein="true"
>prompts in <record>. According to Fig 7 in section 2.3.6 (lower left
>corner), bargein controls apply to audio queued within <record>. On the
>other hand, a few lines below, it is stated:
>
>"A /recording begins/ at the earliest after the playback of any prompts
>(including the 'beep' tone if defined). As an optimization, a platform
>may begin recording when the user starts speaking."
>
>Now, if recording does not begin DURING the prompt playback, then how
>can those prompts be barged in on? Or should we understand that if the
>user barges in with voice during prompt playback, THEN recording should
>be started? In our opinion, a clarification of how <record> interacts
>with barge-in on audio queued within <record> is badly needed.
>
>II) The second comment that baffles us in the spec is:
>
>"If no audio is collected during execution of <record>, then the record
>variable remains unfilled (note
><http://www.w3.org/TR/voicexml20/#unfilled_record>). This can occur,
>for example, when DTMF or speech input is received during prompt
>playback or the timeout interval (if the developer wants input during
>prompt playback to initiate recording, then prompts should be placed in
>an immediately preceding <field> with a zero timeout)." (Section 2.3.6)
>
>This comment is weird in two ways:
>
>  1) How can the record variable be unfilled "when DTMF or speech input
>is received during ... the timeout interval"? This seems to be the
>primary method of filling a record variable.
>
>  2) We cannot grasp, in any way, how it would be possible to achieve
>what the spec author has stated within the parentheses. If there is a
>preceding <field> with zero timeout, then:
>    i) if the user starts speaking while the prompts in the <field> are
>playing, then the input goes to the processing of the field and will be
>matched against whatever grammar is specified for it, or will throw a
>"nomatch";
>    ii) else, if the user waits for the prompts to finish, then a
>"noinput" event will be thrown.
>  In neither case will the input go into the <record> element that
>follows the <field>. If the spec is trying to say something else,
>then it should be clearly explained.
>
>
>CR15-1 Resolution: rejected with modifications
>
>I) Prompts can be barged in on if active DTMF grammars are defined
>(active speech grammars too, but the ability to combine recognition and
>recording may be removed from the specification due to a lack of
>implementation support). II.1) This case arises with DTMF input when
>recording is triggered by voice activity detection (i.e. as a platform
>optimization, instead of recording starting immediately after prompt
>playback, recording only begins when voice activity is detected).
>II.2) We agree this is confusing (it was intended to cover another use
>case), so we will remove the text in parentheses "(if the developer
>wants input during prompt playback to initiate recording, then prompts
>should be placed in an immediately preceding <field> with a zero
>timeout)".
>  
>

Received on Thursday, 20 November 2003 12:18:17 UTC