- From: Yang, Xu <Xu.Yang@Aspect.com>
- Date: Wed, 18 Mar 2009 11:54:57 -0400
- To: "www-voice@w3.org" <www-voice@w3.org>
- Message-ID: <F2DDFEEC5C982D469FD25338393482FB906F140304@ASP1CMS2.aspect.com>
Teemu, I have some comments; I hope they clarify things. Please see below. Thanks, Xu

I rediscovered this old thread [Evidently someone is confused as to what "barge in" means.] about bargein and bargeintype="hotword", which never received a proper answer from the Committee. Is there such an answer nowadays, when 3.0 is in progress and even 2.1 is out? What Dean expressed in his mail is exactly the way I see barge-in, and this is how we implemented it.

My interpretation of bargein is: bargein happens when the user actively gives input during bargeable media play and, by doing so, causes media play to terminate. My original thought was that "hotword" bargein differs from "speech" bargein only in its definition of what is treated as input that terminates media play, and that it has nothing to do with the outcome of the collection.

[Xu] Agreed. To be more specific, "speech" treats voice as input, while "hotword" treats a "recognized utterance/DTMF" as input. However, "hotword" has to decide whether to terminate the prompt based on the collection, so I wouldn't say it has nothing to do with the outcome of the collection.

With the current definition in the VoiceXML 2.0 Recommendation, section 4.1.5.1, "bargeintype" hotword specifies more about collection handling than about actual barge-in behavior. I think this should be clarified somehow.

[Xu] If there are ambiguities in the spec, I think we should correct them. My understanding of "barge-in behavior" is nothing more than "when to terminate the prompt": for "speech", it is when voice/DTMF is detected; for "hotword", it is when a recognized utterance/DTMF is collected. I believe we can all agree on this part based on the spec.

Reading the chapter and the defined consequences of "hotword" easily leads us to think that the outcome of a user entering incorrect input during the timeout period is twofold (and by my definition no barge-in even occurred): if bargeintype "hotword" is used, nothing happens and noinput is thrown; if bargeintype "speech" is used, nomatch is thrown.
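Xu's reading of "barge-in behavior" as nothing more than "when to terminate the prompt" fits in one decision function. This is a minimal sketch, not any platform's API: `InputEvent`, its `matched` flag and `prompt_terminates` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """Hypothetical model of one user input (utterance or DTMF) during bargeable prompt play."""
    matched: bool  # True if the input matched an active grammar


def prompt_terminates(bargeintype: str, event: InputEvent) -> bool:
    """Return True if this input should stop the playing prompt, per Xu's reading."""
    if bargeintype == "speech":
        # Any detected voice/DTMF stops the prompt, whether or not it matches.
        return True
    if bargeintype == "hotword":
        # Only a recognized (grammar-matching) utterance/DTMF stops the prompt.
        return event.matched
    raise ValueError(f"unknown bargeintype: {bargeintype!r}")


# Usage: a non-matching input stops a "speech" prompt but not a "hotword" one.
print(prompt_terminates("speech", InputEvent(matched=False)))   # True
print(prompt_terminates("hotword", InputEvent(matched=False)))  # False
```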
[Xu] This is the correct interpretation. In the "hotword" case, this assumes no further input after prompt play is done and none before the timeout expires. In the "speech" case, the prompt is terminated as soon as an utterance/DTMF is detected.

Still, it is defined for input that starting input during the timeout period causes the timeout to be cancelled and interdigittimeout or termtimeout to be used instead. The only exception here is an exact match with no termchar defined, which ends the collection immediately.

[Xu] Once the prompt is done and the timeout period has started, we are outside the scope of bargein, regardless of the bargein type. So this is not related to the hotword/speech bargein type.

It would make sense to me if bargeintype "hotword" only affected those collections that _start_ during prompt play (bargein) and _end_ while the prompt is still playing or before the timeout period has elapsed. For non-bargeable prompts, the bargeintype property would make no difference, since no barge-in can occur and input starts at the beginning of the timeout period at the earliest.

[Xu] I would say it does not cover the "timeout period has not yet elapsed" part: it is about bargein, and the timeout period is outside the scope of the prompt. And yes, in the non-bargeable case, the bargeintype would most likely be ignored by the platform.

I guess this was the original idea behind hotword: in VoiceXML it is quite easy to restart collection in case of <nomatch>, but bargeintype "hotword" is currently our only tool for preventing incorrect input from interrupting prompt play.

[Xu] I have seen our customers write applications this way. However, in my opinion, hotword must originally have been designed for platforms listening for a small number of command words. But since this functionality is so close to "keep playing the prompt if no matching utterance/DTMF is input" ("selective barge-in" in Nuance's terminology), it is a valid use case.
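Xu's two scoping points (bargeintype is ignored for non-bargeable prompts, and input that starts after the prompt has finished is outside barge-in scope) can be sketched as a small lookup. The function name, its parameters and the use of seconds from the start of prompt play are assumptions for illustration, not part of the VoiceXML spec.

```python
from typing import Optional


def effective_bargeintype(bargein: bool, bargeintype: str,
                          input_start: float, prompt_end: float) -> Optional[str]:
    """Return the bargeintype governing input that starts at `input_start`,
    or None when normal (non-bargein) collection rules apply.

    Hypothetical helper; times are seconds from the start of prompt play.
    """
    if not bargein:
        # Non-bargeable prompt: no barge-in can occur, so the type is ignored.
        return None
    if input_start >= prompt_end:
        # Prompt already finished: the timeout period is outside barge-in scope.
        return None
    return bargeintype


print(effective_bargeintype(True, "hotword", 2.0, 5.0))   # hotword
print(effective_bargeintype(True, "hotword", 6.0, 5.0))   # None
print(effective_bargeintype(False, "hotword", 2.0, 5.0))  # None
```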
DTMF input that does not match any grammar will cause the system to collect more digits until the interdigit timeout elapses and eventually throw nomatch. If bargeintype "hotword" is used, should the initial DTMF that caused the system to enter this state be discarded? Or should only the complete collection be discarded? This is not defined, but by analogy with voice input collection, discarding the complete collection sounds better to me.

[Xu] This is the point the 2.0 spec does not clearly specify, which may cause confusion: for hotword, whether non-matching input during prompt play should be discarded or not. However, if we consider hotword bargein to be valid only during prompt play, we naturally deduce that any input during prompt play is gone with the prompt, and a new recognition starts after play is done. (As you said, this is analogous for DTMF and voice; I made the same interpretation.) I wouldn't object to adding one line defining this in the new 3.0 spec.

Here are a few examples from the "hotword" barge-in case, where the user may enter any number of DTMF "1"s.

Here is the timing sequence for the case where the caller keeps entering DTMF-1 past the timeout period and then presses DTMF-2. By definition, we are no longer in the timeout period, so nomatch should be thrown!

NI = NOINPUT
NM = NOMATCH
--IDT-- = Interdigit timeout period

| PROMPT PLAY           | TIMEOUT        |
| Bargeintype="HW"      |                |
------------------------------------------
\--IDT-\--IDT-\--IDT-\--IDT-\--IDT-\--IDT-\--IDT--\
DTMF-1 DTMF-1 DTMF-1 DTMF-1 DTMF-1 DTMF-1 DTMF-2  NOMATCH

[Xu] Correct behavior according to the spec, assuming "DTMF-1 ... DTMF-2" does not match any active grammar.

Here is another sequence. The user starts entering an incorrect sequence during prompt play. Since the bargein input started during prompt play and the timeout had not elapsed when the first input completed, the collection resulted in noinput.
| PROMPT PLAY           | TIMEOUT        |
| Bargeintype="HW"      |                |
------------------------------------------
\--IDT-\--IDT--\--IDT--\                 \
DTMF-1 DTMF-1  DTMF-2  NM                NI

[Xu] I assume the "NM" in the line above is a typo; this correctly reflects the spec.

Here is yet another sequence. Since the input started during the timeout period, it should be treated as "non"-bargein, follow normal input collection, and result in nomatch.

| PROMPT PLAY           | TIMEOUT        |
| Bargeintype="HW"      |                |
------------------------------------------
                          \--IDT-\--IDT--\--IDT--\
                          DTMF-1 DTMF-1  DTMF-2  NM

The DTMF timing diagrams in the VoiceXML specification do not contain any of these hotword cases, nor do they contain any of the failing cases. Defining those would clear up a lot.

[Xu] I think developers can correctly deduce the above behavior from the existing 2.0 spec. Appendix D, Timing Properties, of the 2.0 spec is intended to explain the different timing definitions rather than to cover all possible use cases. This means that, without the diagrams, the properties were not well defined in the previous sections.

Was this the idea you had in mind? Or do you really mean that with "hotword" there is no such thing as nomatch (which would limit VoiceXML developers quite a lot, since it removes some vital information about user input), and that barging into a prompt does not actually mean giving input during prompt play?

[Xu] Yes, your interpretations above correctly reflect the spec. The 2.0 spec's emphasis on "no such thing as nomatch for hotword" only means that if the utterance/DTMF does not match the grammar during the valid duration of hotword bargein, that is, during prompt play, the platform should not throw nomatch. It does not cover the time after the prompt has played and the timeout has expired. The post-prompt-play behavior should follow the rest of the spec, which your use cases correctly demonstrate.
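The three DTMF sequences above can be replayed with a small simulator that encodes Xu's interpretation: a non-matching hotword collection is discarded as a whole only if it ends while the prompt is still playing; otherwise normal rules apply. This is a sketch under stated assumptions, not platform behavior: the function and grammar names are hypothetical, the timings (seconds) are made up to mirror the shapes of the diagrams, and termchar, termtimeout and immediate completion on an exact match are ignored.

```python
from typing import Callable, List, Tuple

Press = Tuple[float, str]  # (seconds from prompt start, DTMF digit)


def simulate_dtmf_collection(presses: List[Press], prompt_end: float, timeout: float,
                             interdigittimeout: float, bargeintype: str,
                             matches: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (outcome, digits) for one field, following Xu's interpretation.

    Hypothetical model: a collection ends only by interdigit timeout.
    """
    deadline = prompt_end + timeout  # noinput fires here if no collection is active
    i = 0
    while i < len(presses) and presses[i][0] < deadline:
        # A key press starts a collection and cancels the pending timeout.
        last, digits = presses[i]
        i += 1
        # Keep collecting while each press arrives within the interdigit timeout.
        while i < len(presses) and presses[i][0] - last <= interdigittimeout:
            last, digit = presses[i]
            digits += digit
            i += 1
        end = last + interdigittimeout  # the collection ends here
        if matches(digits):
            return "match", digits
        if bargeintype == "hotword" and end < prompt_end:
            # Non-matching input that ends during prompt play is discarded as a
            # whole collection, "gone with the prompt"; listening restarts.
            continue
        return "nomatch", digits
    return "noinput", ""


def all_ones(digits: str) -> bool:
    """Hypothetical grammar: any number of DTMF 1s."""
    return digits != "" and set(digits) == {"1"}


params = dict(prompt_end=4.5, timeout=2.0, interdigittimeout=1.5, matches=all_ones)

# 1s from prompt play into the timeout period, then a 2 after it.
seq1 = [(1.0, "1"), (2.0, "1"), (3.0, "1"), (4.0, "1"), (5.0, "1"), (6.0, "1"), (7.0, "2")]
print(simulate_dtmf_collection(seq1, bargeintype="hotword", **params))  # ('nomatch', '1111112')
# A bad sequence that ends during prompt play is discarded.
seq2 = [(0.5, "1"), (1.5, "1"), (2.5, "2")]
print(simulate_dtmf_collection(seq2, bargeintype="hotword", **params))  # ('noinput', '')
# The same sequence started in the timeout period.
seq3 = [(5.0, "1"), (6.0, "1"), (7.0, "2")]
print(simulate_dtmf_collection(seq3, bargeintype="hotword", **params))  # ('nomatch', '112')
```

Running the second sequence with bargeintype="speech" instead yields ('nomatch', '112'), which is the "two-ended" outcome discussed earlier in the thread.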
Just remember these when you define 3.0; currently the working draft is such a skeleton of open ideas that commenting on it is quite hard indeed. BR - Teemu
Received on Thursday, 19 March 2009 01:32:44 UTC