RE: Question regarding DTMF buffering during fetchaudio

Fetchaudio processing can get very complex.  The second text block, if I remember correctly, is trying to express that there is no active listening going on (i.e., we are not listening for a "cancel" that would stop the fetch - a feature request for some).  DTMF should be buffered, and that buffer should only be thrown out by doing a recognition (which consumes the buffer) or by playing a non-bargeable prompt (which empties the buffered DTMF).  Playing fetchaudio should not empty the buffer.  That said, the specification of DTMF buffering is at times vague, and since it is a SHOULD and not a MUST, platforms would likely be free to not buffer at other times (or even not buffer at all) if they wished - e.g., impose a limit on how far back in time you accept DTMF input [makes sense to me], decide whether a speech recognition discards the DTMF buffer [I'd say it should], and/or decide whether fetchaudio also clears the DTMF buffer [I'd say it shouldn't] - but on all of these your mileage may vary.
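To make the scenario concrete, here is a minimal sketch (document names, audio file, and field name are all hypothetical) of the case under discussion: digits pressed while the fetchaudio plays during the <goto> fetch should, per the SHOULD above, still be in the buffer when the next dialog's field does its recognition:

```xml
<!-- page1.vxml: the transition whose fetch plays fetchaudio -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <block>
      <!-- DTMF pressed while hold.wav plays should be buffered, not discarded -->
      <goto next="page2.vxml" fetchaudio="hold.wav"/>
    </block>
  </form>
</vxml>

<!-- page2.vxml: buffered digits should be offered to this field's grammar -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="menu">
    <field name="choice">
      <grammar mode="dtmf" src="builtin:dtmf/digits"/>
      <!-- a buffered digit can match immediately, barging in on
           (or skipping) this bargeable prompt -->
      <prompt>Enter a digit.</prompt>
    </field>
  </form>
</vxml>
```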

There are other peculiarities with fetchaudio too, like when does the fetchaudio stop playing?  The most natural-sounding behavior for a user [and the only sensible one IMO] is to stop only when the next audio is ready to be played - be it the normal listen state, the next fetchaudio, prompts queued before the next fetchaudio, or terminal audio [audio as the call is ending]: essentially the four different ways you could next have audio that should be not just queued but played.  The spec says to stop playing the fetchaudio once the resource is "retrieved", even if you'd then have "dead air" for several seconds while the parsing and/or processing and execution of the transition continues.  And if you then had a second slow fetch you could specify a second fetchaudio, but then you might have a choppy fetchaudio experience (it would be cleaner to have the first slow fetch play an audio that continues until the next listen state is ready, with the second slow fetch having no fetchaudio).  Of course the specification isn't precise about what constitutes "retrieving" a document, especially when you can further change things with caching of documents and potential prefetching of resources.
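For example (hypothetical document and audio names), the cleaner arrangement described above would put fetchaudio only on the first slow transition and deliberately omit it on the second, so one continuous audio can cover both fetches rather than being cut off and restarted:

```xml
<!-- first slow fetch: fetchaudio ideally plays until the next
     listen state is actually ready -->
<goto next="slow-page-1.vxml" fetchaudio="please_wait.wav"/>

<!-- inside slow-page-1.vxml: the second slow fetch deliberately
     carries no fetchaudio, avoiding a choppy stop/restart -->
<goto next="slow-page-2.vxml"/>
```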

And to Ian's point, the first text definitely includes loading documents and transitioning between dialogs (which are all done in the "transitioning" state).  That is the same delineation of state that governs post-processing, which certainly can load documents and transition between dialogs (but which ends when the interpreter would enter a waiting-for-input state).  Also, with VXML 2.1, even inside a dialog transitioning between input states, one might well have a fetchaudio on a <data> element (for example in executable content such as an event handler).
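A sketch of that VXML 2.1 case (the lookup URI, audio file, and variable names are hypothetical; here the <data> fetch sits in a <filled> block, though any executable content such as a catch handler would do):

```xml
<!-- VoiceXML 2.1: fetchaudio on a <data> fetch between input states -->
<field name="account">
  <grammar mode="dtmf" src="builtin:dtmf/digits"/>
  <prompt>Enter your account number.</prompt>
  <filled>
    <!-- a slow server lookup during the transition, covered by fetchaudio -->
    <data name="lookup" src="http://example.com/lookup"
          namelist="account" fetchaudio="checking.wav"/>
  </filled>
</field>
```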


From: Ian Sutherland [mailto:ian.sutherland@oracle.com]
Sent: Monday, November 18, 2013 10:04 AM
To: Debasmita Bal
Cc: www-voice@w3.org
Subject: Re: Question regarding DTMF buffering during fetchaudio


One person's opinion, but it looks like the first highlighted passage refers to transitions between input items, which (at least usually, in my experience) don't involve fetching a resource and wouldn't have fetchaudio.
On 18-Nov-13 04:57, Debasmita Bal wrote:
We have a question regarding whether DTMF should be buffered during a transition with a fetchaudio.

There are two sections in the VXML specs, one which indicates that DTMF should be buffered while the other indicates that the DTMF should be rejected.


The following suggests that DTMF should be buffered

A VoiceXML interpreter is at all times in one of two states:
*         waiting for input in an input item (such as <field>, <record>, or <transfer>), or
*         transitioning between input items in response to an input (including spoken utterances, dtmf key presses, and input-related events such as a noinput or nomatch event) received while in the waiting state. While in the transitioning state no speech input is collected, accepted or interpreted. Consequently root and document level speech grammars (such as defined in <link>s) may not be active at all times. However, DTMF input (including timing information) should be collected and buffered in the transition state. Similarly, asynchronously generated events not related directly to execution of the transition should also be buffered until the waiting state (e.g. connection.disconnect.hangup)
The following suggests that DTMF should NOT be buffered

*         when the interpreter begins fetching a resource (such as a document) for which fetchaudio was specified. In this case the prompts queued before the fetchaudio are played to completion, and then, if the resource actually needs to be fetched (i.e. it is not unexpired in the cache), the fetchaudio is played until the fetch completes. The interpreter remains in the transitioning state and no input is accepted during the fetch.
Can you please clarify?

Thanks,
Debasmita

Received on Monday, 18 November 2013 19:50:51 UTC