Re: [v3] Some v3 functionality suggestions and scenarios from Shane Smith on 2006-08-02 (www-voice@w3.org from July to September 2006)

From: Shane Smith <safarishane@gmail.com>
Date: Wed, 2 Aug 2006 18:24:39 -0500
To: "Skip Cave" <Skip.Cave@intervoice.com>
Cc: www-voice@w3.org
Message-ID: <8fc15e140608021624i51368963o45dadf987c016475@mail.gmail.com>
Skip,

>> 1- Grammars that stop the dialog thread, and return a semantic tag to
>> affect the dialog flow
done and done

>> 2- Grammars that do NOT affect the dialog flow at all, but produce
>> asynchronous events to be handled by CCXML/scXML
Using marktime, this could be accomplished by setting marktime upon an
utterance, performing actions on the client side, and then jumping back into
your prompt using your marktime as a reference.  With bargeintype set to
hotword, I imagine this would be seamless to the caller.
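
A minimal sketch of the server-side arithmetic this implies, in Python.
It assumes the browser reports the name of the last mark reached plus the
milliseconds elapsed since that mark, and that the server knows where each
mark sits in the prompt audio.  The function name, the mark table, and the
rewind parameter are all hypothetical and only illustrate the idea.

```python
def resume_offset_ms(mark_offsets, markname, marktime_ms, rewind_ms=0, total_ms=None):
    """Return where to restart the prompt, in ms from the start of the audio.

    mark_offsets: hypothetical table mapping mark name -> offset in ms.
    markname / marktime_ms: values assumed to be reported by the browser.
    rewind_ms: optional back-off so the caller hears a little context again.
    """
    # An unknown or missing mark is treated as the start of the audio.
    base = mark_offsets.get(markname, 0)
    position = base + marktime_ms - rewind_ms
    # Clamp to the valid range of the audio.
    position = max(0, position)
    if total_ms is not None:
        position = min(position, total_ms)
    return position


# Example: caller spoke 4.2 s after the "body" mark; back up half a second.
marks = {"intro": 0, "body": 15000}
offset = resume_offset_ms(marks, "body", 4200, rewind_ms=500)
```

The next fetch would then return a prompt that starts at that offset.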

>> 3- Grammars that don't return semantic tags, but instead affect local
>> parameters such as playback speed, loudness, audio file position, etc.
Same, using marktime, though my guess would be a round trip to the server.
I can really see using marktime becoming ugly if we were to request audio
volume changes and needed to handle that on the server for the upcoming http
fetch of the audio file.  Possible, but ugly.
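
To make the three grammar types concrete, here is a rough Python sketch of
how a browser might dispatch a match depending on the kind of grammar that
fired.  Nothing here comes from any VXML spec; the class names, the event
string, and the 0-10 volume scale are made up for illustration.

```python
from enum import Enum


class GrammarKind(Enum):
    DIALOG = 1  # type 1: stops the dialog thread, returns a semantic tag
    EVENT = 2   # type 2: leaves the dialog alone, raises an asynchronous event
    LOCAL = 3   # type 3: only adjusts a local playback parameter


class Playback:
    """Hypothetical state of a prompt that is currently playing."""

    def __init__(self):
        self.playing = True
        self.volume = 5    # assumed scale of 0..10
        self.events = []   # events queued for the flow layer (CCXML/scXML)
        self.result = None # semantic tag handed back to the dialog


def handle_match(playback, kind, value):
    if kind is GrammarKind.DIALOG:
        # Classic barge-in: stop playback and return the semantic tag.
        playback.playing = False
        playback.result = value
    elif kind is GrammarKind.EVENT:
        # Queue an event for the flow layer; playback is not interrupted.
        playback.events.append(value)
    elif kind is GrammarKind.LOCAL:
        # Adjust volume in place, clamped to the assumed scale.
        playback.volume = max(0, min(10, playback.volume + value))
    return playback


p = Playback()
handle_match(p, GrammarKind.EVENT, "call.joe")  # "Call Joe" while listening
handle_match(p, GrammarKind.LOCAL, 2)           # "louder"
```

Only a type 1 match ends the prompt; the other two leave it running.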

If these changes are implemented in v3, from an IVR perspective I would still
want to potentially provide an audio cue that the grammar was accepted and
action taken.  Conversely, we would also potentially need an earcon to let
the caller know they nomatched on their last spoken utterance.  Both of
these audio cues would need to be played on top of the current audio stream
playback, assuming these work similarly to the bargeintype=hotword support
today.  Does v3 support combining audio streams?  Would we be able to do
this without stopping the stream playback as you suggest?  Otherwise, I'd
end up using marktime to implement client side browser functionality on the
server to work around those limitations v3 is supposed to address.

> As far as I can tell, there is no way for CCXML to gracefully stop a running
> VXML script without killing the browser, let alone suspend it, with the
> resume state context saved automatically. And of course, there is no current
> way for CCXML to tell a VXML browser to resume a certain state after it has
> been suspended.
I see your point.  It could be argued that this functionality belongs in the
application scope, simply causing the next fetch to spit out vxml that would
make it seem as if we picked up right where we left off.  That leaves out
client side events though, with ccxml trying to tell vxml it's time to
pause.
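
For reference, the suspend/resume behaviour Skip describes amounts to a
context stack.  A minimal Python sketch follows, assuming the only state
worth saving is the script name and a playback position; the class and
method names are hypothetical and do not correspond to any CCXML or VXML
API.

```python
class DialogContext:
    """Hypothetical saved state for one VXML script."""

    def __init__(self, script, position_ms=0):
        self.script = script
        self.position_ms = position_ms


class ContextStack:
    def __init__(self):
        self._suspended = []
        self.active = None

    def start(self, script):
        self.active = DialogContext(script)
        return self.active

    def suspend_and_start(self, position_ms, new_script):
        # Save where the active script was, push it, and start the new one.
        if self.active is None:
            raise RuntimeError("no active dialog to suspend")
        self.active.position_ms = position_ms
        self._suspended.append(self.active)
        self.active = DialogContext(new_script)
        return self.active

    def resume(self):
        # Drop the interrupting script and pop the most recent suspended one.
        if not self._suspended:
            raise RuntimeError("no suspended dialog to resume")
        self.active = self._suspended.pop()
        return self.active


stack = ContextStack()
stack.start("voicemail.vxml")
stack.suspend_and_start(42000, "stock_sale.vxml")  # async event arrives
resumed = stack.resume()                           # back to the voicemail
```

After resume(), the voicemail script is active again with its saved position.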

Cool, good info, thanks...
-Shane Smith


On 8/2/06, Skip Cave <Skip.Cave@intervoice.com> wrote:
>
> Shane,
>
> My comments are interspersed with yours.
>
> ________________________________
>
> From: Shane Smith
> Sent: Wednesday, August 02, 2006 3:22 PM
> To: Skip Cave
> Subject: Re: [v3] Some v3 functionality suggestions and scenarios
>
> Hello Skip,
>
>  Interesting read... had a couple of clarifications if you don't mind.  Are
> there any scenarios you envision that couldn't be handled with CCXML?
>
>
> [SC] As far as I can tell, NONE of my scenarios could be implemented in
> CCXML, though it is more a problem with VXML than CCXML. Take the scenario
> where the VXML script is playing a long voicemail message and an external
> asynchronous event occurs (presumably detected by CCXML or scXML). The
> external event could be a task completion, an inbound call, a stock-sell
> threshold being reached, whatever. There currently isn't any way for CCXML
> to suspend the active VXML script, save the VXML script context, and
> pause the voicemail play, to make way for the user to handle the new event.
> We need a way for CCXML to suspend and resume VXML scripts without losing
> context.
>
> Assuming the active VXML script could be suspended, the application then
> needs to let the user deal with the issue: acknowledge the task completion,
> handle the call, interact with a different VXML script to deal with the
> stock sale, etc. This means that the CCXML/scXML process may need to start
> up a second VXML script to let the user deal with the asynchronous
> concurrent task, leaving the original script suspended on the context stack.
> After dealing with the issue, we want to have CCXML tell the VXML browser to
> pop the context stack and resume where it left off, playing the long
> voicemail message in the voicemail VXML script.
>
> As far as I can tell, there is no way for CCXML to gracefully stop a running
> VXML script without killing the browser, let alone suspend it, with the
> resume state context saved automatically. And of course, there is no current
> way for CCXML to tell a VXML browser to resume a certain state after it has
> been suspended.
>
> Another limitation of current VXML is that a user cannot spawn events or
> commands during a play or recognize dialog state without killing the
> ongoing dialog. For example, as before, a user is listening to his long
> voicemail message. In the middle of the message from Joe, the user
> decides he wants to call Joe (or send Joe an email, etc.). The user says
> "Call Joe" or "Email Joe to call me", or some other command, and continues
> listening to Joe's message. The system should take the command "Call Joe",
> spawn a concurrent process to call Joe or send him an email, but keep on
> playing Joe's voicemail message without stopping. This scheme is
> impossible in VXML today. Again, it's not CCXML's problem, it's VXML's
> problem.
>
> A similar issue is when the user is listening to the long voicemail from
> Joe, and he says commands like "back up 10 seconds", "skip to the last 20
> seconds", "louder", "play faster", or "slow down". All of these
> commands should affect the playback of the voicemail message, but not stop
> the playback. Currently, VXML doesn't do this. As a general rule there need
> to be three different types of grammars in VXML:
>
> 1- Grammars that stop the dialog thread, and return a semantic tag to affect
> the dialog flow
>
> 2- Grammars that do NOT affect the dialog flow at all, but produce
> asynchronous events to be handled by CCXML/scXML
>
> 3- Grammars that don't return semantic tags, but instead affect local
> parameters such as playback speed, loudness, audio file position, etc.
>
> Now all of this is really a limitation on VXML, not CCXML, which is why my
> message title was prefaced [v3] and not [CCXML].
>
> Looking at the VXML 3.0 spec at
> http://www.w3.org/Voice/Group/2005/V3/, it is clear that it
> is planning to have more asynchronous capabilities than VXML 2.1.
>
>
>
> Under section 1.2.2.3 of the VXML 3 spec it says:
>
> More advanced interaction with the presentation is possible in the (VXML 3)
> DFP framework than is currently permitted with VoiceXML 2.0/2.1.
> Consequently, VoiceXML 3.0 may be enhanced with capabilities such as:
>
> - VoiceXML dialogs are cancellable
> - VoiceXML dialogs can receive events from the flow layer during execution.
>   These events are exposed in the presentation markup.
> - VoiceXML dialogs can send events to the flow layer during execution. These
>   events are specified in the presentation markup.
>
> This is a good start, but the suspend scenario I described is not covered in
> the statement of new capabilities. One thing missing from this is the
> capability to save the dialog state and return to it later. There needs to
> be a "suspend/resume" command besides the standard "start dialog" command
> from CCXML.  Hopefully this functionality will get added as the spec
> matures.
>
> Second, 3.0 needs the capability to accept user commands (touch tone, voice,
> pen, whatever) during a play or recognize state, without stopping the play
> or recognize state. These asynchronous commands should be able to send
> events to CCXML/scXML without affecting the dialog thread. Or, the command
> could affect how the current media is being handled, or have other local
> effects such as "record the remainder of this call" or "mute Joe on this
> conference".
>
>
> As an ivr designer, I've used vxml primarily to drive the call, using it as
> simply as possible, just like any other protocol.  I never really 'write'
> vxml apps, I write web apps that shoot out vxml instead of html.  First
> cardinal sin on any application under my direction is the introduction of
> client side logic.  Though I've been working this way for years, I've seen a
> tendency at several client sites to try and write a client side application,
> instead of handling all logic on the server side.  Time to implement, debug,
> maintain, and test are all shorter when using existing web application test
> suites.  (current project uses canoo, ugly but works)  Every bit of logic
> can be functionally tested separate from the vui (kinda like mvc) and only
> when everything works do we pick up the phone for a real test call.  I would
> hate to see the vxml spec evolve to where it required more logic on the
> client vxml browser than is necessary.  All the logic gates available today
> in vxml are generally shunned.
>
>
> [SC] I agree totally with you. Server side is the way to go. However with
> VXML, today's CCXML server currently doesn't have enough control over the
> script execution. VXML 3.0 should try and fix these issues. It's not a
> problem with CCXML.
>
>
>  If you're familiar with osd/osdm or apache rdc's, what we do is similar but
> with all event handling done by java, and nothing but a simple javascript
> function to encode all data to be passed back to the server in a single
> variable.  Is vxml3 still going to be accommodating to develop in this
> fashion?
>
>
>
>
> [SC] Something like what you suggest is feasible, but keep in mind that
> asynchronous events will be happening on both the server side and on the
> client side, at any time. Both entities (server & client) must be
> able to handle these events. Whatever mechanism is finally used must
> deal with this fact efficiently.
>
> Regards,
>  Shane Smith
>
Received on Wednesday, 2 August 2006 23:24:51 UTC