Enhanced Model for CCXML State Management from Moshe Yudkowsky on 2006-06-22 (www-voice@w3.org from April to June 2006)

From: Moshe Yudkowsky <speech@pobox.com>
Date: Thu, 22 Jun 2006 13:20:49 -0500
To: www-voice@w3.org
Message-Id: <6.1.2.0.2.20060622130714.02ad3d20@dns.bl.com>

My name is Moshe Yudkowsky; among other speech-related work, I develop 
VoiceXML and CCXML applications. I have a proposal to make that will 
improve how CCXML manages state variables. In particular, I will propose a 
radical -- yet incremental -- change to the conceptual model of CCXML state 
management.

Here's a very simple use case: a user calls into an application. The 
application then does two things simultaneously:

(a) the application answers the phone to say "Welcome to the Acme Call 
System."
(b) the application uses ANI/caller ID to check the identity of the caller 
and see if the caller is authorized for use of the system.

This use case isn't far-fetched or unusual; many applications require 
authorization, customization, etc. based on Caller ID. Similar use cases of 
multitasking in CCXML are easy to find.  (If you'd like an example of this 
particular use case, see the open-source "Voice Conference Manager" at 
http://vcm.sourceforge.net.)

The interesting part of this use case is that the application must manage 
two different activities. The first activity is to answer the phone and 
speak the dialog to the user. The second activity is to send an HTTP 
request to check for authorization. Once these two separate activities 
complete, the rest of the application logic will proceed.

The easiest and most foolproof way to handle each of these two activities 
is through subprocesses. I prefer to send the ANI/Caller ID to a separate 
CCXML process which interacts with the database; this CCXML script sends 
HTTP requests, waits for the result, handles error conditions, timeouts, 
re-requests, etc. and sends the ultimate result back to the main CCXML 
script. At the same time, the announcement to the user is handled via a 
VoiceXML script, and when that script completes the result (event) is 
returned to the main CCXML script.

Here's the problem: CCXML does not provide any facility to manage these two 
simultaneous subprocesses. CCXML is an event-driven state machine, and it 
only tracks one state at a time. However, the application has two states to 
track: the state of the database lookup (did it return? what result did it 
return?) and the state of the VoiceXML dialog (did it return?). And, of 
course, there is the original state, namely the state of the overall main 
script.

I solve this problem by using a state ("WAIT_FOR_ANSWERS") that has its own 
private state variables ("DB_LOOKUP_STATE," "ANNOUNCE_STATE"). Instead of 
using CCXML's facilities to manage my state variables, I have to build and 
manage my own state information within the context of CCXML: When the 
VoiceXML dialog finishes, I have to check to see if the authorization 
lookup has finished; when the authorization lookup finishes, I have to 
check if the VoiceXML dialog has finished. Please note that if there's a 
significant gap between the end of dialog and the end of authorization, I 
might even have to process a timeout and insert a "one moment please" dialog 
for the user, a further complication.
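
To make the bookkeeping concrete, here is a minimal sketch of that 
hand-rolled approach, written in Python rather than CCXML purely for 
illustration. The class name, the handler names, and the follow-on states 
"MAIN_MENU" and "REJECT" are hypothetical; only "WAIT_FOR_ANSWERS," 
"DB_LOOKUP_STATE," and "ANNOUNCE_STATE" come from the description above.

```python
class WaitForAnswers:
    """Private bookkeeping for the WAIT_FOR_ANSWERS state (illustrative only)."""

    def __init__(self):
        self.db_lookup_state = "PENDING"  # becomes "AUTHORIZED" or "DENIED"
        self.announce_state = "PENDING"   # becomes "DONE"
        self.next_state = None            # set once both activities finish

    def on_dialog_exit(self):
        # The VoiceXML announcement has finished.
        self.announce_state = "DONE"
        self._check_both()

    def on_lookup_result(self, authorized):
        # The authorization subprocess has reported back.
        self.db_lookup_state = "AUTHORIZED" if authorized else "DENIED"
        self._check_both()

    def _check_both(self):
        # Every handler has to remember to run this check by hand.
        if self.announce_state != "DONE" or self.db_lookup_state == "PENDING":
            return
        if self.db_lookup_state == "AUTHORIZED":
            self.next_state = "MAIN_MENU"  # hypothetical next state
        else:
            self.next_state = "REJECT"     # hypothetical next state


w = WaitForAnswers()
w.on_lookup_result(True)
print(w.next_state)  # still waiting for the dialog
w.on_dialog_exit()
print(w.next_state)
```

Even in this small form, each new parallel activity adds another variable 
and another handler that must call the shared check, and the timeout case 
is not yet covered.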

In my experience, private state management is both error prone and 
difficult. Most of all, it's annoying -- after all, CCXML provides 
perfectly good state management!

In my opinion, CCXML-based applications are not really state machines: They 
are actually Petri nets. Unlike the assumptions of the current CCXML model, 
it's not always a single event that drives the state transition; several 
events may need to occur to transition to the next state (such as this use 
case, where the application has to wait for the end of dialog and the 
result of authorization).

Here's a possible incremental change to the CCXML state management system. 
CCXML would continue to manage just one "main" state variable but would 
allow more complex event conditions, as seen in Petri nets. One way CCXML 
can incorporate Petri-net capabilities is by adding logic operators to the 
transition element's "event" attribute. For example, "a + b" would mean 
"pend until both event A and event B have been received." We'd also need a 
way to express the idea of "pend until A and B both arrive, but as long as 
no other event is received for this specific state." And certainly there 
are other formulations ("A or B", "A and B but not C").
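
As a rough illustration of what an interpreter might do for "a + b," here 
is a sketch in Python. It is not CCXML and not a proposed syntax; the 
class JoinTransition, its "strict" flag, and the event name "auth.result" 
are assumptions made up for this example. The strict variant models "as 
long as no other event is received for this specific state."

```python
class JoinTransition:
    """Pends until every required event has arrived (illustrative sketch)."""

    def __init__(self, required, strict=False):
        self.required = frozenset(required)
        self.strict = strict  # if True, an unrelated event aborts the join
        self.seen = set()

    def offer(self, event):
        """Return "fire", "pend", or "abort" for one incoming event."""
        if event in self.required:
            self.seen.add(event)
            if self.seen == self.required:
                self.seen = set()  # reset so the join can be reused
                return "fire"
            return "pend"
        if self.strict:
            self.seen = set()      # forget partial progress
            return "abort"
        return "pend"              # unrelated event is ignored here


join = JoinTransition(["dialog.exit", "auth.result"])
print(join.offer("auth.result"))  # pend
print(join.offer("dialog.exit"))  # fire
```

The order of arrival does not matter, which is the point: the script 
author states the condition once and the interpreter tracks the partial 
progress.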

An alternative non-Petri net solution would be to designate multiple 
variables as state variables -- more than one state variable could exist in 
any given script. Transitions would be extended to include combinations of 
state variables ("state variable X is A and state variable Y is B while 
state variable Z is not C"). I don't know if that's workable or desirable, 
or what mathematical model would represent such a system.
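
For what it's worth, the guard itself is simple to state. The sketch 
below, in Python for illustration only, checks a combination of state 
variables; the function name, the guard format, and the "is"/"is_not" 
operators are hypothetical and not part of CCXML.

```python
def guard_matches(guard, state_vars):
    """Return True if every condition in the guard holds.

    guard maps a state-variable name to an (operator, value) pair,
    where operator is "is" or "is_not".
    """
    for name, (op, value) in guard.items():
        actual = state_vars.get(name)
        if op == "is":
            if actual != value:
                return False
        elif op == "is_not":
            if actual == value:
                return False
        else:
            raise ValueError("unknown operator: " + op)
    return True


# "state variable X is A and state variable Y is B while
#  state variable Z is not C"
guard = {"X": ("is", "A"), "Y": ("is", "B"), "Z": ("is_not", "C")}
print(guard_matches(guard, {"X": "A", "Y": "B", "Z": "D"}))
```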

I don't pretend to be an expert on Petri nets, but I've found the basic 
Petri net concept useful before in speech work. I am not insisting on a 
full-blown implementation of a theoretical Petri net; I just want the parts 
that are relevant for real-world CCXML applications. And I should probably 
point out that colored Petri nets may provide the best overall solution.

Finally, a few words on SCXML. I haven't used SCXML in an application, of 
course, but as far as I can tell it does offer advantages over CCXML, in 
particular for managing some forms of parallel processes. I don't know if 
SCXML is intended to handle the "A + B" logical combination of events the 
way Petri nets would.

Some issues left over from the state-machine model of CCXML haven't 
been addressed by SCXML. For example, the SCXML "cancel" element (6.2.2), 
due to the asynchronous nature of events, can easily generate 
"error.notallowed," but because events and state changes are asynchronous 
there's no simple way to know in advance when -- in what state -- the 
script will receive this error.notallowed. As a result the script will 
receive error.notallowed (probably in some catch-all transition) and cannot 
determine whether to discard this particular "error.notallowed" as 
irrelevant or instead to panic and exit. If the cancel event could be 
"colored" (or perhaps "vectored" or "tagged" is the right description) and 
sent only to the necessary states or automatically discarded by some states 
-- and all that handled by the interpreter instead of custom-coded into the 
script -- scripts would be far easier to write. With easier error handling, 
scripts would become aware of errors, and scripts would therefore probably 
become more reliable as well. In other words, even with SCXML, the notion 
of colors from Petri nets may provide a conceptual framework for a solution.
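
One way to picture the "colored" (or "tagged") event idea is the sketch 
below, in Python for illustration only. The ColoredEvent type, the 
deliver function, and the state names "CANCELLING_TIMER" and "IN_CALL" 
are all hypothetical; no CCXML or SCXML interpreter is claimed to work 
this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColoredEvent:
    name: str
    color: frozenset  # the states for which this event is relevant


def deliver(event, current_state, discarded):
    """Deliver the event only if the current state matches its color.

    Returns True when the script should see the event; otherwise the
    interpreter records the event as discarded and returns False.
    """
    if current_state in event.color:
        return True
    discarded.append(event.name)
    return False


discarded = []
err = ColoredEvent("error.notallowed", frozenset({"CANCELLING_TIMER"}))
print(deliver(err, "CANCELLING_TIMER", discarded))  # script handles it
print(deliver(err, "IN_CALL", discarded))           # silently dropped
```

With something like this in the interpreter, a catch-all transition 
would no longer have to guess whether a late error.notallowed matters.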

-- 
  Moshe Yudkowsky
  Disaggregate
  2952 W Fargo
  Chicago, IL 60645 USA

  Work: www.Disaggregate.com
  Book: www.PebbleAndAvalanche.com

  speech@pobox.com
  +1 773 764 8727
Received on Friday, 23 June 2006 04:49:20 UTC