Overview of the Voice Browser Working Group's DFP Framework

  DFP (Data-Flow-Presentation) Framework

Date: 20050710


    1. DFP Overview

The DFP (Data-Flow-Presentation) framework is an emerging architecture 
developed by the Voice Browser Activity <http://www.w3.org/Voice/>. The 
framework explains how Voice Browser specifications can be used together 
to create modular voice applications.

The framework is composed of three layers:

Data
    A component in the data layer manages data for the application. Data
    is stored in a canonical format. The data representation language
    has not yet been decided, but most likely a hierarchical XML
    representation will be used to hold different kinds of data: for
    example, environment data, domain-specific data used for
    canonicalization, EMMA for holding interpretations, and an
    interaction history to track user inputs. 
Flow
    A component in the flow layer controls the application flow. It does
    so by interacting with the data and presentation layers.
    A flow component does not directly interact with the user. Rather,
    it requests user interaction by invoking a component on the
    presentation layer; the invocation may include data derived from the
    data component. When information is returned from a presentation
    component, the flow component can then update the data
    representation in a data layer component. The flow component is
    responsible for marshalling this data into the canonical format used
    by the data component.
    Application flows may be structured in terms of state machines or
    other appropriate techniques such as rules, scripts, etc. Voice
    Browser languages to describe application flow include CCXML
    <http://www.w3.org/TR/ccxml/> and SCXML <http://www.w3.org/TR/scxml/>. 
Presentation
    Components on the presentation layer interact with the user; for
    example, by playing media files and synthesized speech, and by
    accepting speech and DTMF input from the user.
    A flow component invokes a presentation component with data from the
    data layer. A presentation may have a local data representation
    which persists for the duration of the active presentation. During
    the presentation, the presentation, flow and data components may
    exchange further information. The flow component may also cancel an
    active presentation. Once a presentation is complete, the
    presentation component indicates this to the flow component; this
    indication may include data collected from, or derived from, user
    input and interaction.
    Voice browser languages for user presentation include VoiceXML 2.0
    <http://www.w3.org/TR/voicexml20/>, VoiceXML 2.1
    <http://www.w3.org/TR/voicexml21/> and VoiceXML 3.0 (in preparation
    - see under VoiceXML on Work Under Development in the Voice Browser
    Activity <http://www.w3.org/Voice/> ). 
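
The division of responsibilities between the three layers can be
illustrated with a minimal Python sketch. It is not part of the
framework: the names DataStore, Flow and ask_city are hypothetical, a
nested dictionary stands in for the undecided XML data representation,
and lower-casing stands in for whatever canonicalization an application
would actually need.

```python
from typing import Any, Callable, Dict


class DataStore:
    """Data layer: holds application data in one canonical form."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def update(self, values: Dict[str, Any]) -> None:
        self._data.update(values)


# Presentation layer: takes input data, interacts with the user,
# and returns whatever it collected.
Presentation = Callable[[Dict[str, Any]], Dict[str, Any]]


def ask_city(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for a VoiceXML dialog; a real presentation
    # would prompt the user and recognize speech or DTMF input.
    return {"city": params.get("default_city", "Boston")}


class Flow:
    """Flow layer: never talks to the user, only to the other layers."""

    def __init__(self, data: DataStore) -> None:
        self.data = data

    def run(self, presentation: Presentation) -> None:
        # Invoke the presentation with data derived from the data layer.
        params = {"default_city": self.data.get("home_city")}
        result = presentation(params)
        # Marshal the returned values into the canonical format
        # (assumed here to be trimmed, lower-case strings).
        canonical = {k: str(v).strip().lower() for k, v in result.items()}
        self.data.update(canonical)


store = DataStore()
store.update({"home_city": " Paris "})
Flow(store).run(ask_city)
print(store.get("city"))
```

Note that only the Flow class touches both other layers; the
presentation function never sees the DataStore directly.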


    2. Relationship to other Approaches

The DFP framework is an instance of the Model-View-Controller (MVC) 
design pattern: the data layer instantiates MVC's model, the flow layer 
instantiates the controller, and the presentation layer instantiates the 
view.

The DFP framework is also intended as a voice-centric instance of the 
Multimodal architecture <http://www.w3.org/TR/mmi-arch/> developed by 
the Multimodal Interaction Activity <http://www.w3.org/2002/mmi/> (MMI). 
The data layers are identical, MMI's runtime framework corresponds to 
DFP's flow layer, and MMI's modality components correspond to DFP's 
presentation components. Ongoing collaboration between the activities 
will further 
refine and clarify the alignment between these approaches.


    3. Interface between Flow and Presentation Layers

The interface between flow and presentation components is defined in 
terms of invocation requests and their responses, as well as 
asynchronous notifications. In each case, these can be modelled as 
events, where the event has an event name and a data payload. The 
payload is modelled in terms of property name-value pairs, where the 
name is a string, and the value can be an atomic type (e.g. string, 
integer or boolean) or a complex type (e.g. a nested properties 
structure). The precise format of the data payload is not yet decided.
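
As an illustration only, the event model described above might be
sketched in Python as follows. Since the payload format is undecided,
the Event class, the lookup helper, the dotted-path convention and the
property names (src, params, max_retries, bargein) are all assumptions
made for this sketch, as is the script name collect_card.vxml.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

# A value is atomic (string, integer, boolean) or a nested
# properties structure, modelled here as a dictionary.
Value = Union[str, int, bool, Dict[str, Any]]


@dataclass
class Event:
    name: str
    payload: Dict[str, Value] = field(default_factory=dict)


def lookup(event: Event, path: str) -> Value:
    """Resolve a dotted property path inside the event payload."""
    node: Any = event.payload
    for part in path.split("."):
        node = node[part]
    return node


start = Event(
    name="start",
    payload={
        "src": "collect_card.vxml",  # hypothetical script URI
        "params": {"max_retries": 3, "bargein": True},
    },
)
print(lookup(start, "params.max_retries"))
```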

A flow component can invoke a presentation by sending a 'start' event. 
The event needs to include sufficient information to start the 
presentation; for example, it may include a URI referencing a VoiceXML 
script and may also include information which is passed to this script 
upon initiation.

Once the presentation is started, a flow component may cancel the 
presentation by issuing a 'stop' event. Otherwise, the presentation runs 
until completion and a 'stopped' event is returned from the presentation 
to the flow component. The stopped event may include data collected 
from the user during the presentation. Prior to the presentation being 
stopped, the flow and presentation components may send each other 
'update' events.
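
The lifecycle of these four events can be sketched as a small Python
class. This is one possible reading, not a definition of the interface:
the class and method names are hypothetical, ask_pin.vxml is a made-up
script name, and the sketch assumes that a cancelled presentation also
reports 'stopped' (with a 'cancelled' flag), which the text above does
not specify.

```python
from typing import Any, Callable, Dict, List, Tuple

Notify = Callable[[str, Dict[str, Any]], None]


class PresentationComponent:
    def __init__(self, notify_flow: Notify) -> None:
        self._notify_flow = notify_flow
        self.running = False
        self.context: Dict[str, Any] = {}

    def receive(self, name: str, payload: Dict[str, Any]) -> None:
        """Handle an event sent by the flow component."""
        if name == "start" and not self.running:
            self.running = True
            self.context = dict(payload)
        elif name == "update" and self.running:
            self.context.update(payload)
        elif name == "stop" and self.running:
            # Assumption: cancellation is also reported as 'stopped'.
            self._finish({}, cancelled=True)
        # Events that do not fit the current state are ignored.

    def send_update(self, payload: Dict[str, Any]) -> None:
        """Notify the flow component while the presentation runs."""
        if self.running:
            self._notify_flow("update", dict(payload))

    def complete(self, results: Dict[str, Any]) -> None:
        """Called when the dialog finishes on its own."""
        if self.running:
            self._finish(results, cancelled=False)

    def _finish(self, results: Dict[str, Any], cancelled: bool) -> None:
        self.running = False
        self._notify_flow(
            "stopped", {"cancelled": cancelled, "data": dict(results)}
        )


log: List[Tuple[str, Dict[str, Any]]] = []
dialog = PresentationComponent(lambda name, payload: log.append((name, payload)))
dialog.receive("start", {"src": "ask_pin.vxml"})
dialog.receive("update", {"language": "en-GB"})
dialog.send_update({"progress": "prompt_played"})
dialog.complete({"pin": "1234"})
dialog.receive("stop", {})  # ignored: the presentation already stopped
```

After this run the flow side has seen one 'update' and one 'stopped'
event, and the late 'stop' has no effect.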

An example of this interface is where a CCXML component is a flow 
component and VoiceXML 2.1 is a presentation component. At some stage in 
the application flow, the CCXML script starts a VoiceXML presentation by 
executing a <dialogstart> element with a src attribute indicating the 
script to run. Once the presentation has completed, a dialog.exit event 
is returned to the CCXML component.

More advanced interaction with the presentation is possible in the DFP 
framework than is currently permitted with VoiceXML 2.0/2.1. 
Consequently, VoiceXML 3.0 may be enhanced with capabilities such as:

    * VoiceXML dialogs are cancellable.
    * VoiceXML dialogs can receive events from the flow layer during
      execution. These events are exposed in the presentation markup.
    * VoiceXML dialogs can send events to the flow layer during
      execution. These events are specified in the presentation markup.


    4. Benefits

With the DFP framework, developers are able to structure their 
application in a modular manner, where data, flow and presentation are 
expressed in components at the appropriate layer.

An application's flow can be expressed in terms of states in a flow 
component: for a given state, a presentation component is invoked and 
the results returned from the presentation component trigger state 
transitions in the flow component. This enables a clear separation of 
flow from presentation within the application, and facilitates 
development of reusable presentation components (such as parameterized 
VoiceXML <form>s for credit-card collection, scrollable lists, etc.) 
which can be invoked from a variety of application flow components.
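
A state-driven flow of this kind can be sketched in a few lines of
Python. The sketch assumes that each presentation reduces its result to
a single outcome string; the state names, outcome names and the
scripted helper (which replays canned outcomes in place of real user
interaction) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Presentation = Callable[[], str]


def scripted(*outcomes: str) -> Presentation:
    """Build a fake presentation that replays fixed outcomes in order."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


def run_flow(
    presentations: Dict[str, Presentation],
    transitions: Dict[Tuple[str, str], str],
    initial: str,
    final: str,
) -> List[str]:
    """Invoke one presentation per state and follow its outcome."""
    state = initial
    visited = [state]
    while state != final:
        outcome = presentations[state]()
        state = transitions[(state, outcome)]
        visited.append(state)
    return visited


presentations = {
    "get_card": scripted("ok", "ok"),
    "confirm": scripted("no", "yes"),
}
transitions = {
    ("get_card", "ok"): "confirm",
    ("get_card", "nomatch"): "get_card",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "get_card",
}
visited = run_flow(presentations, transitions, "get_card", "done")
print(visited)
```

The transition table is the only place where application flow is
expressed; the presentations themselves know nothing about which state
comes next, which is what makes them reusable across flows.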

Application developers can also take advantage of flow components which 
support parallel invocation of presentations. For example, an SCXML flow 
component may start three presentation components executing at the same 
time; one presentation component presents background music, another 
continuously listens for an attention word, and the third component 
presents the application whose name is spoken by the user after speaking 
the attention word.
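
The three-presentation scenario can be approximated with Python's
asyncio, with coroutines standing in for presentation components. This
is a loose sketch under several assumptions: user speech is simulated
as a list of word pairs, "computer" is a made-up attention word,
"launching" an application just records its name, and all function
names are hypothetical.

```python
import asyncio
from typing import List, Optional, Tuple


async def background_music(stop: asyncio.Event, log: List[str]) -> None:
    log.append("music:start")
    await stop.wait()  # keeps "playing" until the flow stops it
    log.append("music:stop")


async def attention_listener(
    heard: "asyncio.Queue[Optional[str]]",
    utterances: List[Tuple[str, str]],
) -> None:
    for first, second in utterances:
        await asyncio.sleep(0)  # yield so other presentations can run
        if first == "computer":  # hypothetical attention word
            await heard.put(second)
    await heard.put(None)  # signal the end of the simulated input


async def application_launcher(
    heard: "asyncio.Queue[Optional[str]]", launched: List[str]
) -> None:
    while True:
        name = await heard.get()
        if name is None:
            break
        launched.append(name)


async def main() -> Tuple[List[str], List[str]]:
    log: List[str] = []
    launched: List[str] = []
    stop = asyncio.Event()
    heard: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    utterances = [("computer", "weather"), ("hello", "there"), ("computer", "news")]
    # The flow starts all three presentations so that they run concurrently.
    music = asyncio.create_task(background_music(stop, log))
    await asyncio.gather(
        attention_listener(heard, utterances),
        application_launcher(heard, launched),
    )
    stop.set()  # the flow cancels the music once input is exhausted
    await music
    return log, launched


log, launched = asyncio.run(main())
print(log, launched)
```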

Finally, the framework promotes, but does not mandate, various 
application practices. The strong implication is that markup on each 
layer should only express what is appropriate at that layer. For 
example, presentation layer components should not express 'flow' 
concepts such as 'goto'. So instead of writing a single large VoiceXML 
presentation which uses <goto> to navigate between application states 
expressed as <form>s, the application could be written as a flow 
component and a set of 'micro-dialog' presentation components. For 
example, a CCXML/SCXML flow component could have a set of states 
corresponding to application states, together with a set of (reusable) 
VoiceXML presentations, each composed of a single VoiceXML <form> that 
interacts with the user and returns results to the flow component's 
states. This modular approach facilitates application development, 
maintenance, debugging and reuse.

Received on Thursday, 21 July 2005 16:14:23 UTC