6. Environment and resources

Editor: Jim Ferrans

6.1 Resource fetching

6.1.1 Fetching

Fetching of content from a URI occurs in a VoiceXML interpreter context to: (1) fetch VoiceXML documents to interpret, or (2) fetch other document types, such as audio files, objects, grammars, and scripts. All occasions for fetching content in a VoiceXML interpreter context are governed by the following three attributes:

caching Either safe to force a query to fetch the most recent copy of the content, or fast to use the cached copy of the content if it has not expired. If not specified, a value derived from the innermost caching property is used.
fetchtimeout The interval to wait for the content to be returned before throwing an error.badfetch event. If not specified, a value derived from the innermost fetchtimeout property is used.
fetchhint Defines when the interpreter context should retrieve content from the server. prefetch indicates that a file may be downloaded when the page is loaded, whereas safe indicates that a file should be downloaded only when it is actually needed. For a very large file (implying long download times) or a streaming audio source, stream tells the interpreter context to begin processing the content as it arrives rather than waiting for full retrieval. If not specified, a value derived from the innermost relevant fetchhint property (audiofetchhint, documentfetchhint, grammarfetchhint, objectfetchhint, or scriptfetchhint) is used.

When content is fetched from a URI, the caching attribute determines where it is located (in the cache or not), the fetchtimeout attribute determines how long to wait for the content (starting from the time when the resource is needed), and fetchhint determines when the content is fetched. The caching policies for a VoiceXML interpreter context are explained in more detail in the next section.
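As a minimal sketch of how the three attributes might appear together, the document below plays a long audio file with streaming permitted and then transitions to a document that must be fetched fresh. The URLs and the timeout values are illustrative assumptions, not values required by this specification.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="intro">
    <block>
      <!-- Hypothetical audio URL: an unexpired cached copy may be used,
           and playback may begin before the whole file has arrived. -->
      <audio src="http://www.example.com/audio/long_intro.wav"
             caching="fast" fetchhint="stream" fetchtimeout="10s"/>
      <!-- Hypothetical document URL: always query the server, fetch only
           when the transition is taken, and wait at most 5 seconds
           before an error.badfetch event is thrown. -->
      <goto next="http://www.example.com/vxml/menu.vxml"
            caching="safe" fetchhint="safe" fetchtimeout="5s"/>
    </block>
  </form>
</vxml>
```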

The fetchhint attribute helps interpreter contexts that can improve their performance by exploiting information about when content can be fetched. An interpreter context is not required to change when it fetches documents for any setting other than safe. However, any interpreter context that is capable of operating in a prefetch or stream setting must also be able to operate under the safe setting.

The fetchhint attribute, in combination with the various fetchhint properties, is merely a hint to the interpreter context about when it may schedule the fetch of a document. Telling the interpreter context that it may prefetch a document does not require that the document be prefetched; it only suggests that the document may be prefetched. Likewise, telling the interpreter context that it may stream a document does not force it to do so. However, the interpreter context must always honor the safe fetchhint.

When the interpreter context does prefetch a document, it must ensure that the URI fetched is precisely the one needed.  In particular, if the URI is computed with an expr attribute, the interpreter context must not move the fetch up before any assignments to the expression's variables.  Likewise, the fetch for a <submit> must not be moved prior to any assignments of the namelist variables.  In practice, these two situations greatly constrain the ability to prefetch documents. 
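The constraint on computed URIs can be illustrated with a short sketch. The variable name next_doc and the URL are hypothetical; the point is only that the fetch depends on an assignment that has to run first.

```xml
<form id="route">
  <var name="next_doc"/>
  <block>
    <!-- The URI is not known until this assignment has executed. -->
    <assign name="next_doc"
            expr="'http://www.example.com/vxml/sales.vxml'"/>
    <!-- Even with fetchhint="prefetch", the fetch must not be
         scheduled ahead of the assignment above. -->
    <goto expr="next_doc" fetchhint="prefetch"/>
  </block>
</form>
```

In a case like this the prefetch hint gains little, since the earliest safe moment to fetch is almost the moment the document is needed.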

When transitioning from one dialog to another, through a <subdialog>, <goto>, <submit>, <link>, or <choice> element, there are additional rules that affect interpreter behavior. If the referenced URI names a document (e.g. "doc#dialog") or query data is provided (through POST or GET), then a new document is obtained (either from the local cache or from a server). When it is obtained, the document goes through its initialization phase (i.e., obtaining and initializing a new application root document if needed, initializing document variables, and executing document scripts). The requested dialog (or first dialog if none is specified) is then initialized and execution of the dialog begins. If the referenced URI names only a fragment (e.g. "#dialog") then no document is obtained, and no initialization of the document is performed. The requested dialog is processed as before.
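The difference between the two kinds of URI might look as follows in practice. This is a sketch only; the document URL and the dialog names are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="start">
    <field name="again" type="boolean">
      <prompt> Do you want to place another order? </prompt>
      <filled>
        <if cond="again">
          <!-- Names a document: it is obtained and initialized, then
               the dialog "new_order" in that document is executed. -->
          <goto next="http://www.example.com/vxml/orders.vxml#new_order"/>
        <else/>
          <!-- Names only a fragment: no fetch and no document
               initialization; execution moves to "goodbye" below. -->
          <goto next="#goodbye"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="goodbye">
    <block>
      <prompt> Goodbye. </prompt>
    </block>
  </form>
</vxml>
```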

Elements that fetch VoiceXML documents also support the following additional attribute:

fetchaudio The URI of the audio clip to play while the fetch is being done. If not specified, the fetchaudio property is used, and if that property is not set, no audio is played during the fetch.

The fetchaudio attribute is useful for enhancing a user experience when there may be noticeable delays while the next document is retrieved. This can be used to play background music, or a series of announcements. When the document is retrieved, the audio file is interrupted if it is still playing.
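A minimal sketch of fetchaudio on a transition that may be slow is shown below. Both URLs and the 15s timeout are hypothetical.

```xml
<form id="lookup">
  <field name="account" type="digits">
    <prompt> Please say your account number. </prompt>
    <filled>
      <!-- The clip plays while the server builds the next document
           and is interrupted as soon as that document is retrieved. -->
      <submit next="http://www.example.com/cgi-bin/balance"
              namelist="account"
              fetchaudio="http://www.example.com/audio/please_wait.wav"
              fetchtimeout="15s"/>
    </filled>
  </field>
</form>
```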

6.1.2 Caching

The VoiceXML interpreter context, just like HTML visual browsers, can use caching to improve performance in fetching documents and other resources; audio recordings (which can be quite large) are as common to VoiceXML documents as images are to HTML pages. In a visual browser it is common to include end user controls to update or refresh content that is perceived to be stale. This is not the case for the VoiceXML interpreter context, since it lacks equivalent end user controls. Thus enforcement of cache refresh is at the discretion of the applications program through appropriate use of the caching policies employed by VoiceXML interpreter contexts.

The default caching policy for VoiceXML interpreter contexts is one commonly employed in HTML browsers:

In VoiceXML this caching policy is known as fast. But because fast cache usage can lead to anomalous results, VoiceXML interpreter contexts also implement a safe caching policy:

The safe caching policy ensures that the VoiceXML interpreter context always has the most up to date version of a document, at the expense of performance (due to the extra access to the document server). The safe policy is similar to the effect of always reloading or refreshing a web page in an HTML visual browser.

VoiceXML allows the author to select which caching policy to use. The caching attribute of certain elements may be set to safe or fast to determine what default policy to use for that element. If the attribute is not specified, the policy is determined by a <property> element that specifies a value for the caching property (see Section 6.3).

For example:

<?xml version="1.0"?> 
<vxml version="1.0"> 
   <!-- Elements in this document will by default use caching="fast". --> 
   <property name="caching" value="fast"/> 
   … 
   <form id="test"> 
     <block> 
       <!-- Welcome rarely changes, so fast caching is fine. --> 
       <audio src="http://www.weather4U.example/vxml/welcome.wav"/> 
       <!-- Ads change all the time, so safe caching is needed. --> 
       <audio caching="safe" 
          src="http://www.onlineads.example/weather4U/ad17"/> 
     </block> 
      … 
   </form> 
   … 
</vxml> 

One common practice will be to use safe caching during development, when documents and resources change continually, and then use fast caching with selected resources fetched “safely” as the application goes into system test and then production.

It is also possible, though perhaps less likely, to have a production application that uses safe caching by default and fetches some resources using the fast caching policy.

6.2 Meta

The <meta> element specifies meta-data, as in HTML, which is data about the document rather than the document’s content. There are two types of <meta>. The first type specifies a meta-data property of the document as a whole. For example, to specify the maintainer of a VoiceXML document:

<?xml version="1.0"?> 
<vxml version="1.0"> 
  <meta name="maintainer" content="jpdoe@anycompany.example"/> 
   … 
</vxml> 

The interpreter could use this information, for example, to compose and email an error report to the maintainer.

VoiceXML does not specify required meta-data properties, but the following are recommended:

author Information describing the author.
copyright A copyright notice.
description A description of the document for search engines.
keywords Keywords describing the document.
maintainer The document maintainer’s email address.
robots Directives to search engine web robots.

The second type of <meta> specifies HTTP response headers. In the following example, the first <meta> element sets an expiration date that prevents caching of the document; the second <meta> element sets the Date header.

<?xml version="1.0"?> 
<vxml version="1.0"> 
  <meta http-equiv="Expires" content="0"/> 
  <meta http-equiv="Date" content="Sun, 12 Dec 1999 23:27:21 GMT"/> 
   … 
</vxml> 

Attributes of <meta> are:

name The name of the meta-data property.
content The value of the meta-data property.
http-equiv The name of an HTTP response header. Either name or http-equiv must be specified, not both.

6.3 Property

The <property> element sets a property value. Properties are used to set values that affect platform behavior, such as the recognition process, timeouts, caching policy, etc.

Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Properties apply to their parent element and all the descendants of the parent. A property at a lower level overrides a property at a higher level. Properties specified in the application root document provide default values for properties in every document in the application; properties specified in an individual document override property values specified in the application root document.
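The override order can be sketched with two documents. The file names app_root.vxml and city.gram and all timeout values are hypothetical. First, an application root document that sets application-wide defaults:

```xml
<?xml version="1.0"?>
<!-- Hypothetical application root document: app_root.vxml -->
<vxml version="1.0">
  <property name="timeout" value="4s"/>
  <property name="bargein" value="true"/>
</vxml>
```

Then a document in that application that overrides one of the defaults at the document level and again at the form item level:

```xml
<?xml version="1.0"?>
<vxml version="1.0" application="app_root.vxml">
  <!-- Overrides the root value of timeout for this document;
       bargein is still taken from the root document. -->
  <property name="timeout" value="6s"/>
  <form id="ask">
    <field name="city">
      <!-- Overrides timeout once more, for this field only. -->
      <property name="timeout" value="10s"/>
      <grammar src="city.gram" type="application/x-jsgf"/>
      <prompt> Which city? </prompt>
    </field>
  </form>
</vxml>
```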

In some cases, <property> elements specify default values for element attributes, such as timeout or bargein. For example, to turn off bargein for all the prompts in a particular form:

<form id="no_bargein_form"> 
  <property name="bargein" value="false"/> 
  <block> 
    <prompt>
      This introductory prompt cannot be barged into.
    </prompt> 
    <prompt>
      And neither can this prompt.
    </prompt> 
    <prompt bargein="true">
      But this one <emp>can</emp> be barged into.
    </prompt> 
  </block> 
   … 
</form> 

Properties are also used to specify platform-specific data and settings. For example, to set a platform-specific property to prepend one second of silence before each recording made by a particular document:

<?xml version="1.0"?> 
<vxml version="1.0"> 
  <property name="example.acme.endpointing.record_init_silence"
      value="1s"/> 
   … dialogs that make recordings go here … 
</vxml> 

The generic speech recognizer properties are taken from the Java Speech API (see http://www.javasoft.com/products/java-media/speech/index.html):

confidencelevel The speech recognition confidence level, a float value in the range of 0.0 to 1.0. Results are rejected (a nomatch event is thrown) when the engine’s confidence in its interpretation is below this threshold. A value of 0.0 means minimum confidence is needed for a recognition, and a value of 1.0 requires maximum confidence. The default value is 0.5.
sensitivity Set the sensitivity level. A value of 1.0 means that it is highly sensitive to quiet input. A value of 0.0 means it is least sensitive to noise. The default value is 0.5.
speedvsaccuracy A hint specifying the desired balance between speed and accuracy. A value of 0.0 means fastest recognition. A value of 1.0 means best accuracy. The default value is 0.5.
completetimeout

The speech timeout value to use when an active grammar is matched. The default is platform-dependent. See Appendix F.

The length of silence required following user speech before the speech recognizer finalizes a result (either accepting it or throwing a nomatch event). The complete timeout is used when the speech is a complete match of an active grammar.  By contrast, the incomplete timeout is used when the speech is an incomplete match to an active grammar.

A long complete timeout value delays the result completion and therefore makes the computer's response slow. A short complete timeout may lead to an utterance being broken up inappropriately. Reasonable complete timeout values are typically in the range of 0.3 seconds to 1.0 seconds.  The default is platform-dependent. See Appendix F.

incompletetimeout

The speech timeout to use when no active grammar has been matched. The default is platform-dependent. See Appendix F.

The required length of silence following user speech after which a recognizer finalizes a result.  The incomplete timeout applies when the speech prior to the silence is an incomplete match of all active grammars.  In this case, once the timeout is triggered, the partial result is rejected (with a nomatch event).

The incomplete timeout also applies when the speech prior to the silence is a complete match of an active grammar, but where it is possible to speak further and still match the grammar.  By contrast, the complete timeout is used when the speech is a complete match to an active grammar and no further words can be spoken.

A long incomplete timeout value delays the result completion and therefore makes the computer's response slow. A short incomplete timeout may lead to an utterance being broken up inappropriately.

The incomplete timeout is usually longer than the complete timeout to allow users to pause mid-utterance (for example, to breathe). See Appendix F.
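As a sketch of the guidance above, the form below sets a complete timeout inside the typical range and a longer incomplete timeout. The specific values are illustrative assumptions, not recommendations.

```xml
<form id="get_account">
  <!-- Finalize quickly once a complete match has been heard. -->
  <property name="completetimeout" value="500ms"/>
  <!-- Allow a longer pause while the utterance is still incomplete. -->
  <property name="incompletetimeout" value="1500ms"/>
  <field name="account" type="digits">
    <prompt> Please say your account number. </prompt>
  </field>
</form>
```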

Several generic properties pertain to DTMF grammar recognition:

interdigittimeout The inter-digit timeout value to use when recognizing DTMF input. The default is platform-dependent. See Appendix F.
termtimeout The terminating timeout to use when recognizing DTMF input. The default value is "0s". See Appendix F.
termchar The terminating DTMF character for DTMF input recognition. The default value is "#". See Appendix F.
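The three DTMF properties might be combined as in the sketch below. The chosen values, including the use of "*" as terminating character, are assumptions made for illustration.

```xml
<form id="pin_entry">
  <field name="pin" type="digits">
    <!-- Wait up to 3 seconds between key presses. -->
    <property name="interdigittimeout" value="3s"/>
    <!-- End input with the star key instead of the default "#". -->
    <property name="termchar" value="*"/>
    <property name="termtimeout" value="0s"/>
    <prompt> Enter your PIN, followed by the star key. </prompt>
  </field>
</form>
```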

These properties apply to the fundamental platform prompt and collect cycle:

bargein The bargein attribute to use for prompts. Setting this to true allows barge-in by default. Setting it to false disallows barge-in. The default value is "true".
timeout The time after which a noinput event is thrown by the platform. The default value is platform-dependent. See Appendix F.

These properties pertain to the fetching of new documents and resources:

caching Either safe to never trust the cache when fetching, or fast to always trust the cache. The default value is fast.
audiofetchhint This tells the platform whether or not it can attempt to optimize dialog interpretation by pre-fetching audio. The value is either safe to say that audio is only fetched when it is needed, never before; prefetch to permit, but not require the platform to pre-fetch the audio; or stream to allow it to stream the audio fetches. The default value is prefetch.
documentfetchhint Tells the platform whether or not documents may be pre-fetched. The value is either safe (the default), or prefetch.
grammarfetchhint Tells the platform whether or not grammars may be pre-fetched. The value is either prefetch (the default), or safe.
objectfetchhint Tells the platform whether the URI contents for <object> may be pre-fetched or not. The values are prefetch (the default), or safe.
scriptfetchhint Tells whether scripts may be pre-fetched or not. The values are prefetch (the default), or safe.
fetchaudio The URI of the audio to play while waiting for a document to be fetched. The default is not to play any audio. There are no fetchaudio properties for audio, grammars, objects, and scripts.
fetchtimeout The timeout for fetches. The default value is platform-dependent.
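A minimal sketch of document-wide fetch defaults, with one element overriding the caching policy, is given below. All URLs and the 8s timeout are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <property name="caching" value="fast"/>
  <property name="documentfetchhint" value="safe"/>
  <property name="audiofetchhint" value="prefetch"/>
  <property name="fetchtimeout" value="8s"/>
  <property name="fetchaudio"
      value="http://www.example.com/audio/hold_music.wav"/>
  <form id="main">
    <block>
      <!-- Uses the defaults above: fast caching, may be prefetched. -->
      <audio src="http://www.example.com/audio/welcome.wav"/>
      <!-- The attribute overrides the caching property for this fetch
           only; the fetchaudio property supplies the waiting audio. -->
      <goto next="http://www.example.com/vxml/news.vxml" caching="safe"/>
    </block>
  </form>
</vxml>
```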

This property determines which input modality to use:

inputmodes The input modes to enable: dtmf and voice. On platforms that support both modes, inputmodes defaults to “dtmf voice”. To disable speech recognition, set inputmodes to “dtmf”. To disable DTMF, set it to “voice”. One use for this would be to turn off speech recognition in noisy environments. Another would be to conserve speech recognition resources by turning them off where the input is always expected to be DTMF.
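For instance, a form that collects a PIN could restrict input to DTMF, as in this sketch (the form and field names are hypothetical):

```xml
<form id="secure_pin">
  <!-- Speech recognition is disabled for this form only. -->
  <property name="inputmodes" value="dtmf"/>
  <field name="pin" type="digits">
    <prompt> Please key in your PIN. </prompt>
  </field>
</form>
```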

This property determines which platform default universal command grammars to use:

universals

Production-grade applications often need to define their own universal command grammars, e.g., to increase application portability or to provide a distinctive interface.   They specify new universal command grammars with <link> elements.  They turn off the default grammars with this property.

The value "all" is the default, and means that all platform default universal command grammars are enabled.  The value "none" turns them all off.  Individual grammars are enabled by listing their names separated by spaces.  For instance "cancel exit help" is equivalent to "all".   
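The sketch below keeps only the platform default help grammar and supplies an application-defined replacement for cancel. It assumes that <link> accepts an event attribute and an inline JSGF grammar as defined elsewhere in this specification; the phrases in the grammar are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Enable only the platform default help grammar. -->
  <property name="universals" value="help"/>
  <!-- Application-defined command that throws the cancel event. -->
  <link event="cancel">
    <grammar type="application/x-jsgf"> stop | never mind </grammar>
  </link>
  <form id="main">
    <field name="proceed" type="boolean">
      <prompt> Shall I continue? </prompt>
    </field>
  </form>
</vxml>
```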

Our last example shows several of these properties used at multiple levels.

<?xml version="1.0"?> 
<vxml version="1.0"> 
  <!-- set default characteristics for page --> 
  <property name="caching" value="safe"/> 
  <property name="audiofetchhint" value="safe"/> 
  <property name="confidencelevel" value="0.75"/> 

  <form> 
    <!-- override defaults for this form only --> 
    <property name="confidencelevel" value="0.5"/> 
    <property name="bargein" value="false"/> 
    <grammar src="address_book.gram" type="application/x-jsgf"/> 
    <block> 
      <prompt> Welcome to the Voice Address Book </prompt> 
    </block> 
    <initial name="start"> 
      <!-- override default timeout value --> 
      <property name="timeout" value="5s"/> 
      <prompt> Who would you like to call? </prompt> 
    </initial> 
    <field name="person"> 
      <prompt>
        Say the name of the person you would like to call.
      </prompt> 
    </field> 
    <field name="location"> 
      <prompt>
        Say the location of the person you would like to call.
      </prompt> 
    </field> 
    <field name="confirm" type="boolean"> 
      <!-- Use actual utterances to playback recognized words, 
              rather than returned slot values --> 
      <prompt> 
        You said to call <value expr="person$.utterance"/> 
        at <value expr="location$.utterance"/>. 
        Is this correct? 
      </prompt> 
      <filled> 
        <if cond="confirm"> 
          <submit 
            next="http://www.messagecentral.example/voice/make_call" 
            namelist="person location" /> 
        </if> 
        <clear/> 
      </filled> 
    </field> 
  </form> 
</vxml> 

6.4 Param

The <param> element is used to specify values that are passed to subdialogs or objects. It is modeled on the HTML <PARAM> element. Its attributes are:

name The name to be associated with this parameter when the object or subdialog is invoked.
expr An expression that computes the value associated with name.
value Associates a literal string value with name.
valuetype One of data or ref, by default data; used to indicate to an object if the value associated with name is data or a URI (ref). This is not used for <subdialog>.
type The MIME type of the result provided by a URI if the valuetype is ref; only relevant for uses of <param> in <object>.

Exactly one of expr or value must be present. The use of valuetype and type is optional in general, although they may be required by specific objects. When <param> is contained in a <subdialog> element, the values specified by it are used to initialize dialog <var> elements in the subdialog that is invoked. When <param> is contained in an <object>, the use of the parameter data is specific to the object that is being invoked, and is outside the scope of the VoiceXML specification.

Below is an example of <param> used as part of an <object>. In this case, the first two <param> elements have expressions (implicitly of valuetype="data"), the third <param> has an explicit value, and the fourth is a URI that returns a MIME type of text/plain. The meaning of this data is specific to the object.

<object name="debit" 
 classid="method://credit_card/gather_and_debit" 
 data="http://www.recordings.example/prompts/credit/jesse.jar"> 
  <param name="amount" expr="document.amt"/> 
  <param name="vendor" expr="vendor_num"/> 
  <param name="application_id" value="ADC5678-QWOO"/> 
  <param name="authentication_server"
   value="http://auth_svr.example" 
   valuetype="ref"
   type="text/plain"/> 
</object> 

The next example illustrates <param> used with <subdialog>. In this case, two expressions are used to initialize variables in the scope of the subdialog form.

Form with calling dialog
<form> 
  <subdialog name="result" src="http://another.example/#getssn"> 
    <param name="firstname" expr="document.first"/> 
    <param name="lastname" expr="document.last"/> 
    <filled> 
      <submit namelist="result.ssn" 
        next="http://myservice.example/cgi-bin/process"/> 
    </filled> 
  </subdialog> 
</form> 

Subdialog in http://another.example
<form id="getssn"> 
  <var name="firstname"/> 
  <var name="lastname"/> 
  <field name="ssn"> 
    <grammar src="http://grammarlib/ssn.gram"
     type="application/x-jsgf"/> 
    <prompt>
      Please say social security number.
    </prompt> 
    <filled> 
      <if cond="validssn(firstname,lastname,ssn)"> 
        <assign name="status" expr="true"/> 
        <return namelist="status ssn"/> 
      <else/> 
        <assign name="status" expr="false"/> 
        <return namelist="status"/> 
      </if> 
    </filled> 
  </field> 
</form> 

Using <param> in a <subdialog> is a convenient way of passing data to a subdialog without requiring the use of server side scripting.

6.5 Time Designations

Time designations follow those used in W3C's Cascading Style Sheets recommendation (http://www.w3.org/TR/REC-CSS2/syndata.html#q20). They consist of a non-negative number, which may have a fractional part and a leading plus sign, followed by an optional time unit identifier. The time unit identifiers are:

ms milliseconds
s seconds

Examples include: "3s", "850ms", and "+1.5s". Negative time designations are not permitted.
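As a sketch, the three example designations might be used as property values like this. The pairing of each value with a particular property is an assumption made for illustration.

```xml
<form id="timing">
  <!-- Whole seconds. -->
  <property name="timeout" value="3s"/>
  <!-- Milliseconds. -->
  <property name="interdigittimeout" value="850ms"/>
  <!-- Fractional seconds with an explicit plus sign. -->
  <property name="fetchtimeout" value="+1.5s"/>
  <field name="code" type="digits">
    <prompt> Please enter your code. </prompt>
  </field>
</form>
```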