RE: VoiceXML 2.0 comments from Scott McGlashan on 2002-02-16 (www-voice@w3.org from January to March 2002)

From: Scott McGlashan <scott.mcglashan@pipebeach.com>
Date: Sun, 17 Feb 2002 00:34:11 +0100
To: "George Clelland" <george_clelland@uk.ibm.com>
Cc: <www-voice@w3.org>
Message-ID: <2764A29BE430E64A92EB56561587D2E701C14E@se01ms02.i.pipebeach.com>
Hi George,

Sorry for the delay in getting you this official response.

Below is our response to your questions (indicated by >>>).

If you are not satisfied with the reply, or want more information, let
me know.

Scott

(Chairman, W3C VB Dialog Committee)


-----Original Message-----
From: George Clelland [mailto:george_clelland@uk.ibm.com]
Sent: 23 November 2001 18:16
To: www-voice@w3.org
Subject: VoiceXML 2.0 comments


I have some personal comments on the VoiceXML 2.0 Working Draft.

     An additional form of transfer would be useful. A standard
     supervised transfer would be useful, where the interpreter
     disconnects from the telephone call after a successful connection
     has been established with the third party. This would be
     beneficial for call centre applications.

>>> We have received a number of requests for additional forms of
transfer. In VoiceXML 2.0, our primary goal has been to clarify the
VoiceXML 1.0 language, rather than add new features in this version
(especially since there is a separate activity in the working group
looking at more advanced Call Control functionality). For this reason,
we defer this feature request for consideration in a later version of
the language. 



     The text associated with the code example given in section 3.1.5
     has an error. The text states the result will be "AMEX" if the
     caller enters DTMF 1; the text should say DTMF 3.

>>> This will be corrected in the next draft. 

     Many of the speech markup elements are too complex for application
     programmers, and should only be used by specialists. Given that
     the philosophy of VoiceXML is to make speech applications easier
     to develop and minimise the specialist speech knowledge required,
     elements such as <prosody> and <phoneme> seem to contradict this
     goal.

>>> Application developers do not need to use SSML. However, as they
become more experienced, and applications become more demanding, the
advanced features of SSML may become useful. 


     The first code example in section 4.1.3  has an error. The <say-as>
     element has an old parameter, namely class rather than type.

>>> This will be corrected in the next draft. 


     The discussion on prompt queuing and input collection in section
     4.1.8 clarifies the operation of interpreters. However, I believe
     the current operation is flawed in that it forces the use of
     fetchaudio in order to have a prompt at the end of one document
     played before the next document is fetched. A typical call flow
     will have a prompt telling the caller to wait while some data is
     retrieved, but with the current operation, this prompt is not
     played to the caller until after the second document is fetched,
     unless fetchaudio is used. This is not an intuitive operation, as
     the addition of the optional fetchaudio changes the way the
     application works and leads to unexpected results. I would propose
     that all queued prompts are played as soon as a document is to be
     fetched, followed by the fetchaudio if specified; this is a much
     more obvious form of operation for application developers.

>>> We have seen many applications where it is highly desirable to queue
prompts and then play them when the next document is loaded. To obtain
the behavior you want, there are a number of options: 1) use a
fetchaudio to read the message out, e.g. "Now fetching your data";
2) put the message in a queued prompt and add a 'dummy' fetchaudio (e.g.
500 msecs of silence) to force the queue to flush prior to fetching; or
3) put the message in a preceding field with a zero 'timeout': this
forces the queue to flush prior to the document fetch.
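
As an illustration only, options 1 and 2 might be sketched roughly as
follows. The file names (fetching.wav, silence500ms.wav, results.vxml)
and the form ids are hypothetical, and the markup assumes the
'fetchaudio' attribute on <goto> as described in the VoiceXML 2.0
drafts; option 3 is not shown.

```xml
<?xml version="1.0"?>
<vxml version="2.0">

  <!-- Option 1: the fetchaudio recording itself carries the message.
       fetching.wav is a hypothetical recording of
       "Now fetching your data". -->
  <form id="option1">
    <block>
      <goto next="results.vxml" fetchaudio="fetching.wav"/>
    </block>
  </form>

  <!-- Option 2: the message is an ordinary queued prompt; a short
       silent clip is named as fetchaudio so that the prompt queue
       is flushed before the fetch of results.vxml begins.
       silence500ms.wav is a hypothetical 500 msec silent recording. -->
  <form id="option2">
    <block>
      <prompt>Now fetching your data.</prompt>
      <goto next="results.vxml" fetchaudio="silence500ms.wav"/>
    </block>
  </form>

</vxml>
```

In both forms the caller should hear the message before the next
document arrives; the difference is only whether the wording lives in
a recording or in a prompt.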


     I would like to propose an additional fetch property (section
     6.3.5) .... fetchaudiorepeatdelay ==> defined as the delay between
     successive plays of the fetchaudio.

>>> We have considered this feature for VoiceXML 2.0 but have decided to
defer it for future versions of the language where we will provide a
more powerful model for generating output (and collecting input) during
document transition.



George Clelland
EMEA Voice Systems
DirectTalk & Message Center pre-sales Technical Support
IBM UK Laboratories
Hursley Park, Mail Point 104
Winchester
Hants, UK  SO21 2JN
email: george_clelland@uk.ibm.com
Tel: +44 (0)1962 816657    Fax: +44 (0)1962 816800
Received on Saturday, 16 February 2002 18:31:57 UTC