- From: Scott McGlashan <scott.mcglashan@pipebeach.com>
- Date: Wed, 25 Sep 2002 16:14:17 +0200
- To: <Stefan.Hamerich@temic-sp.com>
- Cc: <www-voice@w3.org>
The Voice Browser Working Group (VBWG) has almost finished resolving the issues raised during the last call review of the 24 April 2002 draft of VoiceXML 2.0 [1]. Our apologies that it has taken so long to respond.

This is the VBWG's formal response to the issues you raised, which have been logged in the Working Group's issues list [4]. The VBWG's resolutions have been incorporated into the 13 September 2002 draft of VoiceXML 2.0 [5].

Please indicate before 3 October 2002 whether you are satisfied with the VBWG's resolutions, whether you think there has been a misunderstanding, or whether you wish to register an objection. If you do not think you can respond before 3 October, please let me know. The Director will appreciate a response whether you agree with the resolutions or not.

Below you will find:

1) more information about the process we are following;
2) a summary of the VBWG's responses to each of your issues.

Thank you,

Scott

-----------------------------------------------
1) Process requirement to address last call issues
-----------------------------------------------

Per section 5.2.3 [2] of the 19 July 2001 Process Document, in order for VoiceXML 2.0 to advance to the next state (Candidate Recommendation), the Working Group must "formally address all issues raised during the Last Call review period (possibly modifying the technical report)."

Section 4.1.2 of the Process Document [3] sets expectations about what constitutes a formal response: "In the context of this document, a Working Group has formally addressed an issue when the Chair can show (archived) evidence of having sent a response to the party who raised the issue. This response should include the Working Group's resolution and should ask the party who raised the issue to reply with an indication of whether the resolution reverses the initial objection."

If you feel that the response is based on a misunderstanding of the original issue, you are encouraged to restate and clarify the issue until there is agreement about the issue, so that the Working Group may prepare its substantive response. If the response shows understanding of the original issue but does not satisfy the reviewer, you may register a formal objection with the Working Group that will be carried forward with the relevant deliverables.

[1] http://www.w3.org/TR/2002/WD-voicexml20-20020424/
[2] http://www.w3.org/Consortium/Process-20010719/tr.html#RecsCR
[3] http://www.w3.org/Consortium/Process-20010719/groups.html#WGVotes
[4] http://www.w3.org/Voice/Group/2002/voiceXML-change-requests.htm (members only)
[5] http://www.w3.org/Voice/Group/2002/WD-voicexml20-20020913.htm (members only)
    (http://www.w3.org/Voice/Group/2002/WD-voicexml20-20020913.zip) (members only)

-----------------------------------------------
2) Issues you raised and responses
-----------------------------------------------

In http://lists.w3.org/Archives/Public/www-voice/2002AprJun/0065.html you raised the following issues, which were registered as dialog change request R470. Our response is given inline after each issue.

After extended testing and experiments with VoiceXML 2.0 we still see some problems in the usage of VoiceXML.

1) <subdialog> in mixed-initiative dialogues: as written in the last draft, and in the DTD as well, <subdialog> is only allowed as a child element of <form>. Why can't <subdialog> also be allowed as a child element of <field> or <filled>? This would be useful for obtaining, for example, confirmation of given values, and would allow different values to be processed separately.
At the moment additional work is needed to provide this capability with the mechanisms currently available. We would appreciate it if you would at least consider widening the set of allowed parent elements of <subdialog>.

VBWG Response: Rejected

<subdialog> involves collecting user input, and according to the Form Interpretation Algorithm (FIA) that is not part of executable content (such as <filled>). As you point out, workarounds are already available in VoiceXML for confirmation and for processing different values separately (an illustrative sketch is appended below). However, this issue may be addressed in the next version of VoiceXML, where one tentative requirement is that the FIA be more flexible and extensible.

2) recognize from file: the <record> element allows the recording of spoken utterances, and with <audio> the resulting files can be played back to the user. But we miss a way to take an audio file instead of live spoken input and run recognition on that file for further processing. This would be especially interesting for off-line processing of dialogues.

VBWG Response: Rejected

The use case is not fundamental to VoiceXML 2.0, which focuses on real-time interaction with a user. There is a workaround in which user input is recorded and then analysed by an external ASR web service (an illustrative sketch is appended below). This is really a batch use case (also applicable to speaker verification, multiple ASR passes, messaging, etc.) which may be considered for the next version of VoiceXML.

3) filled fields in mixed-initiative dialogues: in mixed-initiative dialogues, which use one grammar for several fields, values which have already been set correctly can simply be overwritten by new utterances from the user. Sometimes this behaviour is desirable, but there are situations where we would like to deactivate fields which have already been filled correctly. Is there any work being done in this area? At the moment we solve this by adding an extra variable for each field, but perhaps a more elegant solution is available?

VBWG Response: Rejected

When it is correct to override variables is an application issue. There is a workaround of copying variables into a separate space as soon as they are instantiated, which prevents them from being overwritten (an illustrative sketch is appended below). The issue may be revisited in the next version of VoiceXML, when we have the opportunity to provide a better separation of presentation from data structure in VoiceXML forms (e.g. XForms) and to provide more detailed control of variable filling.

Finally, we would like to add another point, although it may be beyond the scope of the current discussion:

4) VoiceXML for embedded applications: VoiceXML is mainly suited to telephony applications. For embedded applications it takes too much space because of all the required components, such as an HTTP server, an interpreter for CGI scripts, and the VoiceXML interpreter itself. Is there any work being done in the Voice Browser Working Group of the W3C in this area?

VBWG Response: Accepted

There is a wide variety of embedded devices, and VoiceXML interpreters have already been used on some of them, depending on their available resources. Putting the interpreter, media resources and the application together on the device is more problematic (although this is clearly possible on some PDA devices today). We may address modularization of VoiceXML and device profiling in the next version of VoiceXML, and this should facilitate running smaller interpreter profiles.
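-----------------------------------------------
Illustrative sketches of the workarounds mentioned above
-----------------------------------------------

For issue 1, a minimal sketch of one confirmation workaround within the current FIA: <subdialog> remains a direct child of <form>, and the FIA visits it only after the preceding field has been filled, since form items are selected in document order among those whose variables are still undefined. The document confirm.vxml, the grammar URI and the field names are illustrative assumptions; confirm.vxml is assumed to present the passed value, collect a yes/no answer and hand it back via <return namelist="accepted"/>.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="booking">
    <field name="city">
      <prompt>Which city?</prompt>
      <grammar src="city.grxml" type="application/srgs+xml"/>
    </field>
    <!-- Form-level subdialog: visited once "city" is filled. -->
    <subdialog name="confirm_city" src="confirm.vxml">
      <param name="value" expr="city"/>
      <filled>
        <!-- If the caller rejected the value, clear both form items
             so the FIA re-collects the city and re-confirms it. -->
        <if cond="!confirm_city.accepted">
          <clear namelist="city confirm_city"/>
        </if>
      </filled>
    </subdialog>
  </form>
</vxml>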
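For issue 2, a minimal sketch of the recording workaround: the recorded audio is posted to a server-side recognizer for off-line analysis. The service URL http://example.com/asr is purely illustrative; the example assumes the usual submission of recorded audio with method="post" and enctype="multipart/form-data".

<form id="capture">
  <record name="utterance" beep="true" maxtime="10s" type="audio/x-wav">
    <prompt>Please speak after the beep.</prompt>
    <filled>
      <!-- Hand the recorded audio to an external ASR service
           for batch recognition. -->
      <submit next="http://example.com/asr" method="post"
              enctype="multipart/form-data" namelist="utterance"/>
    </filled>
  </record>
</form>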
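For issue 3, a minimal sketch of the variable-copying workaround in a mixed-initiative form: each field's <filled> copies the first recognized value into a form-level variable, so later utterances that overwrite the field variable do not touch the protected copy, which downstream processing then uses. The grammar file travel.grxml, the field names and the target book.cgi are illustrative assumptions.

<form id="travel">
  <grammar src="travel.grxml" type="application/srgs+xml"/>
  <!-- Protected copies that later utterances cannot overwrite. -->
  <var name="saved_from"/>
  <var name="saved_to"/>

  <initial name="start">
    <prompt>Where do you want to travel?</prompt>
  </initial>

  <field name="from_city">
    <prompt>Where do you want to leave from?</prompt>
    <filled>
      <!-- Keep only the first value recognized for this field. -->
      <if cond="saved_from == undefined">
        <assign name="saved_from" expr="from_city"/>
      </if>
    </filled>
  </field>

  <field name="to_city">
    <prompt>Where do you want to go?</prompt>
    <filled>
      <if cond="saved_to == undefined">
        <assign name="saved_to" expr="to_city"/>
      </if>
    </filled>
  </field>

  <block>
    <!-- Downstream processing reads the protected copies. -->
    <submit next="book.cgi" namelist="saved_from saved_to"/>
  </block>
</form>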