Remarks and Questions on the VoiceXML 2.0 Last Call Draft


After extensive testing and experimentation with VoiceXML 2.0, we still
see some problems in its usage.

1)	<subdialog> in mixed-initiative dialogues:
	As written in the last call draft, and in the DTD as well,
	<subdialog> is only allowed as a child element of <form>.
	Why can't <subdialog> also be allowed as a child element of
	<field> or <filled>? This would be useful, e.g. for getting
	confirmation of given values, and would allow different values
	to be processed separately. At the moment, considerable extra
	work is needed to achieve this with the means currently
	available. We would appreciate it if the group would at least
	consider widening the set of allowed parent elements of
	<subdialog>.
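
	To illustrate the extra work involved: with the current draft, a
	confirmation subdialog has to be declared as a separate form item,
	guarded by a cond attribute. A minimal sketch (the file name
	confirm.vxml, the grammar, and the field and item names are our
	own; confirm.vxml is assumed to return a variable "accepted"):

```xml
<form id="order">
  <field name="city">
    <grammar src="city.grxml" type="application/srgs+xml"/>
    <prompt>Which city?</prompt>
  </field>
  <!-- Confirmation must live at form level; it cannot be a child
       of <field> or <filled> in the current draft. -->
  <subdialog name="confirm_city" src="confirm.vxml"
             cond="city != undefined">
    <param name="value" expr="city"/>
    <filled>
      <!-- confirm.vxml is assumed to <return namelist="accepted"/> -->
      <if cond="!confirm_city.accepted">
        <clear namelist="city confirm_city"/>
      </if>
    </filled>
  </subdialog>
</form>
```

	Allowing <subdialog> inside <filled> would let such a confirmation
	be attached directly to the field it belongs to.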

2)	recognizing from a file:
	The <record> element allows spoken utterances to be recorded,
	and with <audio> the resulting files can be played back to the
	user. However, we are missing a way to take an audio file
	instead of live spoken input and run recognition on that file
	for further processing. This would be especially interesting
	for off-line processing of dialogues.
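
	Purely as an illustration of what we have in mind (the recsrc
	attribute below does not exist in VoiceXML 2.0; it is a
	hypothetical extension, and the file names are our own):

```xml
<!-- Hypothetical: recognize from a stored recording instead of
     live input. "recsrc" is NOT part of VoiceXML 2.0. -->
<field name="city"
       recsrc="http://example.com/recordings/utterance42.wav">
  <grammar src="city.grxml" type="application/srgs+xml"/>
</field>
```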
	
3)	filled fields in mixed-initiative dialogues:
	In mixed-initiative dialogues that use one grammar for several
	fields, values which have already been set correctly can simply
	be overwritten by new utterances from the user. Sometimes this
	behaviour is desired and good to have, but there are situations
	where we would like to deactivate fields that were filled
	correctly. Has any work been done in this area? At the moment
	we solve this by adding an extra variable for each field, but
	perhaps a more elegant solution is available?
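
	Our current workaround looks roughly like this (a minimal
	sketch; the form, grammar, and variable names are our own):

```xml
<form id="travel">
  <!-- One form-level grammar fills several fields. -->
  <grammar src="travel.grxml" type="application/srgs+xml"/>
  <!-- One extra guard variable per field (our workaround). -->
  <var name="city_final"/>
  <field name="city">
    <filled>
      <if cond="city_final != undefined">
        <!-- Field was already accepted: discard the new value. -->
        <assign name="city" expr="city_final"/>
      <else/>
        <!-- First time filled: lock the value in. -->
        <assign name="city_final" expr="city"/>
      </if>
    </filled>
  </field>
  <!-- ... further fields, each with its own guard variable ... -->
</form>
```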

Finally, we would like to add another point, although it may be
beyond the scope of the current discussion:
4)	VoiceXML for embedded applications:
	VoiceXML is mainly suited to telephony applications. For
	embedded applications it requires too many resources, because
	of all the components needed: an HTTP server, an interpreter
	for CGI scripts, and the VoiceXML interpreter itself. Is any
	work being done on this in the W3C Voice Browser Working Group?
	
	
Thanks a lot for any comments or hints.


Stefan Hamerich
	

-----------------------------
Stefan W. Hamerich
TEMIC Sprachverarbeitung GmbH
Research Department
Soeflinger Str. 100
89077 Ulm
Germany

Tel:      +49/731/3994-123
Fax:      +49/731/3994-250
Mail:     stefan.hamerich@temic-sp.com
Internet: http://www.starrec.com

Received on Friday, 24 May 2002 04:29:40 UTC