
[v3] Video media support in VoiceXML

From: McGlashan, Scott <scott.mcglashan@hp.com>
Date: Mon, 6 Dec 2004 13:48:39 +0100
Message-ID: <990C7D7B4096DC49B7BDB5D4F56E5B9F64F2CE@sooexc02.emea.cpqcorp.net>
To: <www-voice@w3.org>
Cc: <w3c-voice-wg@w3.org>

We propose that VoiceXML extend its interactive media support to
video. With the advent of IP phones (e.g. SIP, H.323) and 3G mobile
networks, where audio and video can be streamed to handsets, applications
like video mail, video portals and other audio-visual interactive
applications are being planned, developed and deployed. 3G-324M (a
protocol for carrying video on circuit-switched mobile channels on 3G
networks) is available in several countries and planned in many others.

Although VoiceXML is primarily targeted at interactive audio
applications, it is natural to extend it with a video input/output media
channel to complement the audio channel. This has the benefit that all
the dialog handling capability which is available for audio applications
is now available to the authors of video applications.

In VoiceXML 2.0, basic video playback and recording can be supported
without modification to the language.

For video playback, the <audio> element can be used with a URI pointing
to a video resource. For example,

<audio src="http://www.example.com/video.mp4"/>

When the platform fetches the resource from the web server, it has a
MIME type; for example, "video/3gpp" for 3GPP .3gp video resources, or
"video/mp4" for MPEG-4 video resources. If the platform is able to
support the video media type, then video can be queued and played back
via the IP/3G network. Barge-in on the audio channel applies, with the
result that the video is stopped (fine tuning of behaviour may be possible
through VoiceXML properties).
All attributes of the current <audio> element are appropriate for video
(src, expr, fetchtimeout, fetchhint, maxage, maxstale) as well as the
content of the element (e.g. fallback from video to audio or TTS). Just
like non-mandatory audio types (e.g. mp3 audio), if a platform doesn't
support the video media type, it will use fallback content.
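
As a rough sketch of the playback case with fallback, the nesting below
tries video first, then a recorded audio file, then TTS. The resource
URIs and the prompt text are placeholders, and whether a given platform
actually renders the video is an assumption:

<prompt bargein="true">
  <audio src="http://www.example.com/welcome.3gp" fetchtimeout="10s">
    <!-- Used if the video media type is not supported or the fetch fails -->
    <audio src="http://www.example.com/welcome.wav">
      <!-- Last resort: TTS rendering of the inline text -->
      Welcome to the video portal.
    </audio>
  </audio>
</prompt>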

For video recording, the <record> element can be used with a video type
attribute. Again like non-mandatory audio types, if the video media type
is not supported, the platform throws the appropriate error event. Most of
the existing <record> attributes are appropriate (name, expr, cond, type,
modal, dtmfterm, beep), but 'finalsilence' can be ignored.
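
A minimal sketch of the recording case might look as follows. The form
id, variable name and prompt text are hypothetical, and the example
assumes the platform reports an unsupported video type with the same
error.unsupported.format event that VoiceXML 2.0 defines for
unsupported <record> formats:

<form id="leave_message">
  <record name="msg" type="video/3gpp" beep="true" dtmfterm="true">
    <prompt>Record your video message after the beep.</prompt>
    <!-- Thrown if the platform cannot record the requested media type -->
    <catch event="error.unsupported.format">
      <prompt>Sorry, video recording is not available.</prompt>
      <exit/>
    </catch>
  </record>
</form>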

Similar to audio recordings, video recordings may be submitted to the server
using the standard VoiceXML <submit> element with the

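To illustrate submission: VoiceXML 2.0 submits recorded audio with
method="post" and enctype="multipart/form-data", and the sketch below
assumes the same would apply to a video recording. The upload URI and
variable name are placeholders:

<form id="send_message">
  <record name="msg" type="video/3gpp" beep="true"/>
  <block>
    <!-- Post the recording to the application server -->
    <submit next="http://www.example.com/upload" method="post"
            enctype="multipart/form-data" namelist="msg"/>
  </block>
</form>
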
This approach does have some issues which require further analysis for
VoiceXML 3.0, including whether there should be a separate <video>
element rather than re-using <audio>, and how video-specific controls
can be added. If others on this list are interested in continuing this
discussion offline, please let us know.

Scott McGlashan, HP
Dave Burke, Voxpilot


Scott McGlashan

Service Interaction, OCBU, HP
36 Gustav III:s Boulevard
SE-169 85 Stockholm, Sweden

+46 8 524 95683
Received on Monday, 6 December 2004 12:49:33 UTC
