[v3] Video media support in VoiceXML

From: McGlashan, Scott <scott.mcglashan@hp.com>
Date: Mon, 6 Dec 2004 13:48:39 +0100
Message-ID: <990C7D7B4096DC49B7BDB5D4F56E5B9F64F2CE@sooexc02.emea.cpqcorp.net>
To: <www-voice@w3.org>
Cc: <w3c-voice-wg@w3.org>


We propose that VoiceXML extend its interactive media support to include
video. With the advent of IP phones (e.g. SIP, H.323) and 3G mobile
networks, where audio and video can be streamed to handsets, applications
such as video mail, video portals and other audio-visual interactive
applications are now being planned, developed and deployed. 3G-324M (a
protocol for carrying video on circuit-switched mobile channels on 3G
networks) is available in several countries and planned in many others.

Although VoiceXML is primarily targeted at interactive audio
applications, it is natural to extend it with a video input/output media
channel to complement the audio channel. The benefit is that all the
dialog handling capability available for audio applications becomes
available to the authors of video applications.

In VoiceXML 2.0, basic video playback and recording can be supported
without modification to the language.

For video playback, the <audio> element can be used with a URI that
refers to a video resource. For example,

<audio src="http://www.example.com/video.mp4"/>

When the platform fetches the resource from the web server, the resource
is returned with a video MIME type; for example, "video/3gpp" for 3GPP
.3gp video resources, or "video/mp4" for MPEG-4 video resources. If the
platform is able to support the video media type, then the video can be
queued and played back via the IP/3G network. Barge-in on the audio
channel still applies, with the result that the video is "frozen" (fine
tuning of this behaviour may be possible through VoiceXML <property>s).
All attributes of the current <audio> element are appropriate for video
(src, expr, fetchtimeout, fetchhint, maxage, maxstale), as is the
fallback content of the element (e.g. fallback from video to audio or
TTS). Just as with non-mandatory audio types (e.g. mp3 audio), if a
platform doesn't support the video media type, it will use the fallback
content.
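
As a rough sketch of the fallback behaviour described above (the URIs
and the prompt text are illustrative only, not taken from any deployed
application), a video prompt that degrades first to recorded audio and
then to TTS could be written as:

<prompt>
  <!-- Tried first: played only if the platform supports the video type -->
  <audio src="http://www.example.com/welcome.3gp">
    <!-- Fallback 1: an audio-only rendering of the same prompt -->
    <audio src="http://www.example.com/welcome.wav">
      <!-- Fallback 2: TTS, if the audio cannot be played either -->
      Welcome to the video portal.
    </audio>
  </audio>
</prompt>

This relies only on the existing VoiceXML 2.0 rule that the content of
<audio> is rendered when the referenced resource cannot be played.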

For video recording, the <record> element can be used with its type
attribute set to a video media type. Again as with non-mandatory audio
types, if the video media type isn't supported, the platform throws the
appropriate error event. Most of the existing <record> attributes are
appropriate (name, expr, cond, type, modal, dtmfterm, beep), but
'finalsilence' can be ignored.
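
A minimal sketch of such a recording follows. The form id, field name
and prompt wording are hypothetical, and we assume here that an
unsupported type is reported as error.unsupported.format, the general
VoiceXML 2.0 event for unsupported media formats; a given platform may
behave differently.

<form id="videomail">
  <record name="msg" type="video/3gpp" beep="true" dtmfterm="true">
    <prompt>Record your video message after the beep.</prompt>
    <!-- Assumed event name for an unsupported recording type -->
    <catch event="error.unsupported.format">
      <prompt>Video recording is not available on this platform.</prompt>
      <exit/>
    </catch>
  </record>
</form>

Note that 'finalsilence' is deliberately omitted, in line with the
suggestion that it can be ignored for video.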

As with audio recordings, video recordings may be submitted to the web
server using the standard VoiceXML <submit> element with the
multipart/form-data encoding.
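
For illustration only (the target URI, form id and field name are
placeholders), submitting a completed video recording might look like
this, using nothing beyond the VoiceXML 2.0 <submit> attributes:

<form id="videomail">
  <record name="msg" type="video/3gpp" beep="true">
    <prompt>Record your video message after the beep.</prompt>
    <filled>
      <!-- Recordings must be sent with POST as multipart/form-data -->
      <submit next="http://www.example.com/videomail"
              method="post"
              enctype="multipart/form-data"
              namelist="msg"/>
    </filled>
  </record>
</form>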

This approach does have some issues which require further analysis for
VoiceXML 3.0, including whether there should be a separate <video>
element rather than re-using <audio>, and how controls for
video-specific operations can be added. If others on this list are
interested in continuing this discussion offline, please let us know.

Scott McGlashan, HP
Dave Burke, Voxpilot

_________

Scott McGlashan

Service Interaction, OCBU, HP
36 Gustav III:s Boulevard
SE-169 85 Stockholm, Sweden

+46 8 524 95683
Received on Monday, 6 December 2004 12:49:33 GMT
