- From: McGlashan, Scott <scott.mcglashan@hp.com>
- Date: Mon, 6 Dec 2004 13:48:39 +0100
- To: <www-voice@w3.org>
- Cc: <w3c-voice-wg@w3.org>
We propose that VoiceXML extend its interactive media support to include video. With the advent of IP phones (e.g. SIP, H.323) and of 3G mobile networks, where audio and video can be streamed to handsets, applications like video mail, video portals and other audio-visual interactive applications are now being planned, developed and deployed. 3G-324M (a protocol for carrying video on circuit-switched mobile channels on 3G networks) is available in several countries and planned in many others.

Although VoiceXML is primarily targeted at interactive audio applications, it is natural to extend it with a video input/output media channel to complement the audio channel. This has the benefit that all the dialog handling capability which is available for audio applications is now available to the authors of video applications.

In VoiceXML 2.0, basic video playback and recording can be supported without modification to the language.

For video playback, the <audio> element can be used with a URI to a video resource. For example,

    <audio src="http://www.example.com/video.mp4"/>

When the platform fetches the resource from the web server, the resource is served with a video MIME type; for example, "video/3gpp" for 3GPP .3gp video resources, or "video/mp4" for MPEG4 video resources. If the platform is able to support the video media type, then video can be queued and played back via the IP/3G network. Barge-in on the audio channel applies, with the result of "freezing" the video (fine tuning of behaviour may be possible through VoiceXML <property>s). All attributes of the current <audio> element are appropriate for video (src, expr, fetchtimeout, fetchhint, maxage, maxstale), as is the fallback content of the element (e.g. fallback from video to audio or TTS). Just as with non-mandatory audio types (e.g. mp3 audio), if a platform doesn't support the video media type, it will use the fallback content.

For video recording, the <record> element can be used with its type attribute set to a video media type.
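As a rough illustration of the playback and recording usage described here, the following VoiceXML 2.0 document is a minimal sketch rather than a tested application. It assumes a platform that accepts video media types on <audio> and <record>; the form id, the field name "msg", the fallback greeting.wav resource and the /store URL are hypothetical names chosen for the example. It also shows the recording being sent to the web server with the standard <submit> element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="videomail">
    <block>
      <!-- Playback: src points at a video resource. If the platform
           cannot fetch or play it, the nested content is used instead:
           first an audio file, then TTS of the enclosed text. -->
      <audio src="http://www.example.com/video.mp4">
        <audio src="http://www.example.com/greeting.wav">
          Welcome to video mail.
        </audio>
      </audio>
    </block>

    <!-- Recording: the type attribute requests a video media type
         (assumed here to be supported by the platform). -->
    <record name="msg" type="video/3gpp" beep="true" dtmfterm="true">
      <prompt>Please record your message after the beep.</prompt>
      <filled>
        <!-- Upload the recording; recordings require POST with
             multipart/form-data encoding. -->
        <submit next="http://www.example.com/store" method="post"
                enctype="multipart/form-data" namelist="msg"/>
      </filled>
    </record>
  </form>
</vxml>
```

On a platform without video support, the <audio> element would fall back to its nested content, while the <record> element would fail with a platform error instead of falling back.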
Again as with non-mandatory audio types, if the video media type isn't supported, the platform throws the appropriate error. Most of the existing <record> attributes are appropriate (name, expr, cond, type, modal, dtmfterm, beep), but 'finalsilence' can be ignored. As with audio recordings, video recordings may be submitted to the web server using the standard VoiceXML <submit> element with the multipart/form-data encoding.

This approach does have some issues which require further analysis for VoiceXML 3.0, including whether there should be a separate <video> element rather than re-using <audio>, and how controls for video-specific operations can be added.

If others on this list are interested in continuing this discussion offline, please let us know.

Scott McGlashan, HP
Dave Burke, Voxpilot

_________
Scott McGlashan
Service Interaction, OCBU, HP
36 Gustav III:s Boulevard
SE-169 85 Stockholm, Sweden
+46 8 524 95683
Received on Monday, 6 December 2004 12:49:33 UTC