- From: McGlashan, Scott <scott.mcglashan@hp.com>
- Date: Wed, 23 Mar 2005 18:32:28 +0100
- To: "Al Gilman" <Alfred.S.Gilman@IEEE.org>, <www-voice@w3.org>
Al,

Thanks for this input. It is certainly something we need to keep in mind as we consider support for video in the next version of VoiceXML, and I'll add it to the VoiceXML CR list.

In http://lists.w3.org/Archives/Member/w3c-voice-wg/2004Dec/0019.html, we discuss some related issues, including whether it might be more appropriate to use a separate element such as <video> to distinguish between audio and video media. Even if we continued to play video through the audio element, <desc> could have similar semantics for a video resource: if the fallback content contains a <desc>, it should be used for text-only output and should provide a text description of the video content (just as it can describe audio content).

We also have to consider the impact of using media formats like MPEG-4/3GPP, which allow multiple media streams: audio, video, and potentially a text stream. Conventionally these streams are played back in parallel, but there may be work within the MPEG community already looking at how the different streams can be used as fallbacks or alternatives, depending on device capabilities. If you know of any such work, please let us know.

Thanks,
Scott

-----Original Message-----
From: www-voice-request@w3.org [mailto:www-voice-request@w3.org] On Behalf Of Al Gilman
Sent: Wednesday, March 23, 2005 17:29
To: www-voice@w3.org
Subject: video in ssml:audio?

One of the bleeding-edge things shown at the W3C Technical Plenary was a VoiceXML application driving a G3 videophone with video in the mix. MTV by phone.

http://www.w3.org/2005/03/plenary-minutes#Session8

This was done by using the 'audio' element to play a video. This is also one of the possibly scary things shown there.
In SSML and allied formats, we arrived at a design everyone could agree on for using the text content of the 'audio' element as a text alternative: the <desc> element is available if the sound is a sonicon or other non-speech effect, and the plain content serves as a text representation of the spoken language if the sound is recorded speech.

http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.3.1

As best I recall, the structure of alternatives for a video is more complex than this simple formula. The simple alternative structure we agreed on for the 'audio' element was, consciously or not, shaped by assumed pragmatic limits on how complex a single 'audio' object would be in the context of an audio+speech+DTMF dialog over the phone. In this brave new world of VoiceXML serving video streams, we may need to re-examine the support for fallbacks and alternates.

Al
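For readers less familiar with the SSML 'audio' fallback rule discussed in this thread, here is a minimal sketch of how a text-only renderer might pick the text alternative: use the <desc> content when present (non-speech sounds), otherwise use the plain fallback content (the words of recorded speech). It assumes Python's standard xml.etree.ElementTree; the function name text_alternative and the file names in the examples are hypothetical illustrations, not part of SSML, VoiceXML, or any particular processor.

```python
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # Strip an XML namespace prefix such as "{http://www.w3.org/2001/10/synthesis}".
    return tag.rsplit("}", 1)[-1]


def text_alternative(audio_xml: str) -> str:
    """Hypothetical helper: text to render for an SSML <audio> in text-only output.

    Sketch of the rule described above (SSML 1.0, section 3.3.1):
    - if <desc> children exist, their text describes a non-speech sound;
    - otherwise the plain fallback content is the text of recorded speech.
    """
    audio = ET.fromstring(audio_xml)
    descs = [child for child in audio if _local_name(child.tag) == "desc"]
    if descs:
        # Non-speech sound (sonicon, effect): render only the description(s).
        return " ".join("".join(d.itertext()).strip() for d in descs)
    # Recorded speech: collect all fallback text, including nested markup,
    # and normalize whitespace.
    return " ".join("".join(audio.itertext()).split())


if __name__ == "__main__":
    ns = 'xmlns="http://www.w3.org/2001/10/synthesis"'
    # Hypothetical recorded-speech prompt: plain content is the spoken text.
    print(text_alternative(f'<audio {ns} src="welcome.wav">Welcome to the service.</audio>'))
    # Hypothetical sound effect: <desc> describes the non-speech audio.
    print(text_alternative(f'<audio {ns} src="doorbell.wav"><desc>doorbell ringing</desc></audio>'))
```

A video resource would presumably need a richer structure than this two-way choice, which is the concern raised above.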
Received on Wednesday, 23 March 2005 17:33:51 UTC