- From: Al Gilman <Alfred.S.Gilman@IEEE.org>
- Date: Wed, 23 Mar 2005 11:28:50 -0500
- To: www-voice@w3.org
One of the bleeding-edge things shown at the W3C Technical Plenary was a VoiceXML application driving a 3G videophone with video in the mix: MTV by phone. http://www.w3.org/2005/03/plenary-minutes#Session8

This was done by using the 'audio' element to play a video. It was also one of the possibly scary things shown there.

In SSML and allied formats, we arrived at an agreeable design for how to use the text content of the 'audio' element as a text alternative: the <desc> element is available when the sound is a sonicon or other non-speech effect, and the plain content serves as a text representation of the spoken language when the sound is recorded speech. http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.3.1

As best I recall, the structure of alternatives for a video is more complex than this simple formula. The simple alternative structure we agreed to in the 'audio' element was, consciously or less consciously, shaped by assumed pragmatic limits on the complexity of what one would put in a single 'audio' object in the context of an audio+speech+DTMF dialog over the phone.

In this brave new world of VoiceXML serving video streams, we may need to re-examine the support for fallbacks and alternates.

Al
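For readers who have not looked at section 3.3.1 of the SSML recommendation, the two-pronged fallback design described above can be sketched roughly as follows (the file names here are purely illustrative):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Recorded speech: the plain text content is a text
       representation of what is spoken in the recording. -->
  <audio src="welcome.wav">Welcome to our service.</audio>

  <!-- Sonicon or other non-speech effect: the desc element
       carries a description of the sound rather than a transcript. -->
  <audio src="chime.wav"><desc>alert chime</desc></audio>
</speak>
```

If the 'audio' element is stretched to carry video, neither slot obviously accommodates the richer alternative structure video needs (captions, audio description, transcript), which is the concern raised above.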
Received on Wednesday, 23 March 2005 16:58:36 UTC