- From: Al Gilman <Alfred.S.Gilman@IEEE.org>
- Date: Wed, 23 Mar 2005 11:28:50 -0500
- To: www-voice@w3.org
One of the bleeding-edge things shown at the W3C Technical Plenary was a VoiceXML application driving a 3G videophone with video in the mix: MTV by phone. http://www.w3.org/2005/03/plenary-minutes#Session8

This was done by using the 'audio' element to play a video. It was also one of the possibly scary things shown there.

In SSML and allied formats, we arrived at an agreeable design for how to use the text content of the 'audio' element as a text alternative: the <desc> element is available when the sound is a sonicon or other non-speech effect, and the plain content serves as a text representation of the spoken language when the sound is recorded speech. http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.3.1

As best I recall, the structure of alternatives for a video is more complex than this simple formula. The simple alternative structure we agreed to in the 'audio' element was, consciously or less consciously, shaped by assumed pragmatic limits on the complexity of what one would put in a single 'audio' object in the context of an audio+speech+DTMF dialog over the phone.

In this brave new world of VoiceXML serving video streams, we may need to re-examine the support for fallbacks and alternates.

Al
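For readers who have not looked at section 3.3.1 of the SSML recommendation, the two-pronged fallback design described above can be sketched roughly as follows (the file names here are purely illustrative):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Recorded speech: the plain text content is a text
       representation of what is spoken in the recording. -->
  <audio src="welcome.wav">Welcome to our service.</audio>

  <!-- Sonicon or other non-speech effect: the desc element
       carries a description of the sound rather than a transcript. -->
  <audio src="chime.wav"><desc>alert chime</desc></audio>
</speak>
```

If the 'audio' element is stretched to carry video, neither slot obviously accommodates the richer alternative structure video needs (captions, audio description, transcript), which is the concern raised above.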
Received on Wednesday, 23 March 2005 16:58:36 UTC