RE: video in ssml:audio?

Al,

Thanks for this input - it is certainly something we need to keep in
mind as we consider support for video in the next version of VoiceXML.
I'll add it to the VoiceXML CR list. 

In http://lists.w3.org/Archives/Member/w3c-voice-wg/2004Dec/0019.html,
we discuss some issues, including whether it might be more appropriate to
use another element like <video> to distinguish between audio and video
media. Even if we were to continue using video with the audio element,
the <desc> element could have similar semantics for a video resource:
e.g. if the fallback contains a <desc>, it should be used for text-only
output and should provide a text description of the video content (just
as it can provide a description of audio content).
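
To make the fallback rule concrete, here is a rough sketch in Python of
how a text-only renderer might choose what to show for each audio
element: the <desc> content if there is one, otherwise the plain text
content. The function name, the example URLs, and the use of a video
clip as the audio source are assumptions made for illustration only;
nothing here is standardized behaviour for video.

```python
import xml.etree.ElementTree as ET

# Namespace defined by SSML 1.0.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical document: the first <audio> points at a video clip (the
# usage under discussion, not something SSML defines), the second at
# recorded speech. Both URLs are made up.
EXAMPLE = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <audio src="http://example.com/clip.3gp">
    <desc>Music video clip</desc>
    Sorry, the clip could not be played.
  </audio>
  <audio src="http://example.com/greeting.wav">Welcome to the service.</audio>
</speak>
"""


def text_only_rendering(audio):
    """Pick the text a text-only renderer might show for one <audio>.

    Assumed rule: a non-empty <desc> wins; otherwise use the plain text
    content of the element, with whitespace collapsed.
    """
    desc = audio.find(f"{{{SSML_NS}}}desc")
    if desc is not None and (desc.text or "").strip():
        return desc.text.strip()
    return " ".join("".join(audio.itertext()).split())


root = ET.fromstring(EXAMPLE)
renderings = [text_only_rendering(a) for a in root.iter(f"{{{SSML_NS}}}audio")]
for line in renderings:
    print(line)
```

A separate <video> element would need its own version of this rule,
which is part of what would have to be decided.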

We also have to consider the impact of using media formats like
MPEG-4/3GPP, which allow multiple media streams: audio, video, and
potentially a text stream. Conventionally, these streams are played back
in parallel, but there may be work within the MPEG community that is
already looking at how the different streams can be used for fallback or
as an alternative, depending on device capabilities. If you know of any
such work, please let us know.

Thanks

Scott
      

-----Original Message-----
From: www-voice-request@w3.org [mailto:www-voice-request@w3.org] On
Behalf Of Al Gilman
Sent: Wednesday, March 23, 2005 17:29
To: www-voice@w3.org
Subject: video in ssml:audio?


One of the bleeding-edge things shown at the W3C Technical Plenary was a
VoiceXML application driving a 3G videophone with video in the mix.  MTV
by phone.

http://www.w3.org/2005/03/plenary-minutes#Session8

This was done by using the 'audio' element to play a video.

This is also one of the possibly scary things shown there.

In SSML and allied formats, we came to an agreeable design for how to
use the text content of the 'audio' element as a text alternative: the
<desc> element is available if the sound is a sonicon or other
non-speech effect, and the plain content serves as a text representation
of the spoken language if the sound is recorded speech.

http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.3.1

Best I recall, the structure of alternatives for a video is more complex
than this simple formula.  The simple alternative structure we agreed to
in the 'audio' element was, consciously or less consciously, impacted by
assumed pragmatic limits on the complexity of what one would put in one
'audio' object in the context of an audio+speech+DTMF dialog over the
phone.

In this brave new world of VoiceXML serving video streams, we may need
to re-examine the support for fallbacks and alternates.

Al

Received on Wednesday, 23 March 2005 17:33:51 UTC