Video media support in VoiceXML and Changes requested for VoiceXML 3 from Teemu.Tingander@tecnomen.com on 2004-12-17 (www-voice@w3.org from October to December 2004)

From: <Teemu.Tingander@tecnomen.com>
Date: Fri, 17 Dec 2004 09:10:56 +0200
To: www-voice@w3.org, scott.mcglashan@hp.com
Message-ID: <BC70F0884912B54E9F65043933CFF57501320DFF@aunty.tecnomen.fi>
Hi!

VIDEO SUPPORT

As a bit of self-advertisement, I want to point out that we at Tecnomen
(www.tecnomen.com) have also implemented video support using the audio tag.
(The video works with H.323 and SIP, and on 3GPP phones in 3GPP networks.)
It seemed natural to do it this way. Please reuse audio and do not create a
new tag.

VIDEO RECORDING

In our implementation the record type is controlled (no surprises here) with
the type attribute. In systems where it is possible to record video from the
user, "video/3gp" causes video to be recorded, but if the user is on plain
old PSTN, the system falls back to AMR audio (let's say audio/3gp). The
actual type of the recording is then returned (because it can change
depending on the remote capabilities) in the name$.type shadow variable,
which can be posted to the app server, so the page logic stays much the same
and simple for app developers. Likewise, playing video to a remote end that
does not support it falls back to playing only the audio channel.
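
A rough sketch of what such a page could look like, assuming a platform that
accepts "video/3gp" as a record type and exposes the name$.type shadow
variable described above (neither is defined by VoiceXML 2.0). The URL and
the variable names msg and msgtype are made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">
    <!-- Ask for video; per the post, the platform may fall back to audio. -->
    <record name="msg" type="video/3gp" beep="true" maxtime="60s"
            dtmfterm="true">
      <prompt>Please leave your message after the beep.</prompt>
      <filled>
        <!-- msg$.type is the platform-specific shadow variable described
             above; it is an extension, not part of VoiceXML 2.0. -->
        <var name="msgtype" expr="msg$.type"/>
        <!-- Post the recording together with the type actually recorded. -->
        <submit next="http://example.com/store" method="post"
                enctype="multipart/form-data" namelist="msg msgtype"/>
      </filled>
    </record>
  </form>
</vxml>
```

The shadow variable is copied into an ordinary variable first so that it can
be listed in namelist; the app server can then branch on msgtype without the
page itself changing between video and audio callers.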

OTHER NOTES ABOUT VIDEO

We are experimenting with using fetchaudio as video, and we have also
extended it to be used as idle (audio?), so whenever nothing is being
collected from the user, the fetchaudio (read: video) starts up following
the fetchaudio minimum and timeout rules. We call it 'idle' because it is
also played in situations such as a time-consuming script or a long-running
data fetch, etc. Video should also be mentioned here. I hope this idle issue
could be checked against VXML 2.1 (or 3.0).
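
For reference, a minimal sketch of the standard VoiceXML 2.0 fetchaudio
properties that this idle behavior builds on. Pointing fetchaudio at a video
clip is the platform extension described above, not standard behavior; the
URLs and timing values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Standard VoiceXML 2.0 properties. Using a video clip as the
       fetchaudio source is an assumption based on the extension above. -->
  <property name="fetchaudio" value="http://example.com/media/idle.3gp"/>
  <!-- Wait this long into a fetch before starting the idle media. -->
  <property name="fetchaudiodelay" value="2s"/>
  <!-- Once started, play the idle media for at least this long. -->
  <property name="fetchaudiominimum" value="5s"/>
  <!-- Give up on the fetch after this long. -->
  <property name="fetchtimeout" value="30s"/>
  <form id="lookup">
    <block>
      <!-- A slow fetch during which the idle media would be played. -->
      <submit next="http://example.com/slow-lookup"/>
    </block>
  </form>
</vxml>
```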

Then of course there is the 'collect video (audio)' case (read: background
play during DTMF/ASR collect | record background | transfer background video
in case there is no video support at the other end). This is another feature
we're testing. We'll let you know how it goes.

In our system we have separate channels that may overlap, so it is possible
to put TTS on top of a video that is playing ambient music. I hope the
committee will think about this for a while, because the feature is quite
nice.

VOICEXML (3.0 | 2.1 | 2.0) Generally

RECORD.. AND TRANSFER..

As a further addition to your previous mail regarding VoiceXML 3.0 (among
many other things that I have posted to the 2.1 committee and to the list),
I hope that in VXML 3.0 hang-up handling could be made analogous for
transfer and record: if a transfer has happened, the behavior on hang-up
should be similar to record, and the app developer should be told how long
the transfer took. Also, in the upcoming world of different kinds of
"phones", the other end's capabilities could interest the app developer.
Making this information available as shadow variables should be considered
(for example, was the call a video or a voice call?). Perhaps the signaling
parameters of the remote end could be mapped under some shadow variable?

Then I hope that we can get rid of an annoying feature in record. Chapter
2.3.6 says: "In particular, if no audio is collected before the user terminates
recording with DTMF input matching a local DTMF grammar (or when the
dtmfterm attribute is set to true), then the record variable is not filled
(so shadow variables are not set), and the FIA applies as normal without a
noinput event being thrown. However, information about the input may be
available in these situations via application.lastresult$ as described in
Section 5.1.5." I don't like the idea of running the FIA here; it just does
not make any sense. And how is the input in application.lastresult$
accessible, given that neither filled nor any event is triggered, nor a
reprompt or similar? If this kind of feature is really needed, the system
could just skip the prompt and restart recording. No FIA needed here.

Then the issue with silence detection should be cleared up (if silence
detection removes empty audio from the recording, should maxtime be the
total length of the produced audio file?). Also unclear is the case where
the user stays silent at the beginning of the record and then starts to
speak, so that recording starts only then. (Simply put: make maxtime in
transfer and record analogous.) Currently there is no way to limit the size
of a recording precisely. In a nutshell, I hope that the following rules
would apply:

timeout - The maximum time of silence at the beginning of recording.

maxtime - The maximum total length of the generated recording (initial
silence not included, if the platform is capable of silence clipping).

finalsilence - The maximum time of silence after the user has stopped
speaking.
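
As an illustration only, here is where these three values sit in a VoiceXML
2.0 page today. In 2.0, timeout is a property (or prompt attribute) rather
than a record attribute, and how maxtime interacts with silence clipping is
exactly the open question raised above; the values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="timed_record">
    <!-- timeout: how long to wait in initial silence before noinput. -->
    <property name="timeout" value="5s"/>
    <!-- maxtime: upper bound on the recording; under the proposed rule it
         would count only the audio kept after silence clipping.
         finalsilence: trailing silence that ends the recording. -->
    <record name="msg" beep="true" maxtime="30s" finalsilence="3s">
      <prompt>Record your message.</prompt>
      <noinput>I did not hear anything. <reprompt/></noinput>
      <filled>
        <!-- msg$.duration is reported in milliseconds in VoiceXML 2.0. -->
        <log>Recorded <value expr="msg$.duration"/> ms</log>
      </filled>
    </record>
  </form>
</vxml>
```

With the proposed rules, a caller who waits 4 seconds and then speaks for 30
seconds would produce a 30-second file, not a 26-second one.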


VCR prompt control

We have also implemented <playcontrol/>s (very much analogous to link,
except for scoping) for controlling system (audio) output. I'm waiting for
more input about this in the upcoming VoiceXML 3.0. Our element
implementation is
<playcontrol dtmf="DTMF_SEQ" action="action (like volume)" value="+3db"/>,
if anyone is interested.

OTHER ISSUES

I personally dislike VoiceXML sliding towards a programming language, with
DOM data, very strict checking in some elements, and so on. I hope that we
can keep it as a dialog modeling language, not a replacement for the app
server. Keep the tiers in their place.

More to come, but I really hope that I get some answers to my earlier
requests too :)

- Teemu
Received on Friday, 17 December 2004 07:10:45 UTC