RE: Use case from CSTR

Hi All,

Here are some use cases from CAS (Chinese Academy of Sciences):

 "Emotion recognition of dialogue speech"
We allow listeners to label the speech with multiple emotions to form the
emotion vector, then, train a classification tree model to predict emotion
vectors from acoustic features. The final emotion recognition results are
used in the dialogue system on line. The dialogue system use results to
determine the prior level the task from customers. Negtive emotions will
result in quick service.
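
To make this concrete, here is a minimal sketch of such a multi-label
classification tree in Python with scikit-learn. The acoustic features,
emotion set, and priority rule are placeholders, not the actual CAS
system:

    # Sketch only: predict a multi-label emotion vector from acoustic
    # features with a decision tree; features and labels are illustrative.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    EMOTIONS = ["neutral", "happy", "angry", "sad"]  # assumed label set

    # X: per-utterance acoustic features (e.g. mean F0, energy, rate);
    # Y: listener labels, one column per emotion (1 = emotion present).
    X = np.array([[180.0, 0.62, 4.1],
                  [240.0, 0.88, 5.3],
                  [120.0, 0.35, 2.8]])
    Y = np.array([[1, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 1]])

    model = DecisionTreeClassifier(max_depth=5).fit(X, Y)
    vector = model.predict([[230.0, 0.80, 5.0]])[0]  # emotion vector

    # Route negative emotions to quicker service, as described above.
    if vector[EMOTIONS.index("angry")] or vector[EMOTIONS.index("sad")]:
        priority = "high"
    else:
        priority = "normal"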


"Generate Emotional Speech"
We generated an Emotional Speech System with both voice/prosody coversion
method (from neutral speech to emotional speech) and Emotion Markup
Languages (tags). The system is integrated into our TTS system and used for
dialogue speech genertation in conversational system. The project is also
supported by National Natural Science Fundation of China
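
As a rough illustration of the neutral-to-emotional prosody conversion
idea (the scale factors and emotion names below are assumptions, not
the values used in our system):

    # Sketch only: crude prosody conversion by scaling the F0 contour
    # and the duration of a neutral utterance toward a target emotion.
    import numpy as np

    PROSODY_RULES = {  # hypothetical per-emotion scaling rules
        "happy": {"f0_scale": 1.15, "dur_scale": 0.90},
        "angry": {"f0_scale": 1.25, "dur_scale": 0.85},
        "sad":   {"f0_scale": 0.85, "dur_scale": 1.20},
    }

    def convert_prosody(f0_contour, emotion):
        """Scale a neutral F0 contour (Hz per frame) toward an emotion."""
        rule = PROSODY_RULES[emotion]
        f0 = np.asarray(f0_contour, dtype=float) * rule["f0_scale"]
        # Duration change: resample the contour to the new length.
        n_new = max(1, int(round(len(f0) * rule["dur_scale"])))
        idx = np.linspace(0, len(f0) - 1, n_new)
        return np.interp(idx, np.arange(len(f0)), f0)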

"The Multimodal based Emotion Recognition"
In traditional human computer interaction, the lack of the coordination
mechanism of parameters under multi-model condition quite limits the emotion
recognition. The fusing of different channels is not just the combination of
them, but to find the mutual relations among them. We built emotion
recognition system which is based on audio-visual information in CASIA. Both
facial and audio data were recorded, the detailed features, such as facial
expression parameters, voice quality parameters, prosody parameters, etc.
were figured out. The mutual relations between audio-visual information were
also analyzed. With all above work, the multimodal parameters were
integrated into a recognition model.
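
The sketch below shows one simple form of this, fusion at the feature
level, where facial, voice quality, and prosody features are concatenated
so a single classifier can exploit cross-channel relations. The feature
dimensions, random data, and classifier are placeholders, not the CASIA
model:

    # Sketch only: feature-level fusion of audio-visual channels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    facial = rng.random((100, 12))   # e.g. facial expression parameters
    voice = rng.random((100, 8))     # e.g. voice quality parameters
    prosody = rng.random((100, 6))   # e.g. F0/energy/duration statistics
    labels = rng.integers(0, 4, size=100)  # 4 assumed emotion classes

    X = np.hstack([facial, voice, prosody])  # fuse channels into one vector
    clf = LogisticRegression(max_iter=1000).fit(X, labels)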

"Expressive facial animation"
We are doing a new coding method which can give more detailed control of
facial animation with synchronized voice. The coding system was finally
transfered into FAPs which is defined in MPEG-4. The coding method allows
the user to configure and build systems for many applications by allowing
flexibility in the system configurations, by providing various levels of
interactivity with audio-visual content. 
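
For illustration only, the sketch below maps high-level expression
controls onto MPEG-4 FAP (number, amplitude) pairs. The FAP numbers
follow the MPEG-4 facial animation tables, but the expression-to-FAP
mapping and the amplitudes are assumptions, not our coding method:

    # Sketch only: hypothetical mapping from expression controls to FAPs.
    EXPRESSION_TO_FAPS = {
        # expression: list of (FAP number, amplitude in FAP units)
        "smile":    [(12, 120), (13, 120)],            # raise lip corners
        "surprise": [(3, 300), (31, 150), (32, 150)],  # open jaw, raise brows
    }

    def encode_expression(expression, intensity=1.0):
        """Return one animation frame as scaled (FAP, value) pairs."""
        return [(fap, int(round(value * intensity)))
                for fap, value in EXPRESSION_TO_FAPS[expression]]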

Best regards,
Jianhua
