- From: Jianhua TAO <jhtao@nlpr.ia.ac.cn>
- Date: Sat, 7 Oct 2006 21:33:13 +0800
- To: <public-xg-emotion@w3.org>
Hi All,

Here are some use cases from CAS (Chinese Academy of Sciences).

"Emotion recognition of dialogue speech"
We let listeners label speech with multiple emotions to form an emotion vector, then train a classification-tree model to predict emotion vectors from acoustic features. The final emotion recognition results are used online in the dialogue system, which uses them to determine the priority level of a customer's task; negative emotions result in quicker service.

"Generating emotional speech"
We built an emotional speech system using both a voice/prosody conversion method (from neutral speech to emotional speech) and emotion markup language tags. The system is integrated into our TTS system and used for dialogue speech generation in a conversational system. The project is also supported by the National Natural Science Foundation of China.

"Multimodal emotion recognition"
In traditional human-computer interaction, the lack of a mechanism for coordinating parameters across modalities severely limits emotion recognition. Fusing different channels is not just combining them, but finding the mutual relations among them. We built an emotion recognition system based on audio-visual information at CASIA. Both facial and audio data were recorded, and detailed features such as facial expression parameters, voice quality parameters, and prosody parameters were extracted. The mutual relations between the audio and visual information were also analyzed. With all of the above work, the multimodal parameters were integrated into a single recognition model.

"Expressive facial animation"
We are developing a new coding method that gives more detailed control of facial animation with synchronized voice. The coding is finally translated into the FAPs (Facial Animation Parameters) defined in MPEG-4.
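To illustrate the translation step, here is a minimal sketch (not the actual CAS coder) of mapping high-level expression controls to MPEG-4 FAP-style (index, amplitude) pairs; the control names, FAP indices, and weights below are illustrative assumptions, not the real coding tables.

```python
# Hedged sketch: blend high-level expression controls into one frame of
# MPEG-4 FAP-style amplitudes. All mappings here are assumptions.

# Hypothetical mapping: each control drives a few FAP indices with weights.
CONTROL_TO_FAPS = {
    "smile":    {6: 1.0, 7: 1.0},   # illustrative: stretch left/right lip corners
    "jaw_open": {3: 1.0},           # illustrative: open jaw
}

def encode_frame(controls):
    """Blend active controls into one FAP amplitude frame (index -> value)."""
    frame = {}
    for name, strength in controls.items():
        for fap, weight in CONTROL_TO_FAPS[name].items():
            frame[fap] = frame.get(fap, 0.0) + strength * weight
    return frame

frame = encode_frame({"smile": 0.8, "jaw_open": 0.3})
# frame == {6: 0.8, 7: 0.8, 3: 0.3}
```

A real coder would also interpolate frames over time and synchronize them with the voice track, as the email describes.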
The coding method allows the user to configure and build systems for many applications by allowing flexible system configurations and by providing various levels of interactivity with audio-visual content.

Best regards,
Jianhua
Received on Saturday, 7 October 2006 13:33:21 UTC