- From: Dirk Schnelle-Walka <dirk.schnelle@jvoicexml.org>
- Date: Tue, 15 Nov 2016 15:07:53 +0100 (CET)
- To: public-voiceinteraction@w3.org, Deborah Dahl <Dahl@conversational-Technologies.com>, David Pautler <david@intentionperception.org>
Hey there,

some time ago I had some first thoughts with Okko Buss on an integration of incremental speech processing into VoiceXML. Okko was working on his PhD at the University of Bielefeld in the domain of incremental dialogs. We started to sketch something that we call VoiceXML AJAX. I opened the Google Docs document to be viewed by everybody:

https://docs.google.com/document/d/1jVd-K3H_8UrrSYRCjmVHSqZaonHqdHdPWaLv4QSj5c8/edit?usp=sharing

Maybe this goes in the direction that David had in mind?

Thank you,
Dirk

> Deborah Dahl <Dahl@conversational-Technologies.com> wrote on 15 November 2016 at 04:17:
>
> Hi David,
>
> Thanks for your comments.
>
> This sounds like a great use case. EMMA 2.0 [1] provides some capability for incremental inputs and outputs, but I think that's only a building block for the whole use case, because given incremental input and output, it's still necessary for the system to figure out how to respond. Also, the Web Speech API [2] has incremental output for speech recognition. Again, that's just a building block.
>
> It would be very interesting if you could post a more detailed description of this use case to the list, and if you have a proposal, that would be interesting, too.
>
> If you have links to SARA and MACH, that would also be helpful.
>
> Best,
> Debbie
>
> [1] https://www.w3.org/TR/emma20/
> [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
> From: David Pautler [mailto:david@intentionperception.org]
> Sent: Monday, November 14, 2016 8:06 PM
> To: public-voiceinteraction@w3.org
> Subject: Incremental recognition, Unobtrusive response
>
> There are several multimodal virtual agents, such as MACH and SARA, that provide partial interpretation of what the user is saying or expressing facially ("incremental recognition"), as well as backchannel 'listener actions' ("unobtrusive response") based on those interpretations. This style of interaction is much more human-like than the strictly turn-based style of VoiceXML (and related W3C specs) and of all chatbot platforms I'm aware of.
>
> Is this interaction style (which might be called "IRUR") among the use cases of any planned update to a W3C spec?
>
> Cheers,
> David
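For concreteness, the Web Speech API building block that Debbie mentions [2] can be exercised with a few lines of browser code: setting interimResults delivers partial hypotheses before the final one arrives. The sketch below assumes a browser that implements the API, and the emitBackchannel function is invented here purely to stand in for the kind of unobtrusive listener action David describes; it is not part of any spec.

// Minimal sketch: incremental recognition via the Web Speech API,
// with a hypothetical backchannel hook. emitBackchannel is invented
// for illustration; the recognition calls themselves are as specified.

// The constructor is vendor-prefixed in some browsers; the `any`
// casts keep the sketch free of extra type declarations.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = true;  // deliver partial (incremental) hypotheses

// Hypothetical "unobtrusive response": a nod, an "mm-hmm", etc.
function emitBackchannel(partial: string): void {
  console.log(`listener action while user says: "${partial}"`);
}

recognition.onresult = (event: any) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const transcript: string = result[0].transcript;
    if (result.isFinal) {
      // Turn boundary: ordinary turn-based response logic would go here.
      console.log(`final: ${transcript}`);
    } else {
      // Incremental hypothesis: respond without taking the turn.
      emitBackchannel(transcript);
    }
  }
};

recognition.start();

As the thread notes, this only covers the recognition side; deciding when and how to respond to a partial hypothesis is exactly the part that neither the Web Speech API nor EMMA 2.0 addresses.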
Received on Tuesday, 15 November 2016 14:09:05 UTC