- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Thu, 24 Sep 2009 09:51:00 -0400
- To: <public-webapps@w3.org>
- Cc: "'Kazuyuki Ashimura'" <ashimura@w3.org>
Hello WebApps WG,

The Multimodal Interaction Working Group is working on specifications that will support distributed applications that include inputs from different modalities, such as speech, graphics, and handwriting. We believe some WebAPI specs, in particular XMLHttpRequest and Server-Sent Events, are applicable to our use cases, and we are hoping to get comments, feedback, and suggestions from you. Here is a brief overview of how Multimodal Interaction and the WebAPI specs might interact.

The Multimodal Architecture [1] is a loosely coupled architecture for multimodal user interfaces which allows for both co-resident and distributed implementations. The aim of this design is to provide a general and flexible framework for interoperability among modality-specific components from different vendors - for example, speech recognition from one vendor and handwriting recognition from another. The framework focuses on providing a general means for these components to communicate with each other, plus basic infrastructure for application control and platform services.

The basic components of an application conforming to the Multimodal Architecture are (1) a set of components which provide modality-related services, such as GUI interaction, speech recognition, and handwriting recognition, as well as more specialized modalities such as biometric input, and (2) an Interaction Manager which coordinates inputs from the different modalities with the goal of providing a seamless and well-integrated multimodal user experience.

One use case of particular interest is a distributed one, in which a server-based Interaction Manager (using, for example, SCXML [2]) controls a GUI component based on a (mobile or desktop) web browser, along with a distributed speech recognition component. "Authoring Applications for the Multimodal Architecture" [3] describes this type of application in more detail. If speech recognition is distributed, the Interaction Manager receives results from the recognizer and needs to inform the browser of the spoken input so that the graphical user interface can reflect it. For example, the user might say "November 2, 2009" and that date would then be displayed in a text field in the browser. This requires that the server be able to send an event to the browser telling it to update the display. Current implementations do this by having the browser poll the server frequently for possible updates, but we believe a better approach would be for the browser to be able to receive events from the server directly.

So our main question is: what mechanisms are or will be available to support efficient communication among the distributed components (for example, speech recognizers, interaction managers, and web browsers) that interact to create a multimodal application? Hence our interest in Server-Sent Events and XMLHttpRequest.

[1] MMI Architecture: http://www.w3.org/TR/mmi-arch/
[2] SCXML: http://www.w3.org/TR/scxml/
[3] MMI Example: http://www.w3.org/TR/mmi-auth/

Regards,
Debbie Dahl
MMI WG Chair
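
To make the question above concrete, here is a minimal browser-side sketch of the two approaches, written in TypeScript against the standard DOM APIs. The endpoint URLs ("/im/events", "/im/updates"), the event name ("mmi-update"), and the payload shape are illustrative assumptions only; none of them is defined by the MMI Architecture or by the referenced specifications.

```typescript
// Sketch: a browser-based GUI modality component receiving display updates
// from a server-side Interaction Manager (IM). URLs, event name, and payload
// shape are assumptions for illustration, not part of any MMI spec.

interface ImUpdate {
  field: string;  // id of the text field to update
  value: string;  // recognized value, e.g. "November 2, 2009"
}

// Push-based approach (Server-Sent Events): the server holds the connection
// open and sends an event only when the IM actually has something to report.
const source = new EventSource("/im/events");   // hypothetical IM endpoint

source.addEventListener("mmi-update", (event) => {
  const update: ImUpdate = JSON.parse((event as MessageEvent).data);
  const input = document.getElementById(update.field) as HTMLInputElement | null;
  if (input) {
    input.value = update.value;  // reflect the spoken input in the GUI
  }
});

// For comparison, the polling approach used by current implementations: the
// browser asks the server for updates on a timer, whether or not any spoken
// input has actually arrived.
function pollForUpdates(): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", "/im/updates");               // hypothetical polling URL
  xhr.onload = () => {
    if (xhr.status === 200 && xhr.responseText) {
      const update: ImUpdate = JSON.parse(xhr.responseText);
      const input = document.getElementById(update.field) as HTMLInputElement | null;
      if (input) {
        input.value = update.value;
      }
    }
    setTimeout(pollForUpdates, 1000);           // ask again in one second
  };
  xhr.send();
}
```

With Server-Sent Events the browser receives an event only when the recognizer actually produces a result, whereas the polling loop issues a request every interval regardless of whether anything has changed, which is the inefficiency the email describes.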
Received on Thursday, 24 September 2009 13:51:49 UTC