- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Thu, 24 Sep 2009 17:18:57 +0300
- To: Deborah Dahl <dahl@conversational-technologies.com>
- CC: public-webapps@w3.org, "'Kazuyuki Ashimura'" <ashimura@w3.org>
On 9/24/09 4:51 PM, Deborah Dahl wrote:
> Hello WebApps WG,
>
> The Multimodal Interaction Working Group is working on specifications
> that will support distributed applications that include inputs from
> different modalities, such as speech, graphics and handwriting. We
> believe there's some applicability of specific WebAPI specs such
> as XMLHttpRequest and Server-sent Events to our use cases, and we're
> hoping to get some comments/feedback/suggestions from you.
>
> Here's a brief overview of how Multimodal Interaction and WebAPI
> specs might interact.
>
> The Multimodal Architecture [1] is a loosely coupled architecture for
> multimodal user interfaces, which allows for co-resident and distributed
> implementations. The aim of this design is to provide a general and
> flexible framework providing interoperability among modality-specific
> components from different vendors - for example, speech recognition from
> one vendor and handwriting recognition from another. This framework
> focuses on providing a general means for allowing these components to
> communicate with each other, plus basic infrastructure for application
> control and platform services.
>
> The basic components of an application conforming to the Multimodal
> Architecture are (1) a set of components which provide modality-related
> services, such as GUI interaction, speech recognition and handwriting
> recognition, as well as more specialized modalities such as biometric
> input, and (2) an Interaction Manager which coordinates inputs from
> different modalities with the goal of providing a seamless and
> well-integrated multimodal user experience. One use case of particular
> interest is a distributed one, in which a server-based Interaction
> Manager (using, for example, SCXML [2]) controls a GUI component based
> on a (mobile or desktop) web browser, along with a distributed speech
> recognition component. "Authoring Applications for the Multimodal
> Architecture" [3] describes this type of application in more detail.
> If, for example, speech recognition is distributed, the Interaction
> Manager receives results from the recognizer and will need to inform the
> browser of a spoken user input so that the graphical user interface can
> reflect that information. For example, the user might say
> "November 2, 2009" and that information would be displayed in a text
> field in the browser. However, this requires that the server be able to
> send an event to the browser to tell it to update the display. Current
> implementations do this by having the browser poll the server for
> possible updates on a frequent basis, but we believe that a better
> approach would be for the browser to actually be able to receive events
> from the server.
>
> So our main question is: what mechanisms are or will be available to
> support efficient communication among distributed components (for
> example, speech recognizers, interaction managers, and web browsers)
> that interact to create a multimodal application (hence our interest
> in server-sent events and XMLHttpRequest)?

I believe WebSockets could work a lot better than XHR or server-sent
events. The IM would be a WebSocket server, and it would have a
bi-directional connection to the modality components.

-Olli

> [1] MMI Architecture: http://www.w3.org/TR/mmi-arch/
> [2] SCXML: http://www.w3.org/TR/scxml/
> [3] MMI Example: http://www.w3.org/TR/mmi-auth/
>
> Regards,
>
> Debbie Dahl
> MMIWG Chair
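For what it's worth, here is a minimal sketch of what the browser side of that
could look like if the IM acted as a WebSocket server. The endpoint URL, the
message types, and the field names below are made up for illustration; they
are not defined by the MMI Architecture, SCXML, or any existing
implementation.

```typescript
// Sketch only: "wss://im.example.org/session", the message types, and the
// field names are hypothetical, not part of the MMI Architecture.

// The browser-based GUI modality component opens one persistent,
// bi-directional connection to the Interaction Manager (IM).
const socket = new WebSocket("wss://im.example.org/session");

// Events pushed by the IM, e.g. a speech recognition result that the GUI
// should reflect ("November 2, 2009" into a date field) -- no polling needed.
socket.onmessage = (event: MessageEvent) => {
  const msg = JSON.parse(event.data as string);
  if (msg.type === "recognition-result") {
    const field = document.getElementById(msg.targetField) as HTMLInputElement | null;
    if (field) {
      field.value = msg.utterance; // e.g. "November 2, 2009"
    }
  }
};

// GUI events travel the other way over the same connection, so the IM can
// coordinate them with the other modality components.
document.addEventListener("change", (e) => {
  const target = e.target as HTMLInputElement;
  socket.send(JSON.stringify({
    type: "gui-field-changed",
    targetField: target.id,
    value: target.value,
  }));
});
```

The point of the sketch is only that a single full-duplex connection carries
events in both directions, which is what the polling-based implementations
described above are working around.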
Received on Thursday, 24 September 2009 14:19:40 UTC