Re: Multimodal Interaction WG questions for WebApps (especially WebAPI)

On 9/24/09 4:51 PM, Deborah Dahl wrote:
> Hello WebApps WG,
> The Multimodal Interaction Working Group is working on specifications
> that will support distributed applications that include inputs from
> different modalities, such as speech, graphics and handwriting. We
> believe there's some applicability of specific WebAPI specs such
> as XMLHttpRequest and Server-sent Events to our use cases and we're
> hoping to get some comments/feedback/suggestions from you.
> Here's a brief overview of how Multimodal Interaction and WebAPI
> specs might interact.
> The Multimodal Architecture [1] is a loosely coupled architecture for
> multimodal user interfaces, which allows for co-resident and distributed
> implementations. The aim of this design is to provide a general and flexible
> framework providing interoperability among modality-specific components from
> different vendors - for example, speech recognition from one vendor and
> handwriting recognition from another. This framework focuses on providing a
> general means for allowing these components to communicate with each other,
> plus basic infrastructure for application control and platform services.
> The basic components of an application conforming to the Multimodal
> Architecture are (1) a set of components which provide modality-related
> services, such as GUI interaction, speech recognition and handwriting
> recognition, as well as more specialized modalities such as biometric input,
> and (2) an Interaction Manager which coordinates inputs from different
> modalities with the goal of providing a seamless and well-integrated
> multimodal user experience. One use case of particular interest is a
> distributed one, in which a server-based Interaction Manager (using, for
> example SCXML [2]) controls a GUI component based on a (mobile or desktop)
> web browser, along with a distributed speech recognition component.
> "Authoring Applications for the Multimodal Architecture" [3] describes this
> type of application in more detail. If, for example, speech recognition
> is distributed, the Interaction Manager receives results from the recognizer
> and will need to inform the browser of a spoken user input so that the
> graphical user interface can reflect that information. For example, the user
> might say "November 2, 2009" and that information would be displayed in a
> text field in the browser. However, this requires that the server be able to
> send an event to the browser to tell it to update the display. Current
> implementations do this by having the browser poll the server for
> possible updates on a frequent basis, but we believe that a better approach
> would be for the browser to actually be able to receive events from the
> server.
> So our main question is, what mechanisms are or will be available to
> support efficient communication among distributed components (for
> example, speech recognizers, interaction managers, and web browsers)
> that interact to create a multimodal application (hence our interest
> in server-sent events and XMLHttpRequest)?

I believe WebSockets would work much better than XHR or server-sent 
events here. The IM (Interaction Manager) would act as a WebSocket 
server, giving it a bi-directional connection to each modality 
component: the browser could send user inputs up, and the IM could 
push updates down without any polling.
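To make this concrete, here is a minimal sketch of the browser side of
that idea: the IM pushes JSON events over a WebSocket, and the GUI
component applies them to its model. The event shape ("fieldUpdate"
with a field name and value) and the URL are assumptions for
illustration, not anything defined by the MMI Architecture spec.

```javascript
// Sketch: apply one event pushed by the Interaction Manager.
// The "fieldUpdate" event shape is a hypothetical example, e.g. the IM
// relaying a recognition result for the date field:
//   {"type": "fieldUpdate", "field": "date", "value": "November 2, 2009"}
function applyIMEvent(message, fields) {
  var event = JSON.parse(message);
  if (event.type === "fieldUpdate") {
    fields[event.field] = event.value; // update the GUI's data model
  }
  return fields;
}

// Wiring it up (hypothetical URL); no polling loop is needed, since
// the server pushes events as they happen:
// var ws = new WebSocket("ws://im.example.org/session/42");
// ws.onmessage = function (e) { applyIMEvent(e.data, formFields); };
```

The same socket can carry traffic in the other direction (e.g. the
browser reporting GUI events to the IM), which is what XHR polling and
server-sent events, each being one-directional, cannot do on their own.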


> [1] MMI Architecture:
> [2] SCXML:
> [3] MMI Example:
> Regards,
> Debbie Dahl
> MMIWG Chair

Received on Thursday, 24 September 2009 14:19:40 UTC