- From: Young, Milan <Milan.Young@nuance.com>
- Date: Fri, 19 Nov 2010 07:23:25 -0800
- To: "Eric S. Johansson" <esj@harvee.org>, "Robert Brown" <Robert.Brown@microsoft.com>
- Cc: <public-xg-htmlspeech@w3.org>
Hello Eric,

I must admit that web applications are not my expertise. I'm having a hard time understanding why the protocol needs to be expanded to handle these new unidirectional events.

If the event should be sent from the web app to the application server, couldn't this be done using AJAX or some other standard web technology (see the sketch at the end of this message)? If the event is to be sent between the SS and the application server, then shouldn't this be triggered with an implementation-specific parameter? It seems like a stretch to make this part of the specification.

Thanks

-----Original Message-----
From: Eric S. Johansson [mailto:esj@harvee.org]
Sent: Thursday, November 18, 2010 7:05 PM
To: Robert Brown
Cc: Young, Milan; public-xg-htmlspeech@w3.org
Subject: Re: Requirement for UA / SS protocol

On 11/18/2010 8:49 PM, Robert Brown wrote:
>
> I mostly agree. But do we need bidirectional events? I suspect all the
> interesting ones originate at the server: start-of-speech; hypothesis; partial
> result; warnings of noise, crosstalk, etc. I'm trying to think why the server
> would care about events from the client, other than when the client is done
> sending audio (which it could do in response to a click or end-point detection).
>

I think we do need bidirectional events, but more specifically, we need two unidirectional events that can end up on different machines.

In my mind, when I look at a speech-driven application, there are four major subsystems:

 o The recognizer,
 o The application,
 o The speech user interface application from the vendor, and
 o A speech user interface application from the end user.

These subsystems exist whether the recognizer is local or remote, and the same interaction between the recognition engine and the vendor and user interface applications also exists independent of recognizer location. It's possible to locate the user interface application either on the server or on the client. There can be two significant chunks of the application encapsulated as external subsystems that could reside either locally or remotely. If those applications are local, then you want results passed down for local action. I don't think there's symmetry upstream, because the results would just be handed to the server-side copy and not sent over the wire to the local copy.

We should look at what role the client will have at a minimum. I think it would be smart for the client to control a lot of the front-end signal processing and audio management, which is a fair amount of detail to hand off upstream. Is there any quality-of-service data that should be gathered at the client on the front end?

I need to go back and check the archives to see if we've talked about A-B speech recognition environments. A-B environments are where you dictate on one machine and all of the results are delivered to a remote machine, including the results of a vendor or user application. You might even have a user application on the remote machine reacting to utterances. Think of dictating to a virtual machine from your host. If you have a remote recognition engine, you need to connect both machines to the recognition engine so that one can receive recognition results while the other captures what you say.
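
As an illustration of the A-B split Eric describes, here is a minimal JavaScript sketch. The speech-service URL and every function name in it (openSpeechService, startMicrophoneCapture, deliverToUserApplication) are hypothetical, invented for this example rather than drawn from any proposal in this thread:

    // Hypothetical A-B setup: machine A captures audio, machine B receives
    // the recognition results. Both connect to the same remote speech service.
    // All helper names below are invented for illustration.

    // Machine A: stream captured audio upstream to the speech service.
    var sessionA = openSpeechService("wss://speech.example.com/session/1234");
    startMicrophoneCapture(function (audioChunk) {
      sessionA.sendAudio(audioChunk);            // upstream channel: audio only
    });

    // Machine B: join the same session, but only listen for results.
    var sessionB = openSpeechService("wss://speech.example.com/session/1234");
    sessionB.onResult = function (result) {
      deliverToUserApplication(result);          // downstream channel: results only
    };

The point of the sketch is simply that the audio-bearing connection and the result-bearing connection are two unidirectional flows that need not terminate on the same machine.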
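
And, returning to Milan's point above, a minimal sketch of the "AJAX or some other standard web technology" route for web-app-to-application-server events; the /speech-events endpoint and the event payload are made up for illustration and are not part of any specification discussed here:

    // Post a client-side speech UI event to the application server using plain
    // XMLHttpRequest, entirely outside the UA <-> SS protocol. The endpoint
    // name and payload fields are hypothetical.
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/speech-events", true);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        // the application server has acknowledged the event
      }
    };
    xhr.send(JSON.stringify({
      type: "end-of-speech",            // example event name, not from any spec
      timestamp: new Date().getTime()
    }));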
Received on Friday, 19 November 2010 15:24:06 UTC