RE: Requirement for UA / SS protocol

I have an uneasy feeling about this.

There are multiple layers of problems in the examples, but they don't sound like speech problems. They sound a lot like problems with either the remoting of audio I/O over terminal software, or the integration of accessibility tools with terminal applications.


-----Original Message-----
From: Young, Milan [mailto:Milan.Young@nuance.com] 
Sent: Friday, November 19, 2010 2:19 PM
To: Eric S. Johansson
Cc: Robert Brown; public-xg-htmlspeech@w3.org
Subject: RE: Requirement for UA / SS protocol

Hello Eric,

You are mainly talking in terms of use cases, and since I don't have much of an accessibility background, I'm having a hard time understanding it all.  But from what I understand, you have two fundamental suggestions:

  * The Speech Service may need to fetch audio from a source other than the User Agent.

  * The Speech Service needs to be able to initiate interactions with the User Agent rather than being relegated to the role of a server.

Is that right?
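
Concretely, is the idea something like the sketch below? The message names and fields are entirely made up on my end, just to check that I'm reading you correctly:

    # Hypothetical sketch only -- none of these names come from any draft.
    # It illustrates the two capabilities as I understand them.
    import json

    # 1. The Speech Service is told to pull audio from somewhere other
    #    than the User Agent (e.g. a separate audio front-end).
    start_request = {
        "message": "start-recognition",
        "audio-source": "rtp://audio-frontend.example.org:5004",
        "grammars": ["https://app.example.org/grammars/main.grxml"],
    }

    # 2. The Speech Service initiates an interaction with the User Agent,
    #    rather than only answering requests the way a server would.
    service_initiated_event = {
        "message": "render-prompt",
        "target": "user-agent",
        "text": "Say the name of the contact you want to open.",
    }

    print(json.dumps(start_request, indent=2))
    print(json.dumps(service_initiated_event, indent=2))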


-----Original Message-----
From: Eric S. Johansson [mailto:esj@harvee.org]
Sent: Friday, November 19, 2010 8:57 AM
To: Young, Milan
Cc: Robert Brown; public-xg-htmlspeech@w3.org
Subject: Re: Requirement for UA / SS protocol

On 11/19/2010 10:23 AM, Young, Milan wrote:
> Hello Eric,
>
> I must admit that web applications are not my expertise.  I'm having a
> hard time understanding why the protocol needs to be expanded to handle
> these new unidirectional events.
>
> If the event should be sent from the web-app to the application server,
> then couldn't this be done using AJAX, or some other standard web
> technology?
>
> If the event is to be sent between the SS and application server, then 
> shouldn't this be triggered with an implementation-specific parameter?
> It seems like a stretch to make this part of the specification.
>

The application runs on machine M. It's bound to that machine for whatever reason: residency, licensing, or working data. My audio front-end runs on machine A. How do I tell my speech recognition engine to work with the application bound to machine M (i.e. send it results and get its application cues) but take the audio from machine A, along with any grammars and action code?
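
To make that concrete, here's the kind of session description I picture. Every field name below is invented for illustration; nothing here comes from an existing spec or product:

    # Hypothetical sketch: tell the recognizer separately where the audio
    # comes from (machine A) and where results and application cues flow
    # (machine M).  Invented field names, for illustration only.
    import json

    session = {
        "audio-source":  "rtp://machine-a.example.org:5004",      # my audio front-end
        "result-target": "https://machine-m.example.org/speech/results",
        "cue-source":    "https://machine-m.example.org/speech/cues",
        "grammars":      ["https://machine-m.example.org/grammars/app.grxml"],
        "action-code":   "https://machine-a.example.org/actions/macros.py",
    }

    print(json.dumps(session, indent=2))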

This is a very common scenario for disabled users. Experience and common sense show that it's completely impractical to expect a disabled user to sit in front of a machine and provide/receive audio for an application running on that machine. In Web applications it gets even more complicated, because your recognition engine may be on a different cloud application base than the application itself, and you've got two different machines on the user's side: the one that runs the application and the one that handles the user interface.

Here's a practical example I dealt with this week. A customer has a GoldMine server running on Windows Server 2008. He uses a virtual machine to speak to GoldMine (one VM for every employee), and Carbonite for backup of the database. So I was using my speech recognition to dictate to the SQL Server database, Carbonite, and the GoldMine clients in a virtual machine, all over two different kinds of remote console tools, and it was miserable because nobody knew anything about anybody else. At the very least, I should have been able to communicate state between the application and my recognition engine, but there was no protocol of any kind for them to talk to each other.

In the context of speech recognition and Web applications here, we could lay the framework for such a protocol, or at least for connection options, so that all of the applications have a way of speaking to a remote recognizer, whether it's local to the user or somewhere out on the cloud. The beauty is that it really doesn't matter whether the recognizer is local or remote. It's all fundamentally the same events, grammars, etc. One could think of it as having a big cloud or a little cloud (my desktop).
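
Sketching that out (again with invented names, not a real API), the only thing that changes between the big cloud and the little cloud is the URI of the recognizer:

    # Hypothetical sketch: same session description, same grammars, same
    # events, whether the recognizer lives on my desktop or in the cloud.
    def open_recognizer(recognizer_uri, session):
        # A real implementation would open the control channel here
        # (WebSocket, MRCP-style, whatever this group settles on).
        print("connect to", recognizer_uri)
        for key, value in session.items():
            print("   %-14s %s" % (key, value))

    session = {
        "audio-source":  "rtp://machine-a.example.org:5004",
        "result-target": "https://machine-m.example.org/speech/results",
        "grammars":      ["https://machine-m.example.org/grammars/app.grxml"],
    }

    open_recognizer("ws://localhost:7023/reco", session)          # little cloud
    open_recognizer("wss://reco.example-cloud.net/reco", session) # big cloud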

Yeah, you can send event notifications using standard Web technologies. I don't consider it a stretch to make it part of the specification, because we are specifying communications paths and the purposes for those paths. I could give you a whole accessibility rant, but the point I want to get to is that the spec should create a framework that disabled people can use to build a useful environment for themselves, because vendors are never going to do it for them.

I'm not married to unidirectional, bidirectional, or special protocols. I just want enough flexibility that we don't find disabled people saying "damn, locked out again". To get there, I think there needs to be a discussion of how information flows between the user, the application, and the speech recognition engine, and how all of those can exist on multiple machines.

Received on Friday, 19 November 2010 23:58:26 UTC