RE: Requirement for UA / SS protocol from Young, Milan on 2010-11-19 (public-xg-htmlspeech@w3.org from November 2010)

From: Young, Milan <Milan.Young@nuance.com>
Date: Fri, 19 Nov 2010 14:19:10 -0800
To: "Eric S. Johansson" <esj@harvee.org>
Cc: "Robert Brown" <Robert.Brown@microsoft.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD09631468@SUN-EXCH01.nuance.com>
Hello Eric,

You are mainly talking in terms of use cases, and since I don't have
much of an accessibility background, I'm having a hard time
understanding it all.  But from what I understand you have two
fundamental suggestions:

  * The Speech Service may need to fetch audio from a source other than
the User Agent.

  * The Speech Service needs to be able to initiate interactions with
the User Agent rather than being relegated to the role of a server.

Is that right?


-----Original Message-----
From: Eric S. Johansson [mailto:esj@harvee.org] 
Sent: Friday, November 19, 2010 8:57 AM
To: Young, Milan
Cc: Robert Brown; public-xg-htmlspeech@w3.org
Subject: Re: Requirement for UA / SS protocol

On 11/19/2010 10:23 AM, Young, Milan wrote:
> Hello Eric,
>
> I must admit that web applications are not my expertise.  I'm having a
> hard time understanding why the protocol needs to be expanded to
handle
> these new unidirectional events.
>
> If the event should be sent from the web-app to the application
server,
> then couldn't this be done using AJAX, or some other standard web
> technology?
>
> If the event is to be sent between the SS and application server, then
> shouldn't this be triggered with an implementation-specific parameter?
> It seems like a stretch to make this part of the specification.
>

Application runs on machine M. It's bound to it for whatever reason
either 
residency, licensing or working data. My audio front-end runs on machine
A. How 
to lie tell my speech recognition engine to work with the application
bound to 
machine M (i.e. send its results and get its application cues) but use
the audio 
from Machine A as well as as any grammars. and action code?

This is a very common scenario for disabled user. Experience and common
sense 
shows that it's completely impractical to expect a disabled user to sit
in front 
of a machine and provide/receive audio for an application running on
that 
machine. In Web applications, it gets even more complicated because
you're 
recognition engine may be on a different cloud application base than the

application itself and you've got two different machines for the user,
the one 
that runs the application and the one that deals with user interface.

Here's a practical example I dealt with this week. Got a customer with a
gold 
mine server running on a Windows server 2008. He uses a virtual machine
to speak 
to Goldmine (one VM for every employee). He uses Carbonite for backup of
the 
database. So I use my peech recognition to dictate to the SQL Server
database, 
Carbonite, and Goldmine clients in a virtual machine all over two
different 
types of remote console tools and it was miserable because nobody knew
anything 
about anybody else. At the very least, I should have been able to
communicate 
state between application and my recognition engine but there was no
protocol of 
any kind to talk together.

In the context of speech recognition and Web applications here, we could
lay the 
framework for such a protocol or least connection options so that all of
the 
applications have a way of speaking to a remote recognizer whether it be
local 
to the user or somewhere out on the cloud. The beauty is, it really
doesn't 
matter whether the recognizer is local or remote. It's all fundamentally
the 
same events, grammars etc. one could think of it as having the big cloud
or a 
little cloud (my desktop).

Yeah, you can send event notifications using standard Web technologies.
I don't 
consider it a stretch to make it part of the specification because we
are 
specifying communications paths and the purposes for those paths.   I
could give 
you a whole accessibility rant but, the point I want to get to is the
spec 
should create a framework that disabled people can use to build a useful

environment for themselves because vendors are never going to do it for
them.

I'm not married to unidirectional, bidirectional, special protocols. I
just want 
enough flexibility so that we don't find disabled people saying "damn,
locked 
out again". In order to do that, I think there needs to be a discussion
of how 
information flows between user, application, speech recognition engine
and how 
all those things can exist on multiple machines.
Received on Friday, 19 November 2010 22:20:37 UTC