Re: Agreed recognition API? from Eric S. Johansson on 2011-05-23 (public-xg-htmlspeech@w3.org from May 2011)

From: Eric S. Johansson <esj@harvee.org>
Date: Mon, 23 May 2011 16:26:02 -0400
To: Bjorn Bringert <bringert@google.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <4DDAC2DA.80707@harvee.org>
On 5/20/2011 11:07 AM, Bjorn Bringert wrote:
> On Fri, May 20, 2011 at 3
> It sounds like you want general APIs for accessing data in web apps.
> That sounds like a good idea, but doesn't really have very much to do
> with speech as far as I can tell. To make this a bit more concrete,
> perhaps you could propose some APIs that you would like web browsers
> to implement?
>

Apologies for taking so long to respond.

There is a short answer and a long answer to your question. Short answer today.

The interface is relatively simple. It's a classic setter/getter plus a 
bidirectional event mechanism. The external application gets some values, sets 
some values, and receives an event notification if a watched value changes or 
sends an event notification if something is changed.  The user application 
provides three ways to "view" the data: the entire set of data, the data 
displayed, and the data selected.
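
As a rough illustration of the setter/getter plus watch idea, here is a minimal
sketch. Every name in it (AppData, get, set, watch, and the view names "all",
"displayed", "selected") is an assumption made up for this example, not a
proposed or existing browser API.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class AppData:
    """Hypothetical user-application side of the interface."""

    # The three "views" of the data described above (names are assumptions).
    VIEWS = ("all", "displayed", "selected")

    def __init__(self, text: str) -> None:
        self._values: Dict[str, str] = {"all": text, "displayed": text, "selected": ""}
        self._watchers: Dict[str, List[Callback]] = {}

    def get(self, view: str) -> str:
        # Getter: the external application reads one view of the data.
        return self._values[view]

    def set(self, view: str, value: str) -> None:
        # Setter: store the value, then notify anyone watching this view.
        if view not in self.VIEWS:
            raise KeyError(view)
        self._values[view] = value
        for callback in self._watchers.get(view, []):
            callback(view, value)

    def watch(self, view: str, callback: Callback) -> None:
        # Register for an event notification when a watched view changes.
        self._watchers.setdefault(view, []).append(callback)
```

An external application would call watch("selected", handler) once and then
be told about every later change to the selection.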

In any accessibility interface, there are three components, the user 
application, accessibility mechanism, and the interface bridge. The interface 
bridge is the conduit and conversion to/from presentation for user application 
data and the accessibility mechanism.  The reason for this three-component 
split is political and economic. It minimizes the effort on the part of the 
application vendor and the accessibility mechanism vendor. It puts most of the 
responsibility for the bridge between the two on the end-user. In practice I 
expect vendors or an organization dedicated to accessibility would supply a 
reference implementation that the end user could customize.

When the user types data into the user application, if the interface bridge is 
listening, it receives events telling it that the data has changed. If the 
accessibility mechanism is speech recognition, dictating some text injects the 
text or some transformation of it into the user application buffer.
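
The two directions can be sketched roughly as follows. AppBuffer, Bridge,
on_change, on_dictation and the transform parameter are all hypothetical names
chosen for this example; a real buffer and recognizer would look different.

```python
from typing import Callable, List


class AppBuffer:
    """Hypothetical text buffer owned by the user application."""

    def __init__(self) -> None:
        self.text = ""
        self.cursor = 0
        self.listeners: List[Callable[[str], None]] = []

    def insert(self, s: str) -> None:
        # Insert at the cursor, advance the cursor, and notify every listener.
        self.text = self.text[:self.cursor] + s + self.text[self.cursor:]
        self.cursor += len(s)
        for listener in self.listeners:
            listener(self.text)


class Bridge:
    """Hypothetical interface bridge between the buffer and a recognizer."""

    def __init__(self, buffer: AppBuffer, transform: Callable[[str], str] = str) -> None:
        self.buffer = buffer
        self.transform = transform
        self.last_seen = ""
        buffer.listeners.append(self.on_change)

    def on_change(self, text: str) -> None:
        # Application -> bridge: the data changed (typed or dictated).
        self.last_seen = text

    def on_dictation(self, recognized: str) -> None:
        # Recognizer -> bridge -> application: inject the transformed text.
        self.buffer.insert(self.transform(recognized))
```

Here transform stands in for "some transformation of it", for example
capitalizing the start of a sentence before the text reaches the buffer.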

Data changes are not the only events the interface bridge receives. When the 
user application first receives focus, it notifies the interface bridge of the 
event. The interface bridge would then use this information about what has 
focus to set up the accessibility interface with the right context, for 
example by activating a grammar for the speech recognition engine.
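
A minimal sketch of the focus case, assuming a made-up Recognizer stand-in and
a bridge that keeps one grammar per application. None of these names come from
a real speech engine; they only show the flow from focus event to grammar.

```python
from typing import Dict, List, Optional


class Recognizer:
    """Stand-in for a speech recognition engine (not a real engine API)."""

    def __init__(self) -> None:
        self.active_grammar: Optional[List[str]] = None

    def activate(self, grammar: Optional[List[str]]) -> None:
        # None means "no command grammar", e.g. fall back to plain dictation.
        self.active_grammar = grammar


class FocusBridge:
    """Hypothetical bridge that maps a focus event to a grammar."""

    def __init__(self, recognizer: Recognizer, grammars: Dict[str, List[str]]) -> None:
        self.recognizer = recognizer
        self.grammars = grammars

    def on_focus(self, app_id: str) -> None:
        # Called when the user application first receives focus.
        self.recognizer.activate(self.grammars.get(app_id))


recognizer = Recognizer()
bridge = FocusBridge(recognizer, {"mail": ["reply", "send", "archive"]})
bridge.on_focus("mail")  # the mail grammar is now the active one
```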

That's basically it in a nutshell. There are lots of other things such as 
cursors, selection of focus, selecting regions, etc., but I wanted to get you the 
basic concept. I'll follow with a longer answer containing more detail in a few 
days.

I do want to address one point which is how it ties into speech. As soon as you 
glue speech recognition into an application, you potentially eliminate its 
usefulness for accessibility. You would need to incorporate the equivalent of 
what I've described here into every application independently. But if you have 
an API in the application that the accessibility bridge can make use of, and let 
the accessibility bridge have the responsibility for speaking to the speech 
recognition engine, you potentially lower the cost of enabling an application 
and make it more customizable for the "statistical outlier" user.

Case in point: enabling every single application in Google Apps is going to be a 
big hairy deal. But if instead each one made its public information available 
via some API (get("current_cursor_position", event_when_changed)), then you may 
be able to reuse the interface bridge to provide simple speech recognition 
capability and build on that for greater levels of accessibility.
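
To make that call shape concrete, here is one possible reading of it, sketched
under assumptions: PublicInfo and update are invented names, and the callback
signature (name, value) is a guess. Only the get(name, event_when_changed) form
comes from the paragraph above.

```python
from typing import Any, Callable, Dict, List, Optional

Callback = Callable[[str, Any], None]


class PublicInfo:
    """Hypothetical holder for the public information one application exposes."""

    def __init__(self, **initial: Any) -> None:
        self._values: Dict[str, Any] = dict(initial)
        self._callbacks: Dict[str, List[Callback]] = {}

    def get(self, name: str, event_when_changed: Optional[Callback] = None) -> Any:
        # Return the current value; optionally subscribe to later changes.
        value = self._values[name]  # raises KeyError for unknown names
        if event_when_changed is not None:
            self._callbacks.setdefault(name, []).append(event_when_changed)
        return value

    def update(self, name: str, value: Any) -> None:
        # Called by the application itself; fires events only on a real change.
        if self._values.get(name) == value:
            return
        self._values[name] = value
        for callback in self._callbacks.get(name, []):
            callback(name, value)
```

The same bridge code could then call get("current_cursor_position", handler)
against any application that exposes this kind of object.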

--- eric
Received on Monday, 23 May 2011 20:26:58 UTC