RE: Requirement for UA / SS protocol

Thanks Eric,

I can imagine a system like you describe, where a stateful UX engine drives multiple alternate user interfaces in the same interactive session, and I can see why it would be useful.  I think I just feel that it's a much bigger problem than surfacing speech in HTML, and out of scope for this XG.

-----Original Message-----
From: Eric S. Johansson [mailto:esj@harvee.org] 
Sent: Sunday, November 21, 2010 9:03 PM
To: Robert Brown
Cc: Young, Milan; public-xg-htmlspeech@w3.org
Subject: Re: Requirement for UA / SS protocol

On 11/19/2010 6:57 PM, Robert Brown wrote:
> I have an uneasy feeling about this.
>
> There are multiple layers of problems in the examples, but they don't sound like speech problems.  They sound a lot like problems with either: the remoting of audio I/O over terminal software; or the integration of accessibility tools with terminal applications.

I can understand the uneasiness. Yes, it is in part driven by accessibility issues, if only because accessibility issues show up all the way up and down the stack. When people talk about accessibility, they are usually talking about a simple set of dog-and-pony tricks that can be easily implemented and aren't too embarrassing to the disabled person if they are caught using them. More advanced accessibility aids usually require a total redesign of the user interface. For example, take an ATM. These days most of them have a little jack for a set of headphones. Imagine that jack filled with epoxy and the bank not noticing for months and months. How do you solve the problem of making that interface accessible?

With the split-stream model I'm suggesting, it would be possible for a visually impaired person to pull out their handset, speed-dial the ATM, stick their card into the machine, and listen to what the ATM has to say as they navigate its user interface from their handset.

There are a variety of ways to engineer this, ranging from running all operations at the bank, to having the bank translate the speech-recognition input while the application runs on the ATM. Audio can come either from the ATM itself, or the ATM can send events upstream to a text-to-speech system. Multiple streams, multiple sources, multiple destinations.
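To make the shape of this concrete, here is a rough sketch in TypeScript. It is purely illustrative; the names (SpeechSession, StreamEndpoint, the hosts) are my own invention and not part of any proposed spec. It only shows the kind of routing information such a split-stream session would need to carry:

    // Hypothetical description of a split-stream session: audio capture,
    // recognition, application logic, and synthesized output can each
    // live on a different host.
    type StreamEndpoint = {
      host: string;                             // handset, ATM, or bank server
      mediaType: "audio" | "events" | "text";
    };

    interface SpeechSession {
      sessionId: string;
      audioIn: StreamEndpoint;      // where the user's speech is captured
      recognizer: StreamEndpoint;   // where recognition runs
      application: StreamEndpoint;  // where the UI/application logic runs
      audioOut: StreamEndpoint;     // where prompts/TTS are rendered
    }

    // In the ATM example: the handset captures audio and renders TTS,
    // the bank hosts recognition, and the ATM hosts the application.
    const atmSession: SpeechSession = {
      sessionId: "atm-4711",
      audioIn:     { host: "handset.example", mediaType: "audio" },
      recognizer:  { host: "bank.example",    mediaType: "events" },
      application: { host: "atm-123.example", mediaType: "events" },
      audioOut:    { host: "handset.example", mediaType: "audio" },
    };

    // A routing table like this is all the protocol would need to carry
    // for each party to know where to send its stream.
    function describe(s: SpeechSession): string {
      return `capture@${s.audioIn.host} -> reco@${s.recognizer.host} ` +
             `-> app@${s.application.host} -> tts@${s.audioOut.host}`;
    }

    console.log(describe(atmSession));

The point is only that each leg of the session can be bound to a different host; the protocol just has to say who sends what to whom.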

There's another issue: if it isn't done now, will it be done at all?
Ever since the first hominid busted a leg and was forever crippled, accessibility has been an afterthought. We have a chance here to look carefully at the specification and engineer in accessibility from the start. You might consider this a bit of a reach, but this example shows why it should be done. Today, I can tell you of at least two or three ways to provide the multiple-host target capabilities we've been talking about using NaturallySpeaking.

Would it be useful? Yes.
Do people need it? Yes.
Could it improve the employability of disabled programmers and administrators? Yes.
Does Nuance have an interest in doing it? No, not in the slightest.
Will it ever be done? Not unless I become filthy rich, or Google/Red Hat/Ubuntu funds this effort.
Will doing it independently of the infrastructure force reinvention of the wheel? Yes.

When I made a presentation to Nuance about doing this, the first reaction, from a marketing guy, was "can we charge for every machine we connect to?" While that could be appropriate in certain circumstances, when talking about handicap accessibility it's just plain evil.

This capability, in some form, could be added as an afterthought, but there is little or no chance that it ever will be. It's this inertia that has kept handicap accessibility frozen, or even moving backwards, over the past 15+ years.

If there is no further discussion on the topic, I'll accept that as the idea being voted down and will say no more about it.

Thanks for listening/reading
--- eric

Received on Monday, 22 November 2010 23:31:44 UTC