Re: R19. End user extensions should be available both on desktop and in cloud

From: Eric S. Johansson <esj@harvee.org>
Date: Tue, 14 Dec 2010 12:58:10 -0500
Message-ID: <4D07B032.2080405@harvee.org>
To: Bjorn Bringert <bringert@google.com>
CC: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 12/14/2010 7:32 AM, Bjorn Bringert wrote:
> This kind of user modification of the UI sounds like what people
> currently do with Greasemonkey scripts and other browser extensions. I
> think that the same mechanisms should work fine for speech UIs as
> well, e.g. by using user-scripts to modify the DOM. I think that
> allowing this is a browser feature, and that it falls outside the
> scope of most web standards. For example, I don't think that any HTML
> specs require allowing user modifications.
I will look into Greasemonkey scripts, but I suspect they don't really do what 
is necessary. I think you are right that no HTML spec allows user modification, 
and that's because the user interface model for HTML is extremely restricted. 
HTML can barely address the user-to-user variations required by simple things 
such as color blindness or navigation needs, let alone something more 
sophisticated like speech recognition or text-to-speech. Yes, there are 
significant hacks, but they break on a regular basis.

One important reason the ability to modify your environment independently of 
the software vendor matters is that it can make a difference in the 
employability of a disabled person. For me, this is the biggest social reason 
why we should make this kind of capability available. I'll admit I've had 
trouble. I've been belittled and my skills seen as lesser just because my hands 
don't work. I've seen the same thing happen to every other RSI-disabled person 
I know, and this is not a small number of people. The ability to modify your 
environment so you can 
work as fast as or faster than any temporarily able-bodied person is incredibly 

When you use a speech user interface (not IVR), you always look for ways to 
compress what you say and maximize what you do in order to maintain the health 
of your speaking apparatus. It's an expected capability to be able to change 
your environment to maximize your productivity. I've written speech 
commands that move data between two applications. It wasn't reliable because I 
couldn't speak to the named field, but it kind of sort of worked most of the time. 
I know people in the medical field who are producing little pop-up menu boxes to 
collect user information and then place it back in the medical records 
management application. These pop-up boxes are independent applications 
(grammars and action routines) activated when the medical records application is 
running. In the handicap accessibility realm, almost everybody uses some tool 
to create macros to make their environment work right.

One more thing that you might not think of as important in a speech recognition 
environment is turning off the single-character commands that perform various 
operations. I cannot tell you the number of times I've screwed up Thunderbird 
when I had the wrong focus (that is, dictating without looking) and messages 
are gone, moved to different folders, or have their tags and priorities changed. 
It's a disaster. Turning those things off would not be wonderful for most users, 
but for me and anyone else using speech recognition, it would be a godsend. The 
user modifications should be able to do this. I think the user should be able to 
build new menus and task bars, but that's another discussion.
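
A minimal sketch of what such a user-controlled filter could look like. Every 
name here (KeyEvent, is_single_char_command, allowed_commands) is hypothetical 
and does not come from Thunderbird or any browser API. The assumption is that 
the environment sees each keystroke before the application's shortcut handler 
does and can drop it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical keystroke: a key name plus any held modifier keys."""
    key: str                              # e.g. "a", "Delete", "F5"
    modifiers: frozenset = frozenset()    # e.g. frozenset({"ctrl"})


def is_single_char_command(event):
    """True for a bare printable character with no modifier except Shift.

    Shift is ignored because dictated capital letters arrive with it held.
    """
    real_modifiers = event.modifiers - {"shift"}
    return (len(event.key) == 1
            and event.key.isprintable()
            and not real_modifiers)


def allowed_commands(events, disable_single_char=True):
    """Return the events that may reach the application's shortcut handler.

    With the user preference on, stray dictated characters are dropped,
    while deliberate chords (Ctrl+A) and named keys (Delete) still pass.
    """
    if not disable_single_char:
        return list(events)
    return [e for e in events if not is_single_char_command(e)]
```

With the preference on, text dictated while the message list has focus never 
reaches the shortcut handler, but Ctrl+A and Delete still work.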

All of these modifications require that the environment and the speech 
recognition engine play nice with user data. So the environment should have a 
way of saying "here's the user grammar for this application" and give it to the 
speech recognition engine. The engine should have a way of saying to the user 
environment "here's a grammar rule that terminated properly, go do this." This 
is a protocol and an API (bidirectional), which is what we've been talking about. If 
we specify it well enough, then one set of grammars and action routines can be 
used for all browsers, enabling a larger pool of users to share their work. This 
is really important in the disability community because it is so hard for us to 
get the work done with current generations of tools.
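
The two directions described above can be sketched roughly as follows. This is 
an illustration under stated assumptions, not a proposed API. SpeechEngine, 
UserEnvironment, and all their methods are hypothetical names, and the 
"recognizer" is a toy that only matches exact phrases.

```python
class SpeechEngine:
    """Toy stand-in for a recognizer (hypothetical, exact-phrase match only)."""

    def __init__(self):
        self._grammars = {}    # application name -> {phrase: rule name}
        self._on_rule = None   # callback supplied by the user environment

    def load_grammar(self, app, grammar):
        # Environment -> engine: "here's the user grammar for this application".
        self._grammars[app] = {p.lower(): rule for p, rule in grammar.items()}

    def set_rule_callback(self, callback):
        self._on_rule = callback

    def hear(self, app, utterance):
        # Engine -> environment: "this rule terminated properly, go do this".
        rule = self._grammars.get(app, {}).get(utterance.strip().lower())
        if rule is None or self._on_rule is None:
            return False
        self._on_rule(app, rule)
        return True


class UserEnvironment:
    """Holds the user's action routines and wires them to the engine."""

    def __init__(self, engine):
        self._engine = engine
        self._actions = {}     # (application name, rule name) -> callable
        engine.set_rule_callback(self.run_action)

    def install(self, app, grammar, actions):
        self._engine.load_grammar(app, grammar)
        for rule, action in actions.items():
            self._actions[(app, rule)] = action

    def run_action(self, app, rule):
        action = self._actions.get((app, rule))
        if action is not None:
            action()


# Usage: one user-written grammar plus action routine for a made-up "webmail" app.
log = []
engine = SpeechEngine()
env = UserEnvironment(engine)
env.install("webmail",
            {"archive this message": "archive"},
            {"archive": lambda: log.append("archived")})
engine.hear("webmail", "Archive this message")
```

Because the grammar and the action routines are plain user data keyed by 
application, the same pair could in principle be handed to any engine or 
browser that implements both calls.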

I believe this is a fundamental requirement for a speech recognition environment 
to successfully address the needs of the majority of users and vendors of 
software. Even though I am convinced that this capability belongs within the 
scope of this standards effort, I understand that others may feel differently. 
If you disagree, where do you think it would fit into the various standards efforts?
Received on Tuesday, 14 December 2010 18:00:10 UTC