- From: Eric S. Johansson <esj@harvee.org>
- Date: Tue, 14 Dec 2010 12:58:10 -0500
- To: Bjorn Bringert <bringert@google.com>
- CC: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 12/14/2010 7:32 AM, Bjorn Bringert wrote:
> This kind of user modification of the UI sounds like what people
> currently do with Greasemonkey scripts and other browser extensions. I
> think that the same mechanisms should work fine for speech UIs as
> well, e.g. by using user-scripts to modify the DOM. I think that
> allowing this is a browser feature, and that it falls outside the
> scope of most web standards. For example, I don't think that any HTML
> specs require allowing user modifications.

I will look into Greasemonkey scripts, but I suspect they don't really do what is necessary. I think you are right that no HTML spec allows user modification, and that's because the user interface model for HTML is extremely restricted. HTML can barely address the user-to-user variations required by simple things such as color blindness or navigation needs, let alone something more sophisticated like speech recognition or text-to-speech. Yes, there are significant hacks, but they break on a regular basis.

One important reason to be able to modify your environment independently of the software vendor is that it can make a difference in the employability of a disabled person. For me, this is the biggest social reason why we should make this kind of capability available. I'll admit I've had trouble: I've been belittled and my skills seen as lesser just because my hands don't work. I've seen the same thing happen to every other RSI-disabled person I know, and this is not a small number of people. The ability to modify your environment so you can work as fast as or faster than any temporarily able-bodied person is incredibly important.

When you use a speech user interface (not IVR), you always look for ways to compress what you say and maximize what you do in order to maintain the health of your speaking apparatus. Being able to change your environment to maximize your productivity is an expected capability. I've written speech commands that move data between two applications. They weren't reliable because I couldn't speak to the named field, but they kind of sort of worked most of the time. I know people in the medical field who are producing little pop-up boxes to collect user information and then place it back in the medical records management application. These pop-up boxes are independent applications (grammars and action routines) activated when the medical records application is running. In the handicap accessibility realm, almost everybody uses some tool to create macros to make their environment work right.

One more thing that you might not think of as important in a speech recognition environment is turning off single-character commands that perform various operations. I cannot tell you the number of times I've screwed up Thunderbird by dictating without looking while the focus was in the wrong place: messages are gone, moved to different folders, or have their tags and priorities changed. It's a disaster. Turning those things off would not be wonderful for most users, but for me and anyone else using speech recognition it would be a godsend. The user modifications should be able to do this. I also think the user should be able to build new menus and task bars, but that's another discussion.
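To give a rough sense of the kind of user modification I mean, here is a minimal Greasemonkey-style user-script sketch (written as TypeScript) that swallows single-key presses outside editable fields in a web application that uses single-character shortcuts, so stray dictation cannot trigger them. This is only an assumption of how one might hack it today with user-scripts; no spec provides it, and Thunderbird itself is not a web page, though many webmail clients behave the same way.

// ==UserScript==
// @name     Suppress single-key shortcuts (sketch)
// @include  *
// ==/UserScript==
//
// A minimal sketch, assuming the page's single-character shortcuts are
// ordinary keydown handlers. Single-key presses with no modifiers are
// swallowed unless focus is in an editable field, so stray dictation
// cannot delete, move, or retag anything.

function isEditable(target: EventTarget | null): boolean {
  const el = target as HTMLElement | null;
  if (!el) return false;
  return el.isContentEditable ||
         el.tagName === "INPUT" ||
         el.tagName === "TEXTAREA";
}

window.addEventListener("keydown", (ev: KeyboardEvent) => {
  const singleKey = ev.key.length === 1 &&
                    !ev.ctrlKey && !ev.altKey && !ev.metaKey;
  if (singleKey && !isEditable(ev.target)) {
    ev.stopPropagation();  // keep the page's shortcut handler from seeing it
    ev.preventDefault();
  }
}, true);                  // capture phase, so this runs before page handlers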
All of these modifications require that the environment and the speech recognition engine play nice with user data. So the environment should have a way of saying "here's the user grammar for this application" and giving it to the speech recognition engine. The engine should have a way of saying to the user environment, "here's a grammar rule that terminated properly; go do this." This is a protocol and an API (bidirectional), which is what we've been talking about. If we specify it well enough, then one set of grammars and action routines can be used for all browsers, enabling a larger pool of users to share their work. This is really important in the disability community because it is so hard for us to get work done with the current generation of tools. I believe this is a fundamental requirement for a speech recognition environment to successfully address the needs of the majority of users and software vendors.
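To make that grammar-and-action-routine exchange concrete, here is one way the two directions could look from the page or user-script side, again as a TypeScript sketch. Every name in it (SpeechEngine, registerUserGrammar, onRuleCompleted, the example grammar and slot) is invented for illustration; it shows the shape of the protocol I mean, not interface names from any existing spec or draft.

// Hypothetical shapes for the two directions of the exchange; none of
// these names come from an existing spec or draft.

interface RuleCompletion {
  ruleName: string;               // which grammar rule terminated properly
  utterance: string;              // what the recognizer heard
  slots: Record<string, string>;  // named fields filled in by the rule
}

interface SpeechEngine {
  // Environment -> engine: "here is the user grammar for this application"
  registerUserGrammar(appId: string, srgsGrammar: string): void;

  // Engine -> environment: "this rule terminated properly, go do this"
  onRuleCompleted(handler: (result: RuleCompletion) => void): void;
}

declare const engine: SpeechEngine;  // assumed to be supplied by the browser

// A user grammar (SRGS) plus its action routine -- the pair that users
// could share with each other independently of any one browser or engine.
const grammar = `
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="fileMessage">
  <rule id="fileMessage" scope="public">
    file this under <ruleref uri="#folder"/>
  </rule>
  <rule id="folder">
    <one-of><item>inbox</item><item>archive</item><item>follow up</item></one-of>
  </rule>
</grammar>`;

engine.registerUserGrammar("example-webmail", grammar);

engine.onRuleCompleted((result) => {
  if (result.ruleName === "fileMessage") {
    // The action routine: drive the page's DOM directly instead of
    // making the user spell out menu navigation one keystroke at a time.
    // (Assumes the engine exposes the matched folder name as a slot.)
    moveCurrentMessageTo(result.slots["folder"]);
  }
});

// Placeholder for whatever DOM manipulation the user-script actually does.
function moveCurrentMessageTo(folder: string): void {
  console.log("moving current message to " + folder);
}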
Even though I am convinced that this capability belongs within the scope of this standards effort, I understand that others may feel differently. If you disagree, where do you think it would fit into the various standards efforts?

Received on Tuesday, 14 December 2010 18:00:10 UTC