- From: Eric S. Johansson <esj@harvee.org>
- Date: Thu, 09 Sep 2010 17:54:47 -0400
- To: Deborah Dahl <dahl@conversational-technologies.com>
- CC: public-xg-htmlspeech@w3.org
On 9/9/2010 4:32 PM, Deborah Dahl wrote:
> Use case 2: user-controlled speech parameters
> User has difficulty speaking quickly enough for the existing timeouts
> because of a speech, reading or cognitive disability and would like to
> lengthen the speech timeout. For example, I've heard anecdotally that speech
> timeouts are extremely stressful for people who stutter and actually make
> their stuttering worse.

Stuttering or cognitive impairments are not necessary to have trouble with the pressures imposed by continuous speech recognition. You'll probably hear about this from other speech recognition users, but the process of speaking continuous speech is sometimes very stressful because you need to put together the entire sentence or command as a single thought, work out all the arguments in your mind, then say it, correct any misrecognitions, and then move on to the next one. I know that when I'm writing text, I find myself saying the first half of one sentence and the second half of another because, as I go through the process of thinking about what I'm saying, I change my mind.

Long timeouts are also frustrating: you learn not to dictate too much at once because the cost of correcting a misrecognition is so high. When I'm writing fiction, sometimes I don't pay attention to the screen for a paragraph or more, and usually I end up with half a paragraph of crap because the recognition process fell off the face of the earth and gave me a set of words that I didn't say in a language I don't know. I've since learned that keeping my eye on the recognition box is critically important. When the delays in recognition performance cross the 5 second mark, the stress of holding in your mind what you want to say next becomes stress in your body, and you can't dictate as much before causing damage.

> B. Use case that motivates a requirement to make it easy to integrate input
> from different modalities.
> Use case: User is using a mobile friend-finding application and says, "is
> Mary Smith anywhere around here?" To answer this question the application
> should combine information from geolocation (to understand "here"), speech
> recognition, and potentially even speaker verification information to
> insure that Mary Smith has actually authorized the user to know where she
> is. New modalities are continually becoming available, so it would be
> difficult to provide for integration on a case by case basis.

Wouldn't the application simply provide the "is <user> anywhere around here" grammar to the default recognizer, along with a list of values for "user"? I imagine in return it would get the top five users and their confidence values. Once the geolocation application has that information, it would go off into its own magic that makes the user happy.

Voice recognition and speech recognition are two radically different processes. You can do both with the same audio stream, but only by working from the raw data. And yes, voice recognition would be a really good authentication tool, although I'm not sure about holding the camera up to your face for retinal scans. A bit too Minority Report for me.
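To make that concrete, here is a minimal sketch of what I have in mind from the application's side, written in TypeScript. Every name in it (NBestEntry, recognize, and so on) is hypothetical and stands in for whatever recognition service the platform actually provides; it is only meant to show the shape of "hand over a grammar plus slot values, get back an n-best list with confidences."

```typescript
// Hypothetical shapes only; this is not any real recognizer API.
interface NBestEntry {
  user: string;       // the value matched for the <user> slot
  confidence: number; // 0.0 .. 1.0
}

// The grammar with a single slot, roughly "is <user> anywhere around here".
const grammar = "is <user> anywhere around here";

// Slot values supplied by the application (the user's friend list).
const users = ["Mary Smith", "Mary Smythe", "Marie Smith"];

// Stand-in for the recognizer: in reality this would hand the grammar and
// slot values to the platform's recognition service and return its n-best
// list. Canned results here purely for illustration.
function recognize(grammar: string, slotValues: string[]): NBestEntry[] {
  return [
    { user: "Mary Smith", confidence: 0.82 },
    { user: "Mary Smythe", confidence: 0.41 },
  ];
}

// Application side: take the top five hypotheses, then let the
// friend-finding logic (geolocation, authorization checks) do its own magic.
const nBest = recognize(grammar, users)
  .sort((a, b) => b.confidence - a.confidence)
  .slice(0, 5);

for (const hypothesis of nBest) {
  console.log(`${hypothesis.user}: ${hypothesis.confidence}`);
  // e.g. lookUpLocationIfAuthorized(hypothesis.user)  -- hypothetical
}
```

The point is only that the application never has to know how the recognizer and the other modalities are wired together; it supplies the grammar and consumes the scored hypotheses.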
> D. Use cases that motivate a requirement to always make the use of speech
> optional.
> Use case 1: the user can't speak, can't speak the language well enough to be
> recognized, the speech recognizer just doesn't work well for them, or they
> like typing.
> Use case 2: the user is in an environment where speech is inappropriate,
> like a meeting, or they want to communicate something private, or it's just
> noisy.

Train, plane, airport, bus station, subway stop. Most of these can be solved by using a steno mask, although you might draw the attention of law enforcement and terrorism-phobic people if you're wearing something that covers your face and looks like a gas mask.

> E. Use case that the standard should support completely hands-free
> operation.
> This would mean that there should be a way to speech-enable everything that
> you would do with a mouse, a touchscreen, or by typing.
> Use case: the user doesn't have the use of their hands, either temporarily
> or permanently, or using their hands is difficult. For example, someone is
> repairing a machine, their hands are holding tools and are dirty, but they
> want to browse an HTML manual for the machine.
> I realize there are a lot of difficulties in completely hands-free
> operation, but I wanted to put it out for discussion. It would be good to
> explore how close we can come.

We can do a lot better than we have with hands-free operation. As I've said elsewhere, the user interface needs to be radically different from a GUI interface. It needs to do appropriate hinting when the user stalls, and most importantly, you don't want to do anything that even vaguely stresses a person's throat. I blew out my hands after 18 years of programming. I've managed to keep my voice intact over the last 15 years by not using the hands-free options in NaturallySpeaking. I've tried all of them, and they are all dangerous to the throat; I would have to be in dire straits to count on them. If I lose my voice, I am so royally screwed. I don't want SSDI and Section 8 housing to be part of my future. I know of two people in the Boston area who have had this happen to them. I've seen others potentially headed toward that state, but I've lost track of them.

If you ever want to interview disabled speech recognition users, let me know. I can arrange for some through the Boston Voice Users group.
Received on Thursday, 9 September 2010 21:56:23 UTC