- From: Eric S. Johansson <esj@harvee.org>
- Date: Thu, 09 Sep 2010 11:41:28 -0400
- CC: public-xg-htmlspeech@w3.org
On 9/9/2010 8:59 AM, Satish Sampath wrote:
> Here are some requirements we came up with as part of our earlier API proposal.
>
> - The API must notify the web app when a spoken utterance has been recognized.
>
> - The API must notify the web app on speech recognition errors.
>
> - The API should provide access to a list of speech recognition hypotheses.
>
> - The API should allow, but not require, specifying a grammar for the
> speech recognizer to use.
>
> - The API should allow specifying the natural language in which to
> perform speech recognition. This will override the language of the web
> page.
>
> - For privacy reasons, the API should not allow web apps access to raw
> audio data but only provide recognition results.
>
> - For privacy reasons, speech recognition should only be started in
> response to user action.
>
> - Web app developers should not have to run their own speech
> recognition services.

Nor should they be excluded from running their own speech recognition services, for reasons of privacy. I dictate confidential information, and I don't want anything concerning my dictations leaving my machine.

If speech recognition is present, all keystroke shortcuts to application functions should be turned off, because misrecognition and accidental recognition events can cause unintended actions.

End users should be permitted to create new grammars or extend existing ones, on both a global and a per-application basis. End-user extensions should be accessible either from the desktop or from the cloud. For reasons of privacy, the user should not be forced to store anything about their speech recognition environment in the cloud. Any public interfaces for creating extensions should be "speakable": a user should never need to touch the keyboard in order to expand a grammar, reference data, or add functionality. I've been trying to figure out the right way to express these last few concepts, but I'm sure they will come with time and conversation.

Currently, local speech recognition services (e.g. NaturallySpeaking) degrade in both performance and accuracy when they are coupled to an application that is slow. This is a well-known phenomenon, but Nuance doesn't seem interested in fixing it. Web applications are among the worst offenders for degrading recognition accuracy and speed. I don't know of any fixes right now, but this is something to keep an eye on.

The services described here for web applications would be good for the desktop as well. Given that I'm a person who rarely uses web applications (see the performance/reliability problems above, especially Chrome crashing when receiving dictation events), it would be useful to many users like myself to have this kind of capability on the desktop. At the very least, there should be no boundary between desktop and web app speech recognition functionality.

I see no mention of retrieving the contents of a text area for editing purposes. Look at NaturallySpeaking's Select-and-Say functionality; it works very nicely for small-grain text editing. I'm also experimenting with speech user interfaces for non-English text dictation. The basic model is: select a region by speech, run the selected region through a transformation, edit the transformed text by speech, run the text through the reverse transform, and replace the selected region with the new text. Rough sketches of both ideas follow at the end of this message.

For additional examples of what disabled speech recognition users have been working with for the past 10 years, check out Vocola, Dragonfly, Unimacro, and their common base, NatLink.
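To make the quoted requirements a little more concrete, here is a minimal sketch in TypeScript of what an event-driven recognition API could look like from the web app's side. This is purely illustrative; none of the names (createRecognizer, SpeechReco, onresult, grammarURI, lang) come from the actual proposal.

    // Hypothetical interface -- names are illustrative, not from the proposal.
    interface RecognitionHypothesis {
      transcript: string;   // recognized text
      confidence: number;   // 0..1
    }

    interface RecognitionResult {
      hypotheses: RecognitionHypothesis[];  // n-best list, best first
    }

    interface SpeechReco {
      grammarURI?: string;  // optional grammar (allowed, not required)
      lang?: string;        // overrides the page language, e.g. "sv-SE"
      onresult: (r: RecognitionResult) => void;  // utterance recognized
      onerror: (message: string) => void;        // recognition error
      start(): void;   // should only be called from a user action (privacy)
      stop(): void;
    }

    declare function createRecognizer(): SpeechReco;  // assumed factory

    // Usage: start recognition only from a click, never automatically.
    const reco = createRecognizer();
    reco.lang = "en-US";
    reco.onresult = (r) => console.log(r.hypotheses[0].transcript);
    reco.onerror = (msg) => console.warn("recognition failed:", msg);
    document.getElementById("mic-button")!.onclick = () => reco.start();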
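On the Select-and-Say point, the missing piece is plain read/modify access to a text area. A minimal sketch, assuming a standard DOM textarea and nothing in particular about how the recognizer delivers the spoken phrase:

    // Replace the first occurrence of a spoken phrase inside a textarea,
    // the way "select <phrase>" followed by new dictation would.
    function selectAndSay(area: HTMLTextAreaElement,
                          spokenTarget: string,
                          replacement: string): boolean {
      const text = area.value;
      const start = text.toLowerCase().indexOf(spokenTarget.toLowerCase());
      if (start < 0) return false;          // phrase not found
      const end = start + spokenTarget.length;
      area.setSelectionRange(start, end);   // show the selection to the user
      area.setRangeText(replacement, start, end, "end");
      return true;
    }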
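And for the non-English editing model, a sketch of the round trip. The transform and reverse transform are left as injectable functions, since that part is exactly what I'm still experimenting with; the stand-ins below are illustrative only.

    // Round-trip editing: selected text -> editable form -> edited text -> back.
    // transform/reverse are placeholders (e.g. a transliteration to and from
    // a form the recognizer can dictate and correct reliably).
    function editBySpeech(
        selected: string,
        transform: (s: string) => string,
        editByVoice: (s: string) => string,   // stands in for the dictation step
        reverse: (s: string) => string): string {
      const editable = transform(selected);   // e.g. non-English -> dictatable form
      const edited = editByVoice(editable);   // user corrects it by speech
      return reverse(edited);                 // back to the original representation
    }

    // Example with trivial stand-in transforms:
    const result = editBySpeech("héllo wörld",
        (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, ""),  // strip accents
        (s) => s.replace("hello", "goodbye"),                       // "edited by voice"
        (s) => s);  // a real reverse transform would restore the original script
    console.log(result);  // "goodbye world"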
Received on Thursday, 9 September 2010 15:42:27 UTC