- From: Eric S. Johansson <esj@harvee.org>
- Date: Mon, 13 Dec 2010 19:43:54 -0500
- To: Bjorn Bringert <bringert@google.com>
- CC: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 12/9/2010 1:49 PM, Bjorn Bringert wrote:
> On Thu, Dec 9, 2010 at 5:21 PM, Eric S. Johansson<esj@harvee.org> wrote:
>> On 12/8/2010 4:24 PM, Robert Brown wrote:
>>> I think that's right. It originally came from Eric's post in September:
>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0015.html
>>>
>>> In that context, it seems to be about a specific style of application that
>>> could be built with the API, rather than the API itself. So I agree it's
>>> out of scope.
>>
>> It wasn't intended to be a specific style of application. It was a poorly
>> worded attempt to convey a fundamental concept in speech-enabled
>> environments. Any (web) application is going to be a complete disaster for
>> speech users (Google Apps [1]). In order to make an application speech
>> usable, it will be necessary to create a whole new user interface layer
>> around the application in order to drive it. If the application is designed
>> to be used with speech, it won't be as much of a disaster, but you'll still
>> need the basic grammars and actions to drive it.
>>
>> If you assume that all applications will come with a speech user interface
>> complete and usable from day one, then you're right, R19 is out of scope. If
>> you want to require that any application user interface can be modified or
>> extended based on the user's needs, then we need something like R19.
>>
>> I would suggest a little more discussion on R19, because end-user
>> customization of user interfaces is one of the major differences between
>> visual user interfaces and aural ones. I'd like to make sure that what I'm
>> seeing as important is the same thing as the rest of you.
>>
>> --- eric
>>
>> [1] Not intending to pick on Google Apps; it's just that they are very
>> common and almost completely unusable if you use speech recognition. I
>> can't even use Google Mail with speech. It's part Nuance, part browser.
>
> Are there any other web standards that include something like this? Or
> any non-web speech applications that allow it? Could you propose any
> (strawman) mechanism for it?
>
> While I think that supporting user extensions is a noble idea, I can't
> really see what concrete form this would take as part of a web
> standard.

http://imagebin.org/127790

This is a rough sketch of what I've been thinking about as I make comments and read the discussion. If something is confusing or unclear, say so and I'll redraw.

The image illustrates a remote recognition-engine environment. In it, I show a remote speech recognition engine, a local application driven by that engine, and a remote application driven by it as well. Each application has a grammar and a set of actions (I know I'm not using the right terminology, but it escapes me at the moment). When a grammar rule completes, it references an action routine, and that action routine is executed with context about the grammar rule that invoked it. I believe grammar rules belong with the speech recognition engine, for performance reasons. The action routines belong in the context of the application they are controlling.

A working example: if I am working with a local application (speech-enabled minesweeper), the grammar associated with minesweeper is replicated to the speech recognition engine, and each time I say a command it runs one of the action routines on my machine.
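Since you asked for a strawman: here is a minimal sketch of the rule-to-action binding I'm describing, written in TypeScript notation because the types help. Every identifier in it is invented for illustration; none of this is proposed API surface.

    // Strawman only: all names below are invented for illustration.
    // Context delivered to an action routine when its grammar rule completes.
    interface RecognitionContext {
      rule: string;                    // name of the grammar rule that matched
      transcript: string;              // raw recognized utterance
      slots: Record<string, string>;   // semantic values captured by the rule
    }

    type ActionRoutine = (ctx: RecognitionContext) => void;

    // The grammar itself is replicated to the engine, local or remote;
    // the action routines stay in the application's own context.
    const actions = new Map<string, ActionRoutine>();

    function bindRule(rule: string, action: ActionRoutine): void {
      actions.set(rule, action);
    }

    // Invoked by the speech layer when the engine reports a completed rule.
    function onRuleComplete(ctx: RecognitionContext): void {
      actions.get(ctx.rule)?.(ctx);
    }

    // Speech-enabled minesweeper: saying "flag row three column two"
    // completes the flagCell rule and runs this routine on my machine.
    bindRule("flagCell", (ctx) => {
      console.log(`flagging cell ${ctx.slots.row},${ctx.slots.col}`);
    });

The only point of the sketch is the split: the engine owns recognition, the application owns behavior, and the completed rule's context is the wire between them.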
Then my boss comes along and I need to switch to the application I'm really supposed to be using (a speech-enabled spreadsheet). The recognition engine understands the context shift, activates the spreadsheet grammar, and calls action routines on the remote machine to drive the application.

Pushing the idea a little further, let's say I come up with a user interface for the spreadsheet that is far better than what the manufacturer gave me. I create a grammar associated with the application and action routines associated with that grammar. When I use the application, my grammar takes precedence, so I can replace grammar rules with ones that function the way I want without changing the original grammar itself. My action routines take precedence as well, because I'm effectively installing an update or replacing some functionality (there's a small sketch of this precedence rule at the end of this message).

When I looked at this idea again after I got your e-mail, it dawned on me that this model could work with local or remote recognition engines. The only difference is the latency.

What's wrong with this idea? Security. It's a pretty convenient way to gain remote control over a recognition engine or an application, and from there it's only one hop to the client machine. I don't think security has to be a showstopper; we just need to address it right up front, and potentially come up with a validation suite to make sure the holes are closed.

This system lets the end user take control over what the manufacturers generate for a grammar set and action rule set. They might not like that. As a result, there may be enough pushback to prevent a system like this, which is fundamental to good accessibility, from ever existing.
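And here is the precedence rule from the spreadsheet example, with the same caveats: a sketch under the assumption that user-supplied bindings simply shadow the vendor's by rule name.

    // Strawman continuation; all names are still invented for illustration.
    type ActionRoutine = (ctx: { rule: string; slots: Record<string, string> }) => void;

    // Two layers of bindings: what the manufacturer shipped, and what I wrote.
    const vendorActions = new Map<string, ActionRoutine>();
    const userActions = new Map<string, ActionRoutine>();

    // My customizations shadow vendor rules of the same name; any rule I
    // haven't touched falls through to the manufacturer's original binding.
    function resolveAction(rule: string): ActionRoutine | undefined {
      return userActions.get(rule) ?? vendorActions.get(rule);
    }

    // The vendor ships "sumColumn"; I replace it with a version that also
    // speaks the total back to me, without touching the vendor's grammar.
    vendorActions.set("sumColumn", (ctx) =>
      console.log(`sum column ${ctx.slots.col}`));
    userActions.set("sumColumn", (ctx) =>
      console.log(`sum column ${ctx.slots.col} and speak the total`));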