- From: Bjorn Bringert <bringert@google.com>
- Date: Tue, 14 Dec 2010 12:32:59 +0000
- To: "Eric S. Johansson" <esj@harvee.org>
- Cc: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On Tue, Dec 14, 2010 at 12:43 AM, Eric S. Johansson <esj@harvee.org> wrote:
> On 12/9/2010 1:49 PM, Bjorn Bringert wrote:
>>
>> On Thu, Dec 9, 2010 at 5:21 PM, Eric S. Johansson <esj@harvee.org> wrote:
>>>
>>> On 12/8/2010 4:24 PM, Robert Brown wrote:
>>>>
>>>> I think that's right. It originally came from Eric's post in September:
>>>>
>>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0015.html
>>>>
>>>> In that context, it seems to be about a specific style of application
>>>> that could be built with the API, rather than the API itself. So I
>>>> agree it's out of scope.
>>>
>>> It wasn't intended to be a specific style of application. It was a
>>> poorly worded attempt to convey a fundamental concept in speech-enabled
>>> environments. Any (web) application is going to be a complete disaster
>>> for speech users (Google apps [1]). In order to make an application
>>> speech usable, it will be necessary to create a whole new user interface
>>> layer around the application in order to drive it. If the application is
>>> designed to be used with speech, it won't be as much of a disaster, but
>>> you'll still need the basic grammars and actions to drive it.
>>>
>>> If you assume that all applications will come with a speech user
>>> interface complete and usable from day one, then you're right, r19 is
>>> out of scope. If you want to require that any application user interface
>>> can be modified or extended based on the user's needs, then we need
>>> something like r19.
>>>
>>> I would suggest a little more discussion on r19 because end-user
>>> customization of user interfaces is one of the major differences between
>>> visual user interfaces and aural ones. I'd like to make sure what I'm
>>> seeing as important is the same thing as the rest of you.
>>>
>>> --- eric
>>>
>>> [1] Not intending to pick on Google apps; it's just that they are very
>>> common and almost completely unusable if you use speech recognition. I
>>> can't even use Google mail with speech. It's part Nuance, part browser.
>>
>> Are there any other web standards that include something like this? Or
>> any non-web speech applications that allow it? Could you propose any
>> (strawman) mechanism for it?
>>
>> While I think that supporting user extensions is a noble idea, I can't
>> really see what concrete form this would take as part of a web
>> standard.
>>
> http://imagebin.org/127790
>
> This is a rough sketch of what I've been thinking about as I make comments
> or read the discussion. If there is something confusing or unclear, say
> so and I'll redraw. This image illustrates a remote recognition engine
> environment.
>
> In the illustration, I show a remote speech recognition engine, a local
> application driven by the speech recognition engine, and a remote
> application being driven by the speech recognition engine. Each
> application has a grammar and a set of actions. I know I'm not using the
> right terminology but it escapes me at the moment. When a grammar rule
> completes, it references an action routine, and that action routine is
> executed with context about the grammar rule that invoked it. I believe
> grammar rules belong with a speech recognition engine because of
> performance. The action routines belong in the context of the application
> they are controlling.
>
> A working example would be if I am working with a local application
> (speech-enabled minesweeper): the grammar associated with minesweeper is
> replicated to the speech recognition engine, and each time I say a command
> it runs one of the action routines on my machine.
> Then my boss comes along and I need to switch to the application I'm
> really supposed to be using (speech-enabled spreadsheet), and the
> recognition engine now understands the context shift, activates the
> spreadsheet grammar, and calls action routines on the remote machine to
> drive the application.
>
> Pushing the idea a little further, let's say I come up with a user
> interface for the spreadsheet that is far better than what the
> manufacturer gave me. I create a grammar associated with the application
> and action routines associated with that grammar. When I use the
> application, my grammar takes precedence, so I can replace grammar rules
> with ones that function the way I want them to but not change the grammar
> itself. My action routines take precedence because I'm installing an
> update or replacing some functionality.
>
> When I looked at this idea again after I got your e-mail, it dawned on me
> that this model could work with local or remote recognition engines. The
> only thing different is the latency.
>
> What's wrong with this idea?
>
> Security. It's a pretty convenient way to gain remote control over a
> recognition engine or an application, and from there it is only one hop
> to the client machine. I think security doesn't have to be a showstopper;
> it's just that we need to address it right up front and potentially come
> up with a validation suite to make sure that security holes are closed.
>
> This system lets the end-user take control over what the manufacturers
> generate for a grammar set and action rule set. They might not like this.
> As a result, there may be enough pushback to prevent a system like this,
> which is fundamental to good accessibility, from ever existing.

This kind of user modification of the UI sounds like what people
currently do with Greasemonkey scripts and other browser extensions. I
think that the same mechanisms should work fine for speech UIs as
well, e.g. by using user-scripts to modify the DOM.
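
As a rough illustration of the layering discussed above, where a user's grammar rules and action routines take precedence over the ones shipped with the application, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `GrammarLayer` and `Dispatcher`, the use of plain phrases as stand-ins for grammar rules, and the example commands. None of it comes from the thread or from any speech API.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: an action routine receives the phrase that matched
# (standing in for "context about the grammar rule that invoked it").
Action = Callable[[str], str]


class GrammarLayer:
    """Maps spoken phrases (stand-ins for grammar rules) to action routines."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rules: Dict[str, Action] = {}

    def add_rule(self, phrase: str, action: Action) -> None:
        self.rules[phrase] = action


class Dispatcher:
    """Resolves a recognized phrase, consulting the user layer first."""

    def __init__(self, vendor: GrammarLayer,
                 user: Optional[GrammarLayer] = None) -> None:
        # List order encodes precedence: user rules shadow vendor rules,
        # and the vendor layer itself is never modified.
        self.layers = [layer for layer in (user, vendor) if layer is not None]

    def dispatch(self, phrase: str) -> Optional[str]:
        for layer in self.layers:
            action = layer.rules.get(phrase)
            if action is not None:
                return action(phrase)
        return None  # no layer has a rule for this phrase


# Usage: the vendor ships two rules; the user replaces one of them.
vendor = GrammarLayer("vendor")
vendor.add_rule("next cell", lambda phrase: "vendor: move right")
vendor.add_rule("save sheet", lambda phrase: "vendor: save")

user = GrammarLayer("user")
user.add_rule("next cell", lambda phrase: "user: move down")

dispatcher = Dispatcher(vendor, user)
overridden = dispatcher.dispatch("next cell")     # handled by the user layer
fallthrough = dispatcher.dispatch("save sheet")   # falls through to the vendor
```

Keeping the two layers separate, rather than merging user rules into the vendor's table, mirrors the point that the user can replace individual rules without changing the original grammar.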
I think that allowing this is a browser feature, and that it falls
outside the scope of most web standards. For example, I don't think
that any HTML specs require allowing user modifications.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Tuesday, 14 December 2010 12:33:30 UTC