Re: R19. End user extensions should be available both on desktop and in cloud

From: Eric S. Johansson <esj@harvee.org>
Date: Mon, 13 Dec 2010 19:43:54 -0500
Message-ID: <4D06BDCA.9010304@harvee.org>
To: Bjorn Bringert <bringert@google.com>
CC: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 12/9/2010 1:49 PM, Bjorn Bringert wrote:
> On Thu, Dec 9, 2010 at 5:21 PM, Eric S. Johansson<esj@harvee.org>  wrote:
>> On 12/8/2010 4:24 PM, Robert Brown wrote:
>>> I think that's right. It originally came from Eric's post in September:
>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0015.html
>>> In that context, it seems to be about a specific style of application that
>>> could be built with the API, rather than the API itself.  So I agree it's
>>> out of scope.
>> It wasn't intended to be a specific style of application. It was a poorly
>> worded attempt to convey a fundamental concept in speech enabled
>> environments.  Any (web) application is going to be a complete disaster for
>> speech users (e.g. Google apps [1]).  In order to make an application speech
>> usable, it will be necessary to create a whole new user interface layer
>> around the application in order to drive it.  If the application is designed
>> to be used with speech, it won't be as much of a disaster, but you'll still
>> need the basic grammars and actions to drive it.
>> If you assume that all applications will come with a speech user interface
>> complete and usable from day one, then you're right, r19 is out of scope. If
>> you want to require that any application user interface can be modified or
>> extended based on the user's needs then we need something like r19.
>> I would suggest a little more discussion on r19 because end-user
>> customizations for user interfaces is one of the major differences between
>> visual user interfaces and aural ones.  I'd like to make sure what I'm
>> seeing as important is the same thing as the rest of you.
>> --- eric
>> [1] Not intending to pick on Google apps; it's just that they are very common
>> and almost completely unusable if you use speech recognition. I can't even
>> use Google mail with speech. It's part Nuance, part browser.
> Are there any other web standards that include something like this? Or
> any non-web speech applications that allow it? Could you propose any
> (strawman) mechanism for it?
> While I think that supporting user extensions is a noble idea, I can't
> really see what concrete form this would take as part of a web
> standard.

This is a rough sketch of what I've been thinking about as I comment on or read
the discussion. If anything is confusing or unclear, say so and I'll redraw.
This image illustrates a remote recognition engine environment.

In the illustration, I show a remote speech recognition engine, a local
application driven by the speech recognition engine, and a remote application
driven by the same engine. Each application has a grammar and a set of
actions. I know I'm not using the right terminology, but it escapes me at the
moment. When a grammar rule completes, it references an action routine, and
that action routine is executed with context about the grammar rule that
invoked it. I believe grammar rules belong with the speech recognition engine
for performance reasons. The action routines belong in the context of the
application they are controlling.

A working example: if I am working with a local application (speech-enabled
minesweeper), the grammar associated with minesweeper is replicated to the
speech recognition engine, and each time I say a command it runs one of the
action routines on my machine. Then my boss comes along and I need to switch
to the application I'm really supposed to be using (speech-enabled
spreadsheet). The recognition engine now understands the context shift,
activates the spreadsheet grammar, and calls action routines on the remote
machine to drive the application.
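
The context shift in this example could be sketched roughly as follows. This
is an illustration under assumptions, not a proposed API: the class name
ContextSwitchingEngine and its methods are hypothetical, and plain callables
stand in for local routines and remote calls alike.

```python
from typing import Callable, Dict, List, Optional


class ContextSwitchingEngine:
    """Keeps one grammar per application and activates only one at a time."""

    def __init__(self) -> None:
        # application name -> {phrase: action routine}
        self.grammars: Dict[str, Dict[str, Callable[[], None]]] = {}
        self.active: Optional[str] = None

    def register(self, app_name: str, grammar: Dict[str, Callable[[], None]]) -> None:
        self.grammars[app_name] = dict(grammar)

    def focus(self, app_name: str) -> None:
        # Context shift: only the focused application's grammar is live.
        if app_name not in self.grammars:
            raise KeyError(app_name)
        self.active = app_name

    def hear(self, phrase: str) -> bool:
        """Run the matching action; return False if nothing matched."""
        if self.active is None:
            return False
        action = self.grammars[self.active].get(phrase)
        if action is None:
            return False
        action()
        return True


log: List[str] = []
engine = ContextSwitchingEngine()
# In a real system the spreadsheet routine would be a call to the remote
# machine; here both routines just record what would have run.
engine.register("minesweeper", {"flag square": lambda: log.append("local: flag square")})
engine.register("spreadsheet", {"sum column": lambda: log.append("remote: sum column")})

engine.focus("minesweeper")
engine.hear("flag square")
engine.focus("spreadsheet")          # the boss comes along
engine.hear("sum column")
ignored = engine.hear("flag square")  # minesweeper grammar is no longer active
```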

Pushing the idea a little further, let's say I come up with a user interface for
the spreadsheet that is far better than what the manufacturer gave me. I create
a grammar associated with the application and action routines associated with
that grammar. When I use the application, my grammar takes precedence, so I can
replace grammar rules with ones that function the way I want without changing
the grammar itself. My action routines take precedence because I'm installing an
update or replacing some functionality.
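
One way to picture "my grammar takes precedence" is a layered lookup in which
the user's rules and routines shadow the manufacturer's without modifying
them. The sketch below uses Python's collections.ChainMap for the layering;
the rule names, phrases, and the dispatch function are made up for
illustration.

```python
from collections import ChainMap
from typing import Callable, Dict, Optional

# Manufacturer-supplied layer (left untouched).
vendor_grammar: Dict[str, str] = {"sum_column": "sum column", "new_row": "insert row"}
vendor_actions: Dict[str, Callable[[], str]] = {
    "sum_column": lambda: "vendor sum",
    "new_row": lambda: "vendor insert row",
}

# End-user layer: replaces one rule and its action routine.
user_grammar: Dict[str, str] = {"sum_column": "total this column"}
user_actions: Dict[str, Callable[[], str]] = {"sum_column": lambda: "user total"}

# ChainMap searches its maps in order, so the user layer wins on conflicts.
grammar = ChainMap(user_grammar, vendor_grammar)
actions = ChainMap(user_actions, vendor_actions)


def dispatch(utterance: str) -> Optional[str]:
    """Find the rule whose phrase matches and run its highest-priority action."""
    for rule_name, phrase in grammar.items():
        if phrase == utterance:
            return actions[rule_name]()
    return None


replaced = dispatch("total this column")   # user rule and user action
shadowed = dispatch("sum column")          # vendor phrase no longer active
inherited = dispatch("insert row")         # untouched vendor rule still works
```

Removing the user layer restores the manufacturer's behavior, since the
vendor dictionaries are never modified.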

When I looked at this idea again after I got your e-mail, it dawned on me that 
this model could work with local or remote recognition engines. The only thing 
different is the latency.

What's wrong with this idea?

Security. It's a pretty convenient way to gain remote control over a recognition
engine or an application, and from there it is only one hop to the client
machine. I think security doesn't have to be a showstopper; we just need to
address it right up front and potentially come up with a validation suite to
make sure that security holes are closed.

This system lets the end user take control over what the manufacturers generate
for a grammar set and action rule set. They might not like this. As a result,
there may be enough pushback to prevent a system like this, which is fundamental
to good accessibility, from ever existing.
Received on Tuesday, 14 December 2010 00:45:07 UTC