Re: R19. End user extensions should be available both on desktop and in cloud from Bjorn Bringert on 2010-12-14 (public-xg-htmlspeech@w3.org from December 2010)

From: Bjorn Bringert <bringert@google.com>
Date: Tue, 14 Dec 2010 12:32:59 +0000
To: "Eric S. Johansson" <esj@harvee.org>
Cc: Robert Brown <Robert.Brown@microsoft.com>, "Olli@pettay.fi" <Olli@pettay.fi>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <AANLkTikzJP_9JvTUhBUPPk3ULW4xR5WeO7LiQB5nNZNR@mail.gmail.com>

On Tue, Dec 14, 2010 at 12:43 AM, Eric S. Johansson <esj@harvee.org> wrote:
> On 12/9/2010 1:49 PM, Bjorn Bringert wrote:
>>
>> On Thu, Dec 9, 2010 at 5:21 PM, Eric S. Johansson<esj@harvee.org>  wrote:
>>>
>>> On 12/8/2010 4:24 PM, Robert Brown wrote:
>>>>
>>>> I think that's right. It originally came from Eric's post in September:
>>>>
>>>> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0015.html
>>>>
>>>> In that context, it seems to be about a specific style of application
>>>> that could be built with the API, rather than the API itself. So I agree
>>>> it's out of scope.
>>>
>>> It wasn't intended to be a specific style of application. It was a poorly
>>> worded attempt to convey a fundamental concept in speech-enabled
>>> environments. Any (web) application is going to be a complete disaster
>>> for speech users (Google Apps [1]). In order to make an application
>>> speech usable, it will be necessary to create a whole new user interface
>>> layer around the application in order to drive it. If the application is
>>> designed to be used with speech, it won't be as much of a disaster, but
>>> you'll still need the basic grammars and actions to drive it.
>>>
>>> If you assume that all applications will come with a speech user
>>> interface complete and usable from day one, then you're right, R19 is out
>>> of scope. If you want to require that any application user interface can
>>> be modified or extended based on the user's needs, then we need something
>>> like R19.
>>>
>>> I would suggest a little more discussion on R19, because end-user
>>> customization of user interfaces is one of the major differences
>>> between visual user interfaces and aural ones. I'd like to make sure
>>> that what I see as important is the same thing the rest of you see.
>>>
>>> --- eric
>>>
>>> [1] I'm not intending to pick on Google Apps; it's just that they are very
>>> common and almost completely unusable if you use speech recognition. I
>>> can't even use Google Mail with speech. It's part Nuance, part browser.
>>
>> Are there any other web standards that include something like this? Or
>> any non-web speech applications that allow it? Could you propose any
>> (strawman) mechanism for it?
>>
>> While I think that supporting user extensions is a noble idea, I can't
>> really see what concrete form this would take as part of a web
>> standard.
>>
> http://imagebin.org/127790
>
> This is a rough sketch of what I've been thinking about as I comment on or
> read the discussion. If there is something confusing or unclear, say so and
> I'll redraw. This image illustrates a remote recognition engine
> environment.
>
> In the illustration, I show a remote speech recognition engine, a local
> application driven by the speech recognition engine, and a remote application
> driven by the same engine. Each application has a grammar and a set of
> actions. I know I'm not using the right terminology, but it escapes me at the
> moment. When a grammar rule completes, it references an action routine, and
> that action routine is executed with context about the grammar rule that
> invoked it. I believe grammar rules belong with the speech recognition engine
> for performance reasons. The action routines belong in the context of the
> application they are controlling.
>
> A working example: if I am working with a local application (speech-enabled
> Minesweeper), the grammar associated with Minesweeper is replicated to the
> speech recognition engine, and each time I say a command it runs one of the
> action routines on my machine. Then my boss comes along and I need to switch
> to the application I'm really supposed to be using (a speech-enabled
> spreadsheet). The recognition engine now understands the context shift,
> activates the spreadsheet grammar, and calls action routines on the remote
> machine to drive the application.
>
> Pushing the idea a little further, let's say I come up with a user interface
> for the spreadsheet that is far better than what the manufacturer gave me. I
> create a grammar associated with the application and action routines
> associated with that grammar. When I use the application, my grammar takes
> precedence, so I can replace grammar rules with ones that function the way I
> want them to without changing the grammar itself. My action routines take
> precedence because I'm installing an update or replacing some functionality.
>
> When I looked at this idea again after I got your e-mail, it dawned on me
> that this model could work with local or remote recognition engines. The
> only thing different is the latency.
>
> What's wrong with this idea?
>
> Security. It's a pretty convenient way to gain remote control over a
> recognition engine or an application, and from there it is one hop to the
> client machine. I don't think security has to be a showstopper; we just
> need to address it right up front and potentially come up with a
> validation suite to make sure that security holes are closed.
>
> This system lets the end user take control over what the manufacturers
> generate for a grammar set and action rule set. They might not like this. As
> a result, there may be enough pushback to prevent a system like this, which
> is fundamental to good accessibility, from ever existing.

This kind of user modification of the UI sounds like what people
currently do with Greasemonkey scripts and other browser extensions. I
think that the same mechanisms should work fine for speech UIs as
well, e.g. by using user-scripts to modify the DOM. I think that
allowing this is a browser feature, and that it falls outside the
scope of most web standards. For example, I don't think that any HTML
specs require allowing user modifications.
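
As a rough illustration of the layering discussed in this thread (not part
of any proposal or spec), here is a minimal Python sketch in which
user-supplied grammar-rule bindings take precedence over the ones an
application ships with, without modifying the application's own set. All
names here (SpeechBindings, register_app, register_user, dispatch) are
hypothetical and only meant to show the lookup order.

```python
from typing import Callable, Dict

# An action routine receives the recognized utterance as context and
# returns a description of what it did (hypothetical signature).
Action = Callable[[str], str]


class SpeechBindings:
    """Two layers of grammar-rule -> action bindings: application and user."""

    def __init__(self) -> None:
        self.app_actions: Dict[str, Action] = {}   # shipped by the application
        self.user_actions: Dict[str, Action] = {}  # installed by the end user

    def register_app(self, rule: str, action: Action) -> None:
        self.app_actions[rule] = action

    def register_user(self, rule: str, action: Action) -> None:
        # The application layer is left untouched; the user layer only
        # shadows it at lookup time.
        self.user_actions[rule] = action

    def dispatch(self, rule: str, utterance: str) -> str:
        # User bindings win; fall back to the application's binding.
        if rule in self.user_actions:
            return self.user_actions[rule](utterance)
        if rule in self.app_actions:
            return self.app_actions[rule](utterance)
        raise KeyError(f"no action bound to grammar rule {rule!r}")


if __name__ == "__main__":
    bindings = SpeechBindings()
    bindings.register_app("sum column", lambda u: f"app handled: {u}")
    bindings.register_user("sum column", lambda u: f"user handled: {u}")
    print(bindings.dispatch("sum column", "sum column B"))
```

In a browser, the user layer would correspond to something like a
user-script loaded after the page's own code; the sketch only shows the
precedence rule, not how recognition or the DOM would be wired up.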

-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Tuesday, 14 December 2010 12:33:30 UTC