Re: R21. Any public interface for creating extensions should be speakable

On 11/4/2010 6:02 PM, Satish Sampath wrote:
> A set of use cases which clearly describe what 'end user extensions'
> mean would help in deciding whether they should be in scope or out of
> scope of our work.
>
> I believe Eric Johansson suggested these requirements initially. Eric,
> can you help us with a few use cases for these requirements?

Sorry about my lack of participation. I was eaten alive by a bunch of things, unfortunately none of them involving speech user interfaces.

As a general rule, I argue that user extensions that affect or control part of, or all of, a speech user interface are part of a speech recognition environment. With that outlook, it's an "obvious" conclusion that they are part of our scope of work.

The best use case I can think of is the overlay of a speech user interface on 
top of a cloud application. In addition to classic command-and-control grammars 
and actions, there should be other windows displaying or accepting results from 
the recognition process and translating them to/from what is in the application 
itself. For example, an e-mail client with a three-pane layout would have an 
overlay or extension adding markers inside the windows that can be spoken for 
direct control. In addition, the extensions should be capable of disabling 
single-key commands, to make the environment more robust when a misrecognition 
turns a command into simple text injection. The classic example of this kind of 
failure is vi: if your command fails and becomes injected text, heaven help you 
figuring out what the heck happened. A third feature of extensions would be 
additional speech-recognition-only target windows popping up to inform the user 
or to convert data into something that works with speech recognition.
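To make the robustness point concrete, here is a minimal sketch (all names are mine, not a proposed API) of a dispatcher that treats an utterance as a command only if it matches a registered phrase; anything else is rejected outright rather than injected into the buffer as text, which is exactly the vi failure mode I want extensions to prevent:

```python
# Hypothetical command dispatcher: an utterance either matches a
# registered command or is rejected -- it is never typed into the
# buffer as literal text, so a misrecognized command cannot silently
# corrupt the document.

class CommandDispatcher:
    def __init__(self):
        self.commands = {}  # spoken phrase -> action callable

    def register(self, phrase, action):
        self.commands[phrase.lower()] = action

    def dispatch(self, utterance, buffer):
        action = self.commands.get(utterance.lower())
        if action is None:
            # Reject: report the failure instead of injecting the
            # misrecognized command into the document.
            return "not recognized: " + utterance
        return action(buffer)

def delete_line(buffer):
    buffer.pop()
    return "deleted"

buffer = ["hello world"]
dispatcher = CommandDispatcher()
dispatcher.register("delete line", delete_line)
ok = dispatcher.dispatch("delete line", buffer)  # command executes
rejected = dispatcher.dispatch("dd", buffer)     # rejected, buffer untouched
```

The point of the sketch is the refusal path: unlike vi, an unmatched utterance leaves the buffer exactly as it was.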

Some of the features that should be available through this interface are SAPI 5 
compliant grammars, microphone control, "heard word", and speech-driven editing 
of buffer text, to name a few.
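As a rough sketch of what such an extension-facing interface might look like: every name below is an illustrative assumption of mine, not SAPI or any proposed standard.

```python
from abc import ABC, abstractmethod

class SpeechExtensionHost(ABC):
    """Hypothetical surface an extension might see; names are
    illustrative assumptions, not SAPI or any proposed standard."""

    @abstractmethod
    def load_grammar(self, grammar_xml: str) -> None:
        """Register a command grammar (e.g. SAPI 5 style XML)."""

    @abstractmethod
    def set_microphone(self, listening: bool) -> None:
        """Microphone control: toggle listening on or off."""

    @abstractmethod
    def last_heard(self) -> str:
        """'Heard word': the engine's text for the last utterance."""

    @abstractmethod
    def edit_buffer(self, correction: str) -> None:
        """Speech-driven editing of previously dictated buffer text."""

class InMemoryHost(SpeechExtensionHost):
    """Toy implementation, just to show the shape of the interface."""

    def __init__(self):
        self.listening = False
        self.grammars = []
        self.buffer = ""

    def load_grammar(self, grammar_xml):
        self.grammars.append(grammar_xml)

    def set_microphone(self, listening):
        self.listening = listening

    def last_heard(self):
        return self.buffer

    def edit_buffer(self, correction):
        self.buffer = correction
```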

Ideally, if all of the local interfaces are identical between different 
recognition engines, then my command UI should just work. I'm sure it will be 
just as portable as CSS. I hope that your reaction, like mine, is to think about 
ways of specifying alternate cases for user extensions to accommodate potential 
IE 6 hell scenarios.

I need to think about this a little more, but I think all of the use cases reduce to:

1. "The vendor failed and I need to fix their bugs in the user interface."
2. "The vendor failed and created a user interface that breaks horribly on 
recognition failure."
3. A vendor outfitting or retrofitting an application with a speech user interface.
4. A disabled user or user community retrofitting an application with a speech 
user interface.

Do any others come to mind?

r19 lets us handle the extreme use cases of the user (ant) keeping all 
important executables and data local (on a laptop) versus the user (grasshopper) 
keeping nothing in their own control but instead trusting a third party for 
everything (extensions, recognition, etc.). The user can store their extensions 
any way they want.

r21 is important, and it's missing a lot. The original goal of r21 was to make 
sure that all of the features of the speech recognition environment were always 
available to a disabled user or developer. This means the graphical user interface 
for everything has a speech user interface. Any data files or programming 
languages should also be completely speakable and editable by voice.

One important implication is that data files like the XML-based data sets described 
in our conversations need some sort of editing tool/framework which lets you 
edit XML by speech. This is not horribly hard, but it looks nothing like the 
"speak the keyboard" solutions proposed by people who don't use speech 
recognition. The solution is some form of shadow buffer which translates the 
XML to a speakable form; you edit within that shadow buffer, and the changes are 
written back to the XML when you say you're done. No, I don't have a working 
model yet. It's a goal, but it's taking more resources than I have available at 
the moment.
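To give a rough flavor of the shadow-buffer idea, here is a toy sketch under my own assumptions (it is not the eventual tool, and it collapses duplicate tags, which a real version would have to handle): XML is flattened into speakable "tag: text" lines, the user edits those lines by voice, and the edits are written back when the user says they're done.

```python
import xml.etree.ElementTree as ET

def to_shadow(xml_text):
    """Flatten XML into speakable 'tag: text' lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if elem.text and elem.text.strip():
            lines.append(f"{elem.tag}: {elem.text.strip()}")
    return lines

def from_shadow(xml_text, lines):
    """Write edited shadow lines back into the XML ('say you're done')."""
    root = ET.fromstring(xml_text)
    edits = {}
    for line in lines:
        tag, _, value = line.partition(": ")
        edits[tag] = value
    for elem in root.iter():
        if elem.tag in edits:
            elem.text = edits[elem.tag]
    return ET.tostring(root, encoding="unicode")

doc = "<contact><name>Eric</name><city>Boston</city></contact>"
shadow = to_shadow(doc)         # speakable lines, one per text element
shadow[1] = "city: Cambridge"   # a spoken edit made in the shadow buffer
result = from_shadow(doc, shadow)
```

The user never has to speak angle brackets or attribute syntax; they only ever hear and dictate the flattened lines.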

Therefore, I agree with others that unfortunately it is necessary to declare r21 
out of scope, even though it means excluding disabled developers from working 
with the tools they need. The accessibility requirement can be satisfied at a 
later time through dedicated applications, extensions, and user interface models 
similar to ones I've developed as part of my programming-by-speech effort. 
Unfortunately, if you know history, you know this will take forever or longer to 
get done.

Again I apologize for not getting back to people sooner. Let me know what 
additional info you need and I'll try to fill in the blanks this weekend.

Received on Friday, 5 November 2010 05:16:18 UTC