- From: Michael Bodell <mbodell@microsoft.com>
- Date: Thu, 11 Nov 2010 08:00:38 +0000
- To: "Olli@pettay.fi" <Olli@pettay.fi>, Robert Brown <Robert.Brown@microsoft.com>
- CC: "Eric S. Johansson" <esj@harvee.org>, Satish Sampath <satish@google.com>, Bjorn Bringert <bringert@google.com>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
It sounds like there was mostly general agreement that R21 was out of scope and should be removed, with no additional requirement to replace it. On the 11/11 call I'd like us to be clear that this is what we agree to.

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Olli Pettay
Sent: Friday, November 05, 2010 7:32 AM
To: Robert Brown
Cc: Eric S. Johansson; Satish Sampath; Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
Subject: Re: R21. Any public interface for creating extensions should be speakable

I agree, R21 is out of scope.

-Olli

On 11/05/2010 08:33 AM, Robert Brown wrote:
> Eric, thanks for fleshing this out. But I'm still having trouble
> detecting the line between API requirements, examples of user agent
> and/or application capabilities that could be built with the API, and
> design recommendations (feedback UX, text-editing UX, overlays, SAPI5
> grammars, etc.). This still sounds to me like some good examples of
> richer applications that may be built using the API.
>
> -----Original Message-----
> From: Eric S. Johansson [mailto:esj@harvee.org]
> Sent: Thursday, November 04, 2010 10:15 PM
> To: Satish Sampath
> Cc: Robert Brown; Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
> Subject: Re: R21. Any public interface for creating extensions should be speakable
>
> On 11/4/2010 6:02 PM, Satish Sampath wrote:
>> A set of use cases which clearly describe what 'end user extensions'
>> mean would help in deciding whether they should be in scope or out of
>> scope of our work.
>>
>> I believe Eric Johansson suggested these requirements initially.
>> Eric, can you help us with a few use cases for these requirements?
>
> Sorry about my lack of participation. I was eaten alive by a bunch of
> things, but unfortunately not any work involving speech user
> interfaces.
> As a general rule, I argue that user extensions that affect or control
> part of, or a complete, speech user interface are part of a speech
> recognition environment. With that outlook, it's an "obvious"
> conclusion to say that it is part of our scope of work.
>
> The best use case I can think of is the overlay of a speech user
> interface on top of a cloud application. In addition to the classic
> command-and-control grammar and actions, there should be other windows
> displaying or accepting recognition results and translating them
> to/from what is in the application itself. For example, an e-mail
> client would have a three-pane layout, with an overlay or extension
> adding markers internal to the windows that can be spoken for direct
> control. In addition, the extensions should be capable of disabling
> single-key commands to make the environment more robust in the face of
> a misrecognition that changes a command into simple text injection. A
> classic example of this kind of failure is vi: if your command fails
> and becomes injected text, heaven help you figuring out what the heck
> happened. A third feature of extensions would be additional
> speech-recognition-only windows popping up to inform the user or to
> convert data to something that will work with speech recognition.
>
> Some of the features that should be available through this interface
> are SAPI 5 compliant grammars, microphone control, "heard word", and
> speech-driven editing of buffer text, to name a few.
>
> Ideally, if all of the local interfaces are identical between
> different recognition engines, then my command UI should just work.
> I'm sure it will be just as portable as CSS. I hope that your
> reaction, like mine, is to think about ways of specifying alternate
> cases for user extensions to accommodate potential IE 6 hell
> scenarios.
>
> I need to think about this a little more, but I think all of the use
> cases reduce to:
>
> 1. The vendor failed and I need to fix the bugs in their user
> interface.
> 2. The vendor failed and created a user interface that breaks
> horribly with recognition failure.
> 3. A vendor outfitting or retrofitting an application with a speech
> user interface.
> 4. A disabled user or user community retrofitting an application with
> a speech user interface.
>
> Do any others come to mind?
>
> R19 lets us handle the extreme use cases of the user (ant) keeping
> all important executables and data local (on a laptop) versus the user
> (grasshopper) keeping nothing in their own control but instead
> trusting third parties for everything (extensions, recognition, etc.).
> The user can store their extensions any way they want.
>
> R21 is important, and it's missing a lot. The original goal of R21
> was to make sure that all of the features of the speech recognition
> environment were always available to a disabled user or developer.
> This means the graphical user interface for everything has a speech
> user interface. Any data files or programming languages are also
> something that can be completely spoken and edited by voice.
>
> One important implication is that data files like the XML-based data
> sets described in conversations need to have some sort of editing
> tool/framework which lets you edit XML by speech. This is not
> horribly hard, but it looks nothing like the "speak the keyboard"
> solutions proposed by people who don't use speech recognition. The
> solution is some form of a shadow buffer which translates the XML to
> a speakable form; you edit within that shadow buffer, which changes
> the XML when you say you're done. No, I don't have a working model
> yet. It's a goal, but it's taking more resources than I have
> available at the moment.
>
> Therefore, I agree with others that unfortunately it is necessary to
> declare R21 out of scope, even though it means excluding disabled
> developers from working with the tools they need.
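The shadow-buffer idea described above (render the XML in a speakable form, edit that form by voice, then write the changes back) can be sketched in a few lines. This is a minimal illustration only, not Eric's working model; the `ShadowBuffer` class, its method names, and the "line N" command vocabulary are all assumptions chosen for the example.

```python
# Minimal sketch of a "shadow buffer" for speech-driven XML editing:
# XML is rendered as numbered, speakable lines; spoken edits change the
# lines; saying "done" serializes the edits back to XML.
import xml.etree.ElementTree as ET

class ShadowBuffer:
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        # Each shadow line maps back to the element whose text it shows.
        self.lines = [(elem, elem.text or "") for elem in self.root.iter()
                      if (elem.text or "").strip()]

    def speakable(self):
        """The lines a voice interface would read aloud to the user."""
        return ["line %d: %s says %s" % (i + 1, elem.tag, text)
                for i, (elem, text) in enumerate(self.lines)]

    def set_line(self, number, new_text):
        """Spoken command: 'change line <number> to <new text>'."""
        elem, _ = self.lines[number - 1]
        elem.text = new_text
        self.lines[number - 1] = (elem, new_text)

    def done(self):
        """Spoken command: 'done' -- write edits back out as XML."""
        return ET.tostring(self.root, encoding="unicode")
```

With this, "change line 2 to final text" maps to `buf.set_line(2, "final text")`, and "done" returns the updated XML; the user never dictates angle brackets or tag names directly, which is the point of the shadow form.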
> The accessibility requirement can be satisfied at a later time
> through dedicated applications, extensions, and user interface models
> similar to ones I've developed as part of my programming-by-speech
> effort. Unfortunately, if you know history, you know this will take
> forever or longer to be done.
>
> Again, I apologize for not getting back to people sooner. Let me know
> what additional info you need and I'll try to fill in the blanks this
> weekend.
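The robustness point earlier in the thread — a misrecognized command must not silently become injected text, as in the vi failure mode — can also be sketched. The command table, the `handle_utterance` dispatcher, and the application methods below are hypothetical names for illustration; the design point is only that unmatched speech is dropped rather than injected, and dictation is an explicit opt-in.

```python
# Sketch of misrecognition-robust command dispatch: recognition results
# are routed through a command table first, and speech that matches no
# command is never injected into the text buffer unless the user has
# explicitly switched to dictation mode.
COMMANDS = {
    "next message": lambda app: app.select_next(),
    "delete message": lambda app: app.delete_selected(),
}

def handle_utterance(app, utterance, dictation_mode=False):
    """Dispatch one recognition result; report what was done with it."""
    action = COMMANDS.get(utterance.strip().lower())
    if action is not None:
        action(app)
        return "command"
    if dictation_mode:
        app.insert_text(utterance)   # explicit opt-in, never the default
        return "dictated"
    return "ignored"                 # safer than injecting garbled text
```

A single-key editor treats every unmatched keystroke as input; this dispatcher inverts that default, which is what "disabling single-key commands" buys a speech overlay.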
Received on Thursday, 11 November 2010 08:01:12 UTC