- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Fri, 05 Nov 2010 16:31:33 +0200
- To: Robert Brown <Robert.Brown@microsoft.com>
- CC: "Eric S. Johansson" <esj@harvee.org>, Satish Sampath <satish@google.com>, Bjorn Bringert <bringert@google.com>, Dan Burnett <dburnett@voxeo.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
I agree, R21 is out of scope.

-Olli

On 11/05/2010 08:33 AM, Robert Brown wrote:
> Eric, thanks for fleshing this out. But I'm still having trouble
> detecting the line between API requirements, examples of user agent
> and/or application capabilities that could be built with the API, and
> design recommendations (feedback UX, text-editing UX, overlays, SAPI 5
> grammars, etc.). This still sounds to me like some good examples of
> richer applications that may be built using the API.
>
>
> -----Original Message-----
> From: Eric S. Johansson [mailto:esj@harvee.org]
> Sent: Thursday, November 04, 2010 10:15 PM
> To: Satish Sampath
> Cc: Robert Brown; Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
> Subject: Re: R21. Any public interface for creating extensions should be speakable
>
> On 11/4/2010 6:02 PM, Satish Sampath wrote:
>> A set of use cases which clearly describe what 'end user extensions'
>> mean would help in deciding whether they should be in scope or out
>> of scope of our work.
>>
>> I believe Eric Johansson suggested these requirements initially.
>> Eric, can you help us with a few use cases for these requirements?
>
> Sorry about my lack of participation. I was eaten alive by a bunch of
> things, unfortunately none of them work involving speech user
> interfaces.
>
> As a general rule, I argue that user extensions that affect or
> control part or all of a speech user interface are part of a speech
> recognition environment. With that outlook, it's an "obvious"
> conclusion that they are part of our scope of work.
>
> The best use case I can think of is the overlay of a speech user
> interface on top of a cloud application. In addition to a classic
> command-and-control grammar and actions, there should be other
> windows displaying or accepting results from the recognition process
> and translating them to/from what is in the application itself. For
> example, an e-mail client with a three-pane layout would have an
> overlay or extension adding markers inside each pane that can be
> spoken for direct control (a rough sketch follows the use-case list
> below). In addition, the extensions should be capable of disabling
> single-key commands, to make the environment more robust when a
> misrecognition turns a command into simple text injection. A classic
> example of this kind of failure is vi: if your command fails and
> becomes injected text, heaven help you figuring out what the heck
> happened. A third feature of extensions would be additional
> speech-recognition-only windows popping up to inform the user or to
> convert data into something that works with speech recognition.
>
> Some of the features that should be available through this interface
> are SAPI 5 compliant grammars, microphone control, "heard word", and
> speech-driven editing of buffer text, to name a few.
>
> Ideally, if all of the local interfaces are identical between
> different recognition engines, then my command UI should just work.
> I'm sure it will be just as portable as CSS. I hope that your
> reaction, like mine, is to think about ways of specifying alternate
> cases for user extensions to accommodate potential IE 6 hell
> scenarios.
>
> I need to think about this a little more, but I believe all of the
> use cases reduce to:
>
> 1. "The vendor failed and I need to fix their bugs in the user
>    interface."
> 2. "The vendor failed and created a user interface that breaks
>    horribly on recognition failure."
> 3. A vendor outfitting or retrofitting an application with a speech
>    user interface.
> 4. A disabled user or user community retrofitting an application
>    with a speech user interface.
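To make the overlay idea above concrete, here is a minimal, hypothetical sketch in TypeScript against the plain DOM. The recognizer wiring, the `onRecognized` entry point, and the "click N" command are all illustrative assumptions, not part of any proposed API.

```ts
// Hypothetical overlay sketch. Assumes a DOM page and some external
// recognizer that delivers final utterances to onRecognized(); the
// "click N" grammar and all names here are invented for illustration.

const targets = new Map<string, HTMLElement>();

// Paint a numbered, speakable marker next to each actionable element
// in a pane and remember which element each spoken label maps to.
function addMarkers(paneSelector: string): void {
  const actionable = document.querySelectorAll<HTMLElement>(
    `${paneSelector} a, ${paneSelector} button`,
  );
  actionable.forEach((el, i) => {
    const label = String(i + 1); // assumes the engine normalizes "three" -> "3"
    targets.set(label, el);
    const badge = document.createElement("span");
    badge.textContent = ` [${label}]`;
    badge.style.cssText = "background:yellow;font-weight:bold";
    el.appendChild(badge);
  });
}

// Route a recognized utterance such as "click 3" to the marked element.
function onRecognized(utterance: string): void {
  const m = /^click (\d+)$/i.exec(utterance.trim());
  if (m) targets.get(m[1])?.click();
}

// While speech mode is on, swallow single-key shortcuts so that a
// misrecognition landing as typed text cannot fire an app command.
let speechModeActive = true;
document.addEventListener(
  "keydown",
  (e) => {
    if (speechModeActive && e.key.length === 1 && !e.ctrlKey && !e.metaKey) {
      e.preventDefault();
      e.stopPropagation();
    }
  },
  true, // capture phase, so this runs before the application's handlers
);
```

A real extension would also need to re-scan after DOM changes and coordinate with the application's own key handling, which is exactly the API-versus-UX boundary Robert is asking about.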
> Do any others come to mind?
>
> R19 lets us handle the extreme use cases of the user (ant) keeping
> all important executables and data local (on a laptop) versus the
> user (grasshopper) keeping nothing in their own control but instead
> trusting a third party for everything (extensions, recognition,
> etc.). The user can store their extensions any way they want.
>
> R21 is important, and as written it's missing a lot. The original
> goal of R21 was to make sure that all of the features of the speech
> recognition environment were always available to a disabled user or
> developer. This means the graphical user interface for everything
> has a speech user interface, and any data files or programming
> languages can also be completely spoken and edited by voice.
>
> One important implication is that data files like the XML-based data
> sets described in our conversations need some sort of editing
> tool/framework that lets you edit XML by speech. This is not
> horribly hard, but it looks nothing like the "speak the keyboard"
> solutions proposed by people who don't use speech recognition. The
> solution is some form of shadow buffer which translates the XML into
> a speakable form; you edit within that shadow buffer, and the
> changes flow back into the XML when you say you're done (a rough
> sketch follows at the end of this message). No, I don't have a
> working model yet. It's a goal, but it's taking more resources than
> I have available at the moment.
>
> Therefore, I agree with others that unfortunately it is necessary to
> declare R21 out of scope, even though it means excluding disabled
> developers from working with the tools they need. The accessibility
> requirement can be satisfied at a later time through dedicated
> applications, extensions, and user interface models similar to ones
> I've developed as part of my programming-by-speech effort.
> Unfortunately, if you know history, you know this will take forever
> or longer to be done.
>
> Again, I apologize for not getting back to people sooner. Let me
> know what additional info you need and I'll try to fill in the
> blanks this weekend.
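For the shadow-buffer idea above, a minimal sketch, assuming a browser DOM (DOMParser/XMLSerializer) and an invented "set line N to ..." command; it shows the round trip, not a working editor.

```ts
// Hypothetical shadow-buffer sketch: flatten an XML document into
// numbered, speakable lines, apply one spoken edit, and serialize the
// change back. The command grammar and all names are invented.

const parser = new DOMParser();
const serializer = new XMLSerializer();

// Flatten leaf elements into lines that are easy to read aloud and to
// address by voice ("line 2: city = Boston").
function toShadowBuffer(doc: Document): { node: Element; line: string }[] {
  return Array.from(doc.querySelectorAll("*"))
    .filter((el) => el.children.length === 0)
    .map((el, i) => ({
      node: el,
      line: `line ${i + 1}: ${el.tagName} = ${el.textContent ?? ""}`,
    }));
}

// Apply a spoken command such as "set line 2 to Boston" and return the
// updated XML text; the edit happens in the shadow buffer's terms, not
// in raw XML syntax.
function applySpokenEdit(xml: string, utterance: string): string {
  const doc = parser.parseFromString(xml, "application/xml");
  const buffer = toShadowBuffer(doc);
  const m = /^set line (\d+) to (.+)$/i.exec(utterance.trim());
  if (m) {
    const entry = buffer[Number(m[1]) - 1];
    if (entry) entry.node.textContent = m[2];
  }
  return serializer.serializeToString(doc);
}

// Example round trip:
//   applySpokenEdit("<contact><name>Ann</name><city>NYC</city></contact>",
//                   "set line 2 to Boston")
//   -> "<contact><name>Ann</name><city>Boston</city></contact>"
```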
Received on Friday, 5 November 2010 14:32:18 UTC