Re: R21. Any public interface for creating extensions should be speakable

I agree, R21 is out of scope.


-Olli


On 11/05/2010 08:33 AM, Robert Brown wrote:
> Eric, thanks for fleshing this out.  But I'm still having trouble
> detecting the line between API requirements, examples of user agent
> and/or application capabilities that could be built with the API, and
> design recommendations (feedback UX, text-editing UX, overlays, SAPI5
> grammars, etc.).  This still sounds to me like some good examples of
> richer applications that may be built using the API.
>
>
> -----Original Message-----
> From: Eric S. Johansson [mailto:esj@harvee.org]
> Sent: Thursday, November 04, 2010 10:15 PM
> To: Satish Sampath
> Cc: Robert Brown; Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
> Subject: Re: R21. Any public interface for creating extensions should be speakable
>
> On 11/4/2010 6:02 PM, Satish Sampath wrote:
>> A set of use cases which clearly describe what 'end user
>> extensions' mean would help in deciding whether they should be in
>> scope or out of scope of our work.
>>
>> I believe Eric Johansson suggested these requirements initially.
>> Eric, can you help us with a few use cases for these requirements?
>
> Sorry about my lack of participation. I was eaten alive by a bunch of
> things, unfortunately none of them involving speech user
> interfaces.
>
> As a general rule, I argue that user extensions that affect or
> control part or all of a speech user interface are part of a
> speech recognition environment. With that outlook, it's an "obvious"
> conclusion to say that this is part of our scope of work.
>
> The best use case I can think of is the overlay of a speech user
> interface on top of a cloud application. In addition to a classic
> command-and-control grammar and actions, there should be other
> windows displaying or accepting recognition results and translating
> them to/from what is in the application itself. For example, an
> e-mail client would have a three-pane layout, with an overlay or
> extension adding markers inside the windows that can be spoken for
> direct control. In addition, the extensions should be capable of
> disabling single-key commands to make the environment more robust
> when a misrecognition turns a command into simple text injection. A
> classic example of this kind of failure is vi: if your command fails
> and becomes injected text, heaven help you figuring out what the
> heck happened. A third feature of extensions would be additional
> speech-recognition-only target windows popping up to inform the user
> or to convert data into something that will work with speech
> recognition.
>
> Some of the features that should be available through this interface
> are SAPI 5-compliant grammars, microphone control, "heard word"
> feedback, and speech-driven editing of buffer text, to name a few.
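>
> To make that concrete, here is a very rough sketch of how an extension
> might wire those pieces together. Every interface and method name below
> is made up purely for illustration; nothing like it exists in any draft,
> and the real shape of the API is exactly what we would be deciding.
>
>   // Hypothetical sketch only: assumes the user agent hands an
>   // end-user extension some "SpeechExtensionService" object.
>   interface RecognitionResult { utterance: string; confidence: number; }
>
>   interface SpeechExtensionService {
>     loadGrammar(grammarUrl: string): void;          // e.g. an SRGS/SAPI-style grammar
>     setMicrophoneEnabled(enabled: boolean): void;   // microphone control
>     onCommand(handler: (result: RecognitionResult) => void): void;
>     showHeardWord(text: string): void;              // "heard word" feedback window
>   }
>
>   // Overlay a command grammar on a three-pane mail client and keep a
>   // misrecognition from falling through to single-key shortcuts.
>   function installMailOverlay(speech: SpeechExtensionService): void {
>     speech.loadGrammar("mail-commands.grxml");
>     speech.setMicrophoneEnabled(true);
>
>     // Swallow single-key shortcuts while the overlay is active, so a
>     // failed command is reported instead of acting as a keystroke
>     // (the vi problem described above).
>     document.addEventListener("keydown", (e) => {
>       if (e.key.length === 1 && !e.ctrlKey && !e.metaKey) e.stopPropagation();
>     }, true);
>
>     speech.onCommand((result) => {
>       speech.showHeardWord(result.utterance);  // always show what was heard
>       if (result.confidence < 0.5) return;     // low confidence: do nothing destructive
>       // dispatch marker-based window commands here, e.g. "open message three"
>     });
>   }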
>
> Ideally, if all of the local interfaces are identical between
> different recognition engines, then my command UI should just work.
> I'm sure it will be just as portable as CSS. I hope that your
> reaction, like mine, is to think about ways of specifying alternate
> cases for user extensions to accommodate potential IE 6 hell
> scenarios.
>
> I need to think about this a little more, but I think all of the use
> cases reduce to:
>
> 1. "The vendor failed and I need to fix their bugs in user
> interface" 2. "The vendor failed and created a user interface that
> breaks horribly with recognition failure' 3. Vendor outfitting or
> retrofitting application with speech user interface 4. Disabled user
> or user community retrofitting application with speech user
> interface.
>
> Do any others come to mind?
>
> R19 lets us handle the extreme use cases of the user (ant) keeping
> all important executables and data local (on a laptop) versus the
> user (grasshopper) keeping nothing in their own control but instead
> trusting third parties for everything (extensions, recognition,
> etc.). The user can store their extensions any way they want.
>
> R21 is important and it's missing a lot. The original goal of R21 was
> to make sure that all of the features of the speech recognition
> environment were always available to a disabled user or developer.
> This means the graphical user interface for everything has a speech
> user interface. Any data files or programming languages must also be
> completely speakable and editable by voice.
>
> One important implication is that data files like the XML-based data
> sets described in our conversations need some sort of editing
> tool/framework which lets you edit XML by speech. This is not
> horribly hard, but it looks nothing like the "speak the keyboard"
> solutions proposed by people who don't use speech recognition. The
> solution is some form of shadow buffer which translates the XML into
> a speakable form; you edit within that shadow buffer, and the changes
> are written back into the XML when you say you're done. No, I don't
> have a working model yet. It's a goal, but it's taking more resources
> than I have available at the moment.
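>
> To sketch the shape of that shadow buffer (illustration only; as I said,
> there is no working model, and the flat, per-field XML handled below is
> just an assumption to keep the sketch short):
>
>   // XML -> speakable shadow buffer: one plain, speakable line per leaf
>   // element, e.g. <name>Bob</name> becomes "name is Bob".
>   function toShadowBuffer(xml: string): string {
>     const lines: string[] = [];
>     const field = /<(\w+)>([^<]*)<\/\1>/g;
>     for (const m of xml.matchAll(field)) {
>       lines.push(`${m[1]} is ${m[2]}`);
>     }
>     return lines.join("\n");
>   }
>
>   // Shadow buffer -> XML, applied when the user says they are done:
>   // each edited line is written back into its element.
>   function fromShadowBuffer(buffer: string, originalXml: string): string {
>     let xml = originalXml;
>     for (const line of buffer.split("\n")) {
>       const m = /^(\w+) is (.*)$/.exec(line);
>       if (m) {
>         xml = xml.replace(new RegExp(`<${m[1]}>[^<]*</${m[1]}>`),
>                           `<${m[1]}>${m[2]}</${m[1]}>`);
>       }
>     }
>     return xml;
>   }
>
> Dictating a new value for "phone is ..." in the shadow buffer and then
> saying you're done would rewrite just the <phone> element, without the
> user ever having to speak angle brackets or quote characters.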
>
> Therefore, I agree with others that unfortunately it is necessary to
> declare R21 out of scope, even though it means excluding disabled
> developers from working with the tools they need. The accessibility
> requirement can be satisfied at a later time through dedicated
> applications, extensions, and user interface models similar to ones
> I've developed as part of my programming-by-speech effort.
> Unfortunately, if you know history, you know this will take forever
> or longer to be done.
>
> Again, I apologize for not getting back to people sooner. Let me know
> what additional info you need and I'll try to fill in the blanks this
> weekend.
>
>
>

Received on Friday, 5 November 2010 14:32:18 UTC