- From: <chan@info-cast.com>
- Date: Tue, 23 Nov 2010 14:18:05 -0700
- To: Bjorn Bringert <bringert@google.com>
- Cc: Deborah Dahl <dahl@conversational-technologies.com>, Dan Burnett <dburnett@voxeo.com>, <public-xg-htmlspeech@w3.org>
We developed an IVR that provides a voice interface to email and calendar information, supporting telephone handsets connected over SIP and RTP to our speech server. We are now adding a text/mouse modality in order to build a multimodal Android application on top of this IVR system, which is written in VoiceXML and Java. All of the speech dialogs in our app are now replicated on the Android screen, where the user can enter commands and data using either voice or text/mouse.

For better accessibility and usability, our Android front-end does not require the user to choose a modality before entering commands or data - the user simply speaks, or types/selects the information. Since our voice server continuously monitors the RTP voice channel from the Android device for recognition, our backend only has to accept input from whichever of the two channels produces a result first (a rough sketch of this selection logic is appended at the end of this message).

To add a new schedule entry, for instance, the user can either speak the schedule summary or type it on the screen. If the utterance is recognized, the result is shown on the screen as if it had been typed in. The user can type in the new schedule information if speech cannot be used or if the utterance is not recognized.

Chan

> Hi Chan,
>
> Could you describe that use case in a bit more detail please?
>
> /Bjorn
>
> On Tue, Nov 23, 2010 at 2:24 AM, <chan@info-cast.com> wrote:
>> Hello Deborah,
>>
>> OK, if speech is optional for that "type=speech" element,
>> is text (or another modality?) assumed here?
>> Or will the element not accept any input other than speech?
>>
>> What we actually need is an element that accepts multimodal
>> input, with both text and speech agents up and running
>> for that element simultaneously. I wonder if this use case
>> has been discussed before - my apologies if it has,
>> as I have only recently started following your standardization efforts.
>>
>> Regards,
>>
>> Chan Lee
>>
>>> When I suggested this requirement, I was thinking that if the eventual
>>> proposal supports some kind of an attribute on "<input>" like
>>> "type=speech", that attribute should be interpreted as allowing speech,
>>> not requiring it. If it meant that speech was required, then the
>>> application would force people to speak to use it.
>>>
>>>> -----Original Message-----
>>>> From: public-xg-htmlspeech-request@w3.org
>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
>>>> Sent: Thursday, November 18, 2010 1:37 PM
>>>> To: Dan Burnett
>>>> Cc: public-xg-htmlspeech@w3.org
>>>> Subject: Re: R23. Speech as an input on any application should be able
>>>> to be optional
>>>>
>>>> This doesn't seem like a requirement on the API, but a requirement on
>>>> how application authors use it.
>>>>
>>>> /Bjorn
>>>>
>>>> On Thu, Nov 18, 2010 at 5:37 PM, Dan Burnett <dburnett@voxeo.com> wrote:
>>>> > Group,
>>>> >
>>>> > This is the next of the requirements to discuss and prioritize based
>>>> > on our ranking approach [1].
>>>> >
>>>> > This email is the beginning of a thread for questions, discussion, and
>>>> > opinions regarding our first draft of Requirement 23 [2].
>>>> >
>>>> > Please discuss via email as we agreed at the Lyon f2f meeting.
>>>> > Outstanding points of contention will be discussed live at an upcoming
>>>> > teleconference.
>>>> >
>>>> > -- dan
>>>> >
>>>> > [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>>> > [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r23
>>>>
>>>>
>>>> --
>>>> Bjorn Bringert
>>>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>>> Palace Road, London, SW1W 9TQ
>>>> Registered in England Number: 3977902
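P.S. For concreteness, below is a minimal sketch of the "whichever channel produces a result first" selection described above. It is plain standalone Java rather than our actual VoiceXML/Java code; the class, the method names, and the simulated delays are only illustrative.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: race the voice channel (recognition result from the
// speech server) against the text channel (typed/selected input from the
// Android screen) and accept whichever produces a result first.
public class FirstInputWins {

    // Stands in for the speech server returning a recognition result over
    // the RTP/SIP leg; here it is just a delayed constant.
    static CompletableFuture<String> voiceChannel() {
        return CompletableFuture.supplyAsync(() -> {
            sleep(800);                       // simulated recognition latency
            return "add meeting with Deborah at 3pm";
        });
    }

    // Stands in for the Android front-end delivering typed/selected input.
    static CompletableFuture<String> textChannel() {
        return CompletableFuture.supplyAsync(() -> {
            sleep(1500);                      // simulated typing time
            return "add meeting with Deborah at 3pm";
        });
    }

    public static void main(String[] args) throws Exception {
        // anyOf() completes as soon as either channel has a value;
        // the slower channel's result is simply ignored (or could be cancelled).
        Object first = CompletableFuture.anyOf(voiceChannel(), textChannel())
                                        .get(10, TimeUnit.SECONDS);
        System.out.println("Accepted input: " + first);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}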
Received on Tuesday, 23 November 2010 21:18:39 UTC