Re: R23. Speech as an input on any application should be able to be optional

From: <chan@info-cast.com>
Date: Thu, 25 Nov 2010 18:03:51 -0700
To: <Olli@pettay.fi>
Cc: Deborah Dahl <dahl@conversational-technologies.com>, Bjorn Bringert <bringert@google.com>, Dan Burnett <dburnett@voxeo.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <d73410d7fe0d500667a3adbf4527f320@info-cast.com>

> On 11/23/2010 04:24 AM, chan@info-cast.com wrote:
>> Hello Deborah,
>> OK, if the speech is optional for that "type=speech" element,
>> then text (or other modality?) is assumed here ?
>> Or the element won't get any input other than speech ?
>> What we actually need is an element accepting multimodal
>> input, assuming both text and speech agents up and running
>> for that element simultaneously. Wonder if this use case
>> had been discussed before - my apology if it's been,
>> as I started following your standard efforts quite lately.
> It is not at all clear that we should add any kind of element.
> What I'm probably going to propose is a JS object, which controls
> speech recognition and which is the entry point for
> speech recognition related DOM event stream.
> Then it is up to the web app (or script libraries) to
> handle multimodal integration; this way we don't
> limit multimodal input to speech+keyboard, but also mouse/touch
> /whatever events can be handled easily.
> -Olli

We don't have to add a new element to support multimodal input;
expanding an existing element's type attribute would be enough.
To add speech, the input tag's type attribute could be extended:

  <input type="text|speech" ../>

where users can enter the desired information by either text or speech.
The user must also be able to use the speech modality in hands-free
mode, as required by R24. This implies that the text and speech channels
must both be active simultaneously, so that valid input can be accepted
from either channel.
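
To make the dual-channel requirement concrete, here is a minimal sketch
of the arbitration it implies. It is only an illustration:
DualChannelField, offer() and the validation callback are hypothetical
names, not part of any proposal or browser API, and real channels would
be driven by keyboard and recognition events rather than direct calls.

```javascript
// Hypothetical sketch: one field fed by two channels that are both active.
// None of these names come from a real browser API.
class DualChannelField {
  constructor(isValid) {
    this.isValid = isValid; // validation predicate shared by both channels
    this.value = null;      // last accepted value
    this.source = null;     // channel that supplied it: "text" or "speech"
  }

  // Called by either channel whenever it has a candidate input.
  // Returns true if the candidate was accepted, false if it was ignored.
  offer(channel, candidate) {
    if (channel !== "text" && channel !== "speech") {
      throw new Error("unknown channel: " + channel);
    }
    const cleaned = String(candidate).trim();
    if (cleaned === "" || !this.isValid(cleaned)) {
      return false; // invalid input: keep both channels listening
    }
    this.value = cleaned;
    this.source = channel;
    return true;
  }
}
```

Invalid input on one channel does not block the other; the most recent
valid input from either channel sets the value.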

Implementing and maintaining such a dual-channel monitoring process
could be a complex task for average web developers, even when a JS
library object becomes available. Since this process is well-defined
and necessary for all speech-enabled applications, it seems to be more
the responsibility of the UA than of the web application. Developers
building speech apps will expect such a function in speech-enabled HTML.

This 'speech' type value can also be used to support speech input
that doesn't have any visible UI counterpart, as described in U6.
When an element's type is specified as speech
(i.e. type="speech"), the UA will take the action(s) listed in
the element when the SS returns a matching response for the element,
based on either the expected utterance (using the value attribute) or
a grammar (using a new attribute, grammar) for the element:

  <input type="speech" value="example" prompt="Say example"../>
  <input type="speech" grammar="example.grxml" prompt="Say example"../>
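
A rough sketch of the matching step described above, assuming the SS
hands back a plain-text recognition result. The function names, the
shape of the element object, and the phrases list are all hypothetical;
the list is only a stand-in for a compiled grammar, since parsing
example.grxml is out of scope here.

```javascript
// Hypothetical sketch of UA-side matching for a type="speech" element.

// Lowercase, drop punctuation, and collapse whitespace so that
// "Example!" and "example" compare equal.
function normalize(utterance) {
  return utterance
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .trim()
    .replace(/\s+/g, " ");
}

// element.value   : the single expected utterance (value attribute)
// element.phrases : assumed stand-in for a grammar, as a list of
//                   accepted phrases; it takes precedence when present
function matchesSpeechInput(element, recognized) {
  const heard = normalize(recognized);
  if (Array.isArray(element.phrases)) {
    return element.phrases.some((p) => normalize(p) === heard);
  }
  return typeof element.value === "string" &&
    normalize(element.value) === heard;
}
```

When the function returns true, the UA would go on to take the
action(s) listed in the element.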

Another use case is to support voice navigation within web-pages:

  <input type="speech" value="home page"

Again, this can be handled by JS objects, but it'd be much more
effective and productive to expose such a process at the HTML layer,
alongside ordinary menu-based navigation.
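
For comparison, the JS-level handling of voice navigation might look
roughly like the sketch below. VoiceNavigator and its methods are
hypothetical, and the URLs are made up for illustration; a real page
would call resolve() from a recognition result and then navigate.

```javascript
// Hypothetical sketch: a phrase-to-URL table for voice navigation.
class VoiceNavigator {
  constructor() {
    this.routes = new Map(); // normalized phrase -> target URL
  }

  // Case- and whitespace-insensitive lookup key.
  static key(phrase) {
    return phrase.toLowerCase().trim().replace(/\s+/g, " ");
  }

  // Associate a spoken phrase with a navigation target.
  register(phrase, url) {
    this.routes.set(VoiceNavigator.key(phrase), url);
    return this; // allow chaining
  }

  // Return the target for a recognized utterance, or null if none matches.
  resolve(utterance) {
    const url = this.routes.get(VoiceNavigator.key(utterance));
    return url === undefined ? null : url;
  }
}
```

Every page offering voice navigation would need to ship and maintain a
table like this in script, which is the overhead a declarative
type="speech" element would avoid.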

Received on Friday, 26 November 2010 01:04:21 UTC
