Re: R23. Speech as an input on any application should be able to be optional from Olli Pettay on 2010-11-29 (public-xg-htmlspeech@w3.org from November 2010)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Mon, 29 Nov 2010 22:34:20 +0200
To: chan@info-cast.com
CC: Deborah Dahl <dahl@conversational-technologies.com>, Bjorn Bringert <bringert@google.com>, Dan Burnett <dburnett@voxeo.com>, public-xg-htmlspeech@w3.org
Message-ID: <4CF40E4C.7030509@helsinki.fi>

On 11/26/2010 03:03 AM, chan@info-cast.com wrote:
>
>> On 11/23/2010 04:24 AM, chan@info-cast.com wrote:
>>> Hello Deborah,
>>>
>>> OK, if the speech is optional for that "type=speech" element,
>>> then text (or other modality?) is assumed here ?
>>> Or the element won't get any input other than speech ?
>>>
>>> What we actually need is an element accepting multimodal
>>> input, assuming both text and speech agents up and running
>>> for that element simultaneously. Wonder if this use case
>>> had been discussed before - my apology if it's been,
>>> as I started following your standard efforts quite lately.
>>
>>
>> It is not at all clear that we should add any kind of element.
>> What I'm probably going to propose is a JS object, which controls
>> speech recognition and which is the entry point for
>> the speech recognition related DOM event stream.
>>
>> Then it is up to the web app (or script libraries) to
>> handle multimodal integration; this way we don't
>> limit multimodal input to speech+keyboard, but also mouse/touch
>> /whatever events can be handled easily.
>>
>> -Olli
>>
>
> Don't have to add a new element to support multimodal input,
> but expanding the element's type attribute would be enough.
> To add speech, the input tag's type attribute may be expanded:
>
>    <input type="text|speech" ../>

That would be a backwards-incompatible change if you want to
support any type other than "text". Also note that binding
speech input to a single HTML input element isn't enough
in the common case where the user wants to fill several fields.
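
To illustrate the several-fields case, here is a minimal sketch. The
result shape (an `utterance` plus a `slots` map) and the `fillFields`
helper are hypothetical and not part of any proposed API; plain objects
stand in for the input elements.

```javascript
// Hypothetical sketch: one recognition result fills several form fields.
// The result shape ({ utterance, slots }) is an assumption for illustration.
function fillFields(fields, result) {
  const filled = [];
  for (const [name, value] of Object.entries(result.slots)) {
    // Only touch fields the form actually has.
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      fields[name].value = value;
      filled.push(name);
    }
  }
  return filled;
}

// Plain objects stand in for <input> elements here.
const form = { from: { value: "" }, to: { value: "" }, date: { value: "" } };
const result = {
  utterance: "from Helsinki to Boston tomorrow",
  slots: { from: "Helsinki", to: "Boston", date: "tomorrow" },
};
const filledNames = fillFields(form, result);
```

A single utterance updates three fields here, which a one-element
binding such as type="text|speech" could not express.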


>
> where users can enter the desired info with either text or speech.
> And the user must be able to use the speech modality in hands-free
> mode, as required by R24. This implies both text and speech channels
> must be active simultaneously to acquire valid input from either of
> the channels.
>
> Implementing and maintaining such dual channel monitoring process
> could be a complex task for average web developers, even when a JS
> library object becomes available.
An average web developer would use a script library for multimodal integration.
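
As a rough sketch of what such a library could look like, assuming
nothing about the eventual speech API: the `MultimodalInput` class and
its `onInput`/`submit` methods are hypothetical names, and the modality
adapters (speech, keyboard, touch) are simulated by direct calls.

```javascript
// Hypothetical multimodal integrator: several input sources feed one handler.
class MultimodalInput {
  constructor() {
    this.handlers = [];
    this.log = [];
  }
  // Register a callback that receives { modality, value } for every input.
  onInput(handler) {
    this.handlers.push(handler);
  }
  // Called by any modality adapter (speech, keyboard, touch, ...).
  submit(modality, value) {
    const event = { modality, value };
    this.log.push(event);
    for (const handler of this.handlers) handler(event);
  }
}

// A plain object stands in for an <input> element here.
const input = new MultimodalInput();
const field = { value: "" };
input.onInput((e) => { field.value = e.value; });

// Both channels stay active; the most recent input from either one wins.
input.submit("keyboard", "Hels");
input.submit("speech", "Helsinki");
```

The web app only sees one stream of input events, so adding mouse or
touch later means adding another adapter, not changing the app logic.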



> Since this process is well-defined
Is it really well-defined? I do assume the process depends
on the web application and the interaction model the
web application provides.


> and necessary for all speech-enabled applications, this seems to be more
> of the UA responsibility, than a part of web application. Developers
> engaged in speech apps will expect such function in the speech HTML.
>
> This 'speech' attribute can also be used to support the speech input
> which doesn't have any visible UI counterpart as described in U6.
> When an element's type is specified with the speech attribute
> (i.e. type="speech"), then the UA will take action(s) listed in
> the element upon SS's matching response for the element, based on
> the expected utterance (using the value attribute) or a grammar
> (using new attribute, grammar) for the element:
>
>    <input type="speech" value="example" prompt="Say example"../>
> or,
>    <input type="speech" grammar="example.grxml" prompt="Say example"../>
>
> Another use case is to support voice navigation within web-pages:
>
>    <input type="speech" value="home page" onmatch="goToPage('home.html')"../>
>
> Again, this can be handled by JS objects, but it'd be much more
> effective and productive to present such process at the HTML layer
> along with other ordinary menu based navigation.

I'm not at all sure about this. The speech modality has different behaviors
(for example, filling multiple fields at once) than the visual/keyboard modality.


-Olli

Received on Monday, 29 November 2010 20:35:08 UTC