Re: Speech input API proposal (from Google) from Olli Pettay on 2011-03-07 (public-xg-htmlspeech@w3.org from March 2011)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Mon, 07 Mar 2011 18:58:18 +0200
To: Satish Sampath <satish@google.com>
CC: public-xg-htmlspeech@w3.org
Message-ID: <4D750EAA.3080401@helsinki.fi>

On 02/28/2011 07:14 PM, Satish Sampath wrote:
> Hi all,
>
> I have attached the latest draft of our speech input API proposal with
> this mail. Please take a look and share your thoughts.
>
> The main additions since our previous proposal include methods for
> starting speech input from script, additional error codes and additional
> events fired at various stages of the session as brought up in the TPAC
> face-to-face discussions.
>
> Cheers
> Satish

Some comments.

Nits: there are some mistakes in the examples, like:
- onspeechchange="doCommand" wouldn't work.
   It should be onspeechchange="doCommand(event)"
   This occurs in many places.
- Event listener attributes should be just
   attribute Function onfoo, not
   attribute Function onfoo()
- event.target.results is used in several places, yet .results is not
   defined on elements. Should be event.results.
- Speech shell example doesn't quite work.
   It takes the *string* value from event.target, yet that
   string magically gets .action property.
   Instead of using event.target.value the example should
   probably use event.interpretation

And then some real comments...

Chapter 4 "User click on a visible speech input element which has an 
obvious graphical representation showing that it will start speech input."
The "obvious graphical representation" can be hidden by having
another element with higher z-index or setting opacity: 0 or
setting the size to 1x1 or so.
UA must always show some kind of notification, which can't be
faked or hidden by the web page.

Why is speech input bound to some seemingly random elements? What are
the criteria by which you picked text, search, url, telephone, email and
password?
Why not datetime, time, month, week, number etc.?

How would filling multiple fields at once work?
Some input element is used to start the recognition. What value does
that element get and when? Based on the document I'd assume the
full utterance, and then in the speechchange handler it could be set
to some other value. But that would cause flickering -
the value would be 'utterance' for a very short period and then
something else.
This is a reason why I don't want speech recognition to be bound
automatically to an input field. Based on my experience, speech UI
is especially useful when the user can fill multiple fields at once.
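
To make the multi-field case concrete, here is a rough sketch of what a
page might want to do with one recognition result. Everything in it is an
assumption: the draft does not define an interpretation object with these
keys, and `applyInterpretation` is a hypothetical helper. Plain objects
stand in for the input elements so that it runs without a browser.

```javascript
// Copy each slot of a (hypothetical) semantic interpretation into the field
// with the same name. Returns the names of the fields that were written.
function applyInterpretation(fields, interpretation) {
  const written = [];
  for (const [name, value] of Object.entries(interpretation)) {
    if (name in fields) {
      fields[name].value = value;
      written.push(name);
    }
  }
  return written;
}

// Plain objects standing in for three <input> elements.
const fields = {
  from: { value: "" },
  to: { value: "" },
  date: { value: "" },
};

// Assumed interpretation of one utterance such as
// "from Helsinki to London tomorrow".
const written = applyInterpretation(fields, {
  from: "Helsinki",
  to: "London",
  date: "tomorrow",
});

console.log(written, fields);
```

Each field is written exactly once and none of them ever holds the raw
utterance, so there is nothing to flicker.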

5.6 using pattern attribute.
How is that supposed to work? Should the recognizer try to use it to
recognize words, or letter by letter or how?
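
One possible reading, sketched below, is that the pattern is not given to
the recognizer at all and is only used afterwards to filter its candidate
results. This is my assumption, not something the draft says, and
`filterByPattern` is a hypothetical helper. The pattern is anchored so that
it must match the whole candidate, as the pattern attribute does for form
validation.

```javascript
// Keep only the recognition candidates that match the pattern in full.
function filterByPattern(candidates, pattern) {
  const re = new RegExp("^(?:" + pattern + ")$");
  return candidates.filter((candidate) => re.test(candidate));
}

// Made-up candidate list for a spoken three-digit code.
const candidates = ["one two three", "123", "1 2 3"];
const accepted = filterByPattern(candidates, "[0-9]{3}");

console.log(accepted);
```

Even in this reading the recognizer has to produce "123" on its own, so
the question of how the pattern would guide recognition remains open.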

5.7 Having different kinds of result processing in different elements is
strange.

-Olli

Received on Monday, 7 March 2011 16:58:55 UTC