
RE: Default value of SpeechRecognition.grammars

From: Young, Milan <Milan.Young@nuance.com>
Date: Thu, 21 Jun 2012 16:57:56 +0000
To: Satish S <satish@google.com>
CC: Jerry Carter <jerry@jerrycarter.org>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A475150@SOM-EXCH04.nuance.com>
I like the idea of running multiple recognizers in parallel and I agree it would be a solution to this problem.  As I see it, the requirements are:

* The application layer is given control over which recognizers are running.

* Each recognizer publishes a single default grammar.

* If the default grammar is not available in its entirety, the recognizer/UA must generate an error.
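For concreteness, here is a minimal sketch of how those three requirements could compose. Every name in it (Recognizer, startAll, the "grammar-unavailable" error string) is invented for illustration; nothing here is drawn from the draft API.

```javascript
// Sketch of the three requirements above. All names are hypothetical.
class Recognizer {
  constructor(name, defaultGrammar, available) {
    this.name = name;
    this.defaultGrammar = defaultGrammar; // each recognizer publishes one default grammar
    this.available = available;           // e.g. false when the network is down
  }
  start(onResult, onError) {
    // Requirement 3: if the default grammar is not available in its
    // entirety, the recognizer/UA must generate an error.
    if (!this.available) {
      onError({ error: "grammar-unavailable", recognizer: this.name });
      return;
    }
    onResult({ recognizer: this.name, grammar: this.defaultGrammar });
  }
}

// Requirement 1: the application layer decides which recognizers run.
function startAll(recognizers, onResult, onError) {
  for (const r of recognizers) r.start(onResult, onError);
}

const remote = new Recognizer("remote-dictation", "dictation", false); // no network
const local  = new Recognizer("local-contacts", "contact-names", true);

const results = [];
const errors = [];
startAll([remote, local], r => results.push(r), e => errors.push(e));
// The app now knows the remote dictation default failed while the
// local contacts recognizer is still serving results.
```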

Can we agree on that?

Thanks


From: Satish S [mailto:satish@google.com]
Sent: Thursday, June 21, 2012 8:22 AM
To: Young, Milan
Cc: Jerry Carter; Hans Wennborg; public-speech-api@w3.org
Subject: Re: Default value of SpeechRecognition.grammars

In the same use case, the device could use a local recognizer when there is no data network, and the local recognizer could have a smaller set of commands, such as 'call Milan', based on the data on the device. That is not the common dictation task, but it is good enough for the 'how can I help you' application.

Cheers
Satish

On Wed, Jun 20, 2012 at 8:23 PM, Young, Milan <Milan.Young@nuance.com> wrote:
I believe you are confounding *broad* capabilities with *known* capabilities.

Let's say I'm a developer writing a speech application for a mobile device.  I'm on a WiFi network during the development phase, so the default grammar includes dictation, web search, maps, contacts, etc.  I may or may not have an external NL engine for semantic interpretation.  With such a broad range of capabilities, I set the opening prompt to something like "How can I help you?"  The application is tested and deployed.

My user opens my app in a subway tunnel with no network service, and none of the usual commands work.  The only clue the application layer has for detecting this problem is the number of nomatch events.  This is a bad experience for both the user and the developer.

The problem is easily solved by requiring a default capability for the most common task (which seems to be dictation) and generating an error should that capability be unavailable.
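The difference is between inferring failure from a run of nomatch events and receiving one explicit error up front. A hedged sketch of the proposed behavior, using a stub recognizer invented for illustration (only the on* handler style is borrowed from the draft API):

```javascript
// Stub standing in for a UA recognizer whose default dictation
// grammar is unavailable (e.g. no network in the subway tunnel).
function makeOfflineRecognizer() {
  return {
    onerror: null,
    onnomatch: null,
    start() {
      // With the proposed requirement, the UA reports the missing
      // default capability immediately instead of matching nothing.
      if (this.onerror) this.onerror({ error: "network" });
    }
  };
}

const rec = makeOfflineRecognizer();
let status = "listening";
rec.onerror = (e) => { status = "offline: " + e.error; };  // one explicit event
rec.onnomatch = () => { /* without the error, the app could only count these */ };
rec.start();
// status is now "offline: network", so the app can prompt the user accordingly.
```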



From: Jerry Carter [mailto:jerry@jerrycarter.org]
Sent: Wednesday, June 20, 2012 9:49 AM
To: Young, Milan
Cc: Satish S; Hans Wennborg; public-speech-api@w3.org<mailto:public-speech-api@w3.org>

Subject: Re: Default value of SpeechRecognition.grammars

Not true in practice.

I expect that user preferences will favor recognition services with broad capabilities.  These will most likely be dictation-like with very limited semantic processing.

Application authors requiring specific capabilities may then EITHER request a specific recognition service whose default grammar is broad enough for their needs OR provide grammars customized to the task at hand.

In the 1990s, and decreasingly in the 2000s, acceptable recognition performance typically required customized grammars and dialog constraints to reduce 'out of grammar' responses.  These days it is quite reasonable to expect broad coverage for common words.  Default grammars may be perfectly fine for many simple applications.  The 'Dragon Mobile' engine produced by your company is one such recognizer, and its default grammar is probably used far more often than customized grammars.

-=- Jerry


On Jun 20, 2012, at 12:33 PM, Young, Milan wrote:

I'm not comfortable with the language for reasons stated in my last email.

In short, writing a good UI would be very difficult if the application didn't know which grammars were active.



From: Satish S [mailto:satish@google.com]
Sent: Wednesday, June 20, 2012 9:31 AM
To: Jerry Carter
Cc: Young, Milan; Hans Wennborg; public-speech-api@w3.org
Subject: Re: Default value of SpeechRecognition.grammars

Looks good. Should this go into a non-normative section?

Cheers
Satish

On Wed, Jun 20, 2012 at 5:25 PM, Jerry Carter <jerry@jerrycarter.org> wrote:

The challenge here is terminology.  I agree that your scenario is valid and regret that my choice of words was inadequate to express our agreement.  Let me offer this language:

"The recognition service is expected to provide, as a default, a general purpose grammar for common utterances.  The capabilities of this grammar and the domains covered will vary according to the capabilities of the current recognition service.  Application developers who want to ensure coverage for specific utterances are encouraged to specify either a specific recognition service or a specific grammar."
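Under the API shape being discussed, the two options in that language might look roughly as follows. The stub classes stand in for the UA's objects so the sketch is self-contained; serviceURI and addFromURI follow the draft-era attribute names, and the URLs are hypothetical.

```javascript
// Minimal stand-ins so the sketch runs outside a browser; the real
// objects would be provided by a UA implementing the draft Speech API.
class SpeechGrammarList {
  constructor() { this.items = []; }
  addFromURI(src, weight = 1.0) { this.items.push({ src, weight }); }
}
class SpeechRecognition {
  constructor() {
    this.grammars = new SpeechGrammarList();
    this.serviceURI = "";
  }
}

const rec = new SpeechRecognition();

// Option A: request a specific recognition service whose default
// grammar is broad enough for the application's needs.
rec.serviceURI = "https://example.com/speech";  // hypothetical endpoint

// Option B: provide a grammar customized to the task at hand, so the
// app never depends on an unspecified default.
rec.grammars.addFromURI("https://example.com/commands.grxml", 1.0);
```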

Better?

-=- Jerry


On Jun 20, 2012, at 12:14 PM, Satish S wrote:


Shouldn't that be up to the UA to decide? One use case: if the device does not have access to a recognizer capable of dictation-lite (e.g. the recognizer is remote and the device has no network access at that moment), the UA can decide to use only a local recognizer capable of recognizing names from the contact list or installed apps, and nothing else.

Cheers
Satish

On Wed, Jun 20, 2012 at 5:06 PM, Jerry Carter <jerry@jerrycarter.org> wrote:
I concur that web search is inappropriate, but the specification should provide some expectation as to what the default grammar might be.

If you want the default grammar to be of any general use, it would need to support common words and phrases for the current locale.  It need not be as rich as a dedicated dictation grammar, or support utterances as long as those in dictation tasks (though it could).  But I would expect a 'dictation-lite'.

-=- Jerry

On Jun 20, 2012, at 11:53 AM, Satish S wrote:


The vast majority of web apps using the speech API won't be doing a web search with the result, so it would be good not to mention web search in the spec.

Cheers
Satish

On Wed, Jun 20, 2012 at 4:45 PM, Young, Milan <Milan.Young@nuance.com> wrote:
I also support the idea of the engine choosing the behavior when no grammars are present.  But it would be nice for the spec to include a few examples of what that default might be.  Dictation and web search seem like good hints.


-----Original Message-----
From: Hans Wennborg [mailto:hwennborg@google.com]
Sent: Wednesday, June 20, 2012 8:27 AM
To: Jerry Carter
Cc: public-speech-api@w3.org
Subject: Re: Default value of SpeechRecognition.grammars

On Wed, Jun 20, 2012 at 2:15 PM, Jerry Carter <jerry@jerrycarter.org> wrote:
> Makes sense. I assume you are thinking that the default grammar should be fairly broad, e.g. a dictation grammar.

Yes, but I don't think we should specify what the default grammar should be; it should be decided by the speech recognition engine.

Thanks,
Hans
Received on Thursday, 21 June 2012 16:58:29 GMT
