Re: Default value of SpeechRecognition.grammars from Satish S on 2012-06-21 (public-speech-api@w3.org from June 2012)

From: Satish S <satish@google.com>
Date: Thu, 21 Jun 2012 16:22:11 +0100
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Jerry Carter <jerry@jerrycarter.org>, Hans Wennborg <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <CAHZf7R=Pv6=NV-NgqiJQVXxaEADWMYeKhKaC8vns354Y=g=-3w@mail.gmail.com>
In the same use case, the device could use a local recognizer when there is
no data network, and the local recognizer can have a smaller set of commands
such as 'call Milan' based on the data on the device. It is not the common
task of dictation but it is good enough for the 'how can I help you' application.

Cheers
Satish
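
A minimal sketch of the fallback described above, assuming a user agent that
knows about one remote and one local recognizer. The `Recognizer` type, the
capability labels, and `pickRecognizer` are hypothetical names used only for
illustration; none of them come from the Speech API draft, and ranking by
capability count is just a stand-in for whatever policy a real user agent
would apply.

```typescript
// Hypothetical model of a user agent's recognizer choice. These names are
// illustrative assumptions, not part of the Speech API draft.
type Capability = "dictation" | "contacts" | "apps";

interface Recognizer {
  name: string;
  needsNetwork: boolean;
  capabilities: Capability[];
}

// Return the usable recognizer with the most capabilities, or undefined
// when no recognizer can run under the current network conditions.
function pickRecognizer(
  candidates: Recognizer[],
  networkAvailable: boolean,
): Recognizer | undefined {
  // filter() returns a new array, so the sort below does not mutate the input.
  const usable = candidates.filter((r) => networkAvailable || !r.needsNetwork);
  return usable.sort((a, b) => b.capabilities.length - a.capabilities.length)[0];
}

const remote: Recognizer = {
  name: "remote",
  needsNetwork: true,
  capabilities: ["dictation", "contacts", "apps"],
};
const local: Recognizer = {
  name: "local",
  needsNetwork: false,
  capabilities: ["contacts"],
};

const online = pickRecognizer([remote, local], true);   // remote recognizer
const offline = pickRecognizer([remote, local], false); // local recognizer
```

In the offline case the application ends up with the contacts-only recognizer
without being told, which is the situation the quoted message below is
concerned about.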


On Wed, Jun 20, 2012 at 8:23 PM, Young, Milan <Milan.Young@nuance.com> wrote:

> I believe you are confounding *broad* capabilities with *known*
> capabilities.
>
> Let’s say I’m a developer writing a speech application for a mobile
> device.  I’m on a WiFi network for the development phase and so the default
> grammar includes dictation, web search, maps, contacts, etc.  I may or may
> not have an external NL engine for semantic interpretation.  With such a
> broad range of capabilities, I set the opening prompt to something like
> “How can I help you?”  Application is tested and deployed.
>
> My user opens my app in a subway tunnel without network service and none
> of the usual commands work.  The only clue the application layer has to
> detecting this problem is the number of nomatch events.  This is neither a
> good user nor developer experience.
>
> Problem easily solved by requiring a default capability for the most
> common task (seems to be dictation) and generating an error should that
> capability be unavailable.
>
> *From:* Jerry Carter [mailto:jerry@jerrycarter.org]
> *Sent:* Wednesday, June 20, 2012 9:49 AM
> *To:* Young, Milan
> *Cc:* Satish S; Hans Wennborg; public-speech-api@w3.org
> *Subject:* Re: Default value of SpeechRecognition.grammars
>
> Not true in practice.
>
> I expect that user preferences will favor recognition services with broad
> capabilities.  These will most likely be dictation-like with very limited
> semantic processing.
>
> Application authors requiring specific capabilities may then EITHER
> request a specific recognition service whose default grammar is broad
> enough for their needs OR provide grammars customized to the task at hand.
>
> In the 1990's and decreasingly in the 2000's, recognition performance
> typically required customized grammars and dialog constraints to reduce
> 'out of grammar' responses.  These days it is quite reasonable to expect
> broad coverage for common words.  Default grammars may be perfectly fine
> for many simple applications.  The 'Dragon Mobile' engine produced by your
> company is one such recognizer whose default is probably used far more than
> customized grammars.
>
> -=- Jerry
>
> On Jun 20, 2012, at 12:33 PM, Young, Milan wrote:
>
> I’m not comfortable with the language for reasons stated in my last email.
>
> In short, writing a good UI would be very difficult if the application
> didn’t know which grammars were active.
>
> *From:* Satish S [mailto:satish@google.com]
> *Sent:* Wednesday, June 20, 2012 9:31 AM
> *To:* Jerry Carter
> *Cc:* Young, Milan; Hans Wennborg; public-speech-api@w3.org
> *Subject:* Re: Default value of SpeechRecognition.grammars
>
> Looks good. Should this go into a non-normative section?
>
> Cheers
> Satish
>
> On Wed, Jun 20, 2012 at 5:25 PM, Jerry Carter <jerry@jerrycarter.org>
> wrote:
>
> The challenge, here, is terminology.  I agree that your scenario is valid
> and regret that my choice of words is inadequate to express our agreement.
> Let me offer this language:
>
> "The recognition service is expected to provide, as a default, a general
> purpose grammar for common utterances.  The capabilities of this grammar
> and the domains covered will vary according to the capabilities of the
> current recognition service.  Application developers who want to ensure
> coverage for specific utterances are encouraged to specify either a
> specific recognition service or a specific grammar."
>
> Better?
>
> -=- Jerry
>
> On Jun 20, 2012, at 12:14 PM, Satish S wrote:
>
> Shouldn't that be up to the UA to decide? One use case is if the device
> did not have access to a recognizer capable of dictation-lite (e.g.
> recognizer is remote and device has no network access at that moment) the
> UA can decide to only use a local recognizer capable of recognizing names
> from the contact list or apps installed and nothing else.
>
> Cheers
> Satish
>
> On Wed, Jun 20, 2012 at 5:06 PM, Jerry Carter <jerry@jerrycarter.org>
> wrote:
>
> I concur that web search is inappropriate, but the specification should
> provide some expectation as to what the default grammar might be.
>
> If you want the default grammar to be of any general use, it would need
> to support common words & phrases for the current locality.  It need not be
> as rich as a dedicated dictation grammar or support utterances as long as
> for dictation tasks (though it could be).  But I would expect a
> 'dictation-lite'.
>
> -=- Jerry
>
> On Jun 20, 2012, at 11:53 AM, Satish S wrote:
>
> The vast majority of web apps using speech API wouldn't be doing web
> search with the result so it would be good to not mention it in the spec.
>
> Cheers
> Satish
>
> On Wed, Jun 20, 2012 at 4:45 PM, Young, Milan <Milan.Young@nuance.com>
> wrote:
>
> I also support the idea of the engine choosing behavior when no grammars
> are present.  But it would be nice to put in the spec a few examples of
> what that default might be.  Dictation and web search seem like good hints.
>
> -----Original Message-----
> From: Hans Wennborg [mailto:hwennborg@google.com]
> Sent: Wednesday, June 20, 2012 8:27 AM
> To: Jerry Carter
> Cc: public-speech-api@w3.org
> Subject: Re: Default value of SpeechRecognition.grammars
>
> On Wed, Jun 20, 2012 at 2:15 PM, Jerry Carter <jerry@jerrycarter.org>
> wrote:
> > Makes sense. I assume you are thinking that the default grammar should
> > be fairly broad, e.g. a dictation grammar.
>
> Yes, but I don't think we should specify what the default grammar should
> be; it should be decided by the speech recognition engine.
>
> Thanks,
> Hans
>
Received on Thursday, 21 June 2012 15:22:49 UTC