Re: Notes from today's protocol call from Patrick Ehlen on 2011-06-30 (public-xg-htmlspeech@w3.org from June 2011)

From: Patrick Ehlen <pehlen@attinteractive.com>
Date: Thu, 30 Jun 2011 08:05:41 -0700
To: Robert Brown <Robert.Brown@microsoft.com>
CC: "Young, Milan" <Milan.Young@nuance.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
Message-ID: <CA6095A5-7A14-4DAD-84BC-0C224D131A80@attinteractive.com>

If we're not trying to adhere too closely to MRCP, another scenario we might want to support in the continuous/dictation case is user feedback while recognition is ongoing. This would be similar to Robert's "SET-GRAMMAR-STATE" method, but rather than changing grammars and rules, it would re-rank subsequent output based on the user's selections from some n-best representation in the UI.
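
To make the re-ranking idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the `Hypothesis` type, the `rerank` function, the additive `boost`, and the idea of tracking user-preferred words are hypothetical and not part of any proposed protocol. It only shows how an earlier user selection from an n-best list might bias the ordering of later hypotheses.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # recognizer confidence, higher is better (assumed scale)


def rerank(nbest, preferred_words, boost=0.2):
    """Re-sort hypotheses, boosting those that contain words the user chose earlier.

    The additive boost is an arbitrary illustrative choice, not a real scoring model.
    """
    def adjusted(h):
        hits = sum(1 for w in h.text.lower().split() if w in preferred_words)
        return h.score + boost * hits

    return sorted(nbest, key=adjusted, reverse=True)


# Suppose the user earlier picked "Ehlen" over "Allen" in the UI.
preferred = {"ehlen"}
nbest = [
    Hypothesis("call patrick allen", 0.60),
    Hypothesis("call patrick ehlen", 0.55),
]
top = rerank(nbest, preferred)[0]
```

With no feedback (an empty `preferred` set) the original recognizer ordering is left untouched, so the feedback only ever acts as a bias on top of the recognizer's own scores.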

On Jun 29, 2011, at 10:31, "Robert Brown" <Robert.Brown@microsoft.com> wrote:


inline...
________________________________
From: Young, Milan [Milan.Young@nuance.com]

Inline…

________________________________
From: Robert Brown [mailto:Robert.Brown@microsoft.com]
Sent: Thursday, June 23, 2011 5:04 PM
To: Young, Milan; HTML Speech XG
Subject: RE: Notes from today's protocol call

One other thing we’ll need to consider is how to add/remove grammars during continuous recognition.

Some use cases:

· In dictation, it’s not uncommon to have hot words that switch in and out of a command mode (i.e. enable/disable a command grammar).

[Milan] I was figuring this could take place by enabling more than one recognition session.  You have the main dictation session going, and then one or more parallel channels performing hotword detection.

[Robert] That could work. But it could also be messy. For example, if you got a match on both sessions, the app would then need to decide which one to act on. The recognizer would be the better resource to make that decision.



· In open-mic multimodal apps, the app will listen continuously, but change the set of active grammars based on the user’s other non-speech interactions with the app.

[Milan] Yes, I’ve thought about this scenario as well.  The problem is that it’s such a divergence from MRCP2 that it would be hard to retrofit onto existing stacks.  I also haven’t heard much discussion of this feature at the API level, but perhaps I’m wrong.

[Robert] Divergence is okay. Existing stacks weren't designed for these scenarios, but we shouldn't let that invalidate the scenario. The recognizer state machine will need to be more complex than it is in MRCP. Moreover, any existing recognizers that are capable of continuous recognition by definition don't currently use MRCP anyway because it doesn't support the scenario.


[Robert] How about we do something like the following?

1. redefine SET-GRAMMAR so that it can also be used during a recognition, and so that it returns a handle that can be used to refer to the grammar later

2. add a method to set grammar state.  For example, call it "SET-GRAMMAR-STATE", and give it the following capabilities:

    a. enable/disable named top-level rules within a grammar

    b. enable/disable entire grammars
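
As an illustration only, the two steps above could be modelled roughly as follows in Python. This is a sketch under stated assumptions, not a definition of the protocol: the `GrammarRegistry` class, the handle format (`g1`, `g2`, ...), and the method signatures are all hypothetical. It shows SET-GRAMMAR returning a handle, and SET-GRAMMAR-STATE toggling either a whole grammar or a single named top-level rule.

```python
import itertools


class GrammarRegistry:
    """Toy model of recognizer-side grammar state (hypothetical, for illustration)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._grammars = {}  # handle -> {"enabled": bool, "rules": {name: bool}}

    def set_grammar(self, rule_names):
        """Model of SET-GRAMMAR: register a grammar and return a handle for later use."""
        handle = f"g{next(self._ids)}"
        self._grammars[handle] = {
            "enabled": True,
            "rules": {name: True for name in rule_names},
        }
        return handle

    def set_grammar_state(self, handle, enabled, rule=None):
        """Model of SET-GRAMMAR-STATE: toggle a whole grammar, or one top-level rule."""
        grammar = self._grammars[handle]
        if rule is None:
            grammar["enabled"] = enabled
        elif rule in grammar["rules"]:
            grammar["rules"][rule] = enabled
        else:
            raise KeyError(f"unknown rule {rule!r} in grammar {handle}")

    def active_rules(self):
        """Return the (handle, rule) pairs the recognizer would currently match against."""
        return sorted(
            (handle, name)
            for handle, grammar in self._grammars.items()
            if grammar["enabled"]
            for name, on in grammar["rules"].items()
            if on
        )


reg = GrammarRegistry()
dictation = reg.set_grammar(["dictation"])
commands = reg.set_grammar(["undo", "send"])
reg.set_grammar_state(commands, enabled=False)               # leave command mode
reg.set_grammar_state(commands, enabled=True)                # hot word heard: back in
reg.set_grammar_state(commands, enabled=False, rule="send")  # disable a single rule
```

In this model, disabling a grammar leaves its per-rule flags alone, so re-enabling it restores whatever rule-level state was set before; whether a real recognizer should behave that way is an open design question.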

Received on Thursday, 30 June 2011 15:06:23 UTC