- From: Patrick Ehlen <pehlen@attinteractive.com>
- Date: Thu, 23 Jun 2011 08:07:47 -0700
- To: Michael Johnston <johnston@research.att.com>, "Young, Milan" <Milan.Young@nuance.com>, Robert Brown <Robert.Brown@microsoft.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
If we are to use RECOGNIZE for calling these other types of resources, maybe DEFINE-GRAMMAR should be generalized to SET-MODEL or SPECIFY-MODEL? DEFINE-GRAMMAR sounds too SRGS-specific to me.

On 6/23/11 7:20 AM, "Michael Johnston" <johnston@research.att.com> wrote:

> Following up on the issue of allowing a broader set of use cases to be
> handled using the emerging control protocol (tasks other than
> straight-up speech recognition, e.g. verification, prosody recognition,
> and emotion recognition, all of which can be handled by shipping audio
> to a speech resource and getting an EMMA result back).
>
> The action item was to look into handling these using the recognizer
> resource and the RECOGNIZE method. I don't see any immediate problems,
> assuming we are happy with using DEFINE-GRAMMAR to specify not just
> grammars but arbitrary models used to derive some kind of
> interpretation of the input. We already need DEFINE-GRAMMAR to specify
> both SRGS grammars and SLMs, so it could also be used to point to
> arbitrary models that perform other kinds of processing. Thinking
> beyond EMMA to the JS result API, the result of this processing should
> probably show up in the 'interpretation' field.
>
> Michael
> ________________________________________
> From: public-xg-htmlspeech-request@w3.org
> [public-xg-htmlspeech-request@w3.org] On Behalf Of Young, Milan
> [Milan.Young@nuance.com]
> Sent: Thursday, June 23, 2011 1:17 AM
> To: Robert Brown; HTML Speech XG
> Subject: RE: Notes from today's protocol call
>
> For my part, I've updated the control portion of the protocol to cover
> the continuous speech scenario. I also made a few modifications based
> on recent discussions and updated the document to match Robert's HTML
> format. Please see the attached.
>
> There are a couple of areas of the protocol that are still TBD in my
> mind, but rather than let the perfect become the enemy of the good, I
> figured I'd open this up for discussion.
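For concreteness, an exchange along the lines Michael describes (an arbitrary model set up via a generalized DEFINE-GRAMMAR/SET-MODEL, with the result surfacing as an EMMA interpretation) might look something like this. The method names, header usage, and URIs below are purely illustrative, not agreed syntax:

```
C->S: SET-MODEL 1001
      Content-Type: text/uri-list

      https://example.com/models/emotion-classifier

C->S: RECOGNIZE 1002
      (audio streamed to the resource separately)

S->C: RECOGNITION-COMPLETE 1002 COMPLETE
      Content-Type: application/emma+xml

      <emma:emma version="1.0"
                 xmlns:emma="http://www.w3.org/2003/04/emma">
        <emma:interpretation id="int1" emma:confidence="0.82">
          <emotion>frustrated</emotion>
        </emma:interpretation>
      </emma:emma>
```

In the JS result API, the `<emotion>` payload would then appear in the 'interpretation' field, as suggested above.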
> ________________________________
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
> Sent: Wednesday, June 22, 2011 5:24 PM
> To: HTML Speech XG
> Subject: RE: Notes from today's protocol call
>
> I haven't finished my work item (redraft to incorporate everything
> we've discussed so far), but it's in progress.
>
> Here are some things I've noticed so far:
>
> 1. Is message-length in the request line really necessary? Presumably
> its only value in MRCP is to provide message framing in what is
> otherwise just an open-ended character stream, which we get
> automatically in WebSockets. Ditto for the Content-Length header.
>
> 2. It's not clear that we need Cancel-If-Queue for recognition. HTML
> apps won't have the same serialized dialog we see in IVR, so this may
> not be a meaningful header.
>
> 3. Will the API have hotword functionality? If not, do we need the
> hotword headers?
>
> 4. Does the reco portion of the protocol imply API requirements that
> haven't been discussed yet? For example, START-INPUT-TIMERS is there
> for a good reason, but AFAIK the XG hasn't spoken about that scenario.
> Similarly, Early-No-Match seems useful. Is it?
>
> 5. The TTS design has some IVR artifacts that don't make sense in
> HTML. In IVR, the synthesizer essentially renders directly to the
> user's telephone and is an active part of the user interface. In HTML,
> by contrast, the synthesizer is just a provider of audio to the UA.
> The UA buffers the audio and controls playback independent of
> rendering. In light of this, the CONTROL method, the Jump-Size header,
> the Speak-Length header, the Kill-On-Barge-In header (and possibly
> others) don't really make sense.
>
> 6. The TTS Speaker-Profile header will probably never be used, because
> HTML UAs will want to pass values inline rather than store them in a
> separate URI-referenceable resource. Should we remove it?
>
> 7.
DEFINE-LEXICON and the Load-Lexicon header appear to be useful. Does
> this functionality need to surface in the API, or is its presence in
> SSML enough? And if SSML is enough, why do we need the header? Also,
> why isn't there corresponding functionality for recognition?
>
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Robert Brown
> Sent: Thursday, June 16, 2011 10:06 AM
> To: HTML Speech XG
> Subject: Notes from today's protocol call
>
> Attendees:
> - Robert Brown
> - Milan Young
> - Patrick Ehlen
> - Michael Johnston
>
> Topic: control portion of protocol based on an MRCP subset (this
> thread:
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0035.html)
>
> - Agreed that this subset is appropriate for ASR & TTS.
>
> - Unclear whether recording should be included. Agreed to escalate
> this to the XG. If there's agreement that recording scenarios are
> common and valuable, we'll include that portion in the protocol.
> Otherwise we'll omit it, since it's still possible through more
> convoluted means.
>
> o We discussed this in the main call with the XG. General agreement
> was that recording isn't something we need to solve, and that it
> should be possible as a side effect of recognition (i.e. <ruleref
> special="GARBAGE"> and retain the audio).
>
> - While services are free to implement a subset of the protocol (e.g.
> only the SR or only the TTS portions), clients will need to implement
> the full set.
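The recording-as-a-side-effect idea from the notes (match everything via the special GARBAGE rule and retain the audio) could be expressed with an SRGS grammar along these lines. This is a sketch of one possible "record-only" grammar, not anything the group has specified:

```
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical "record-only" grammar: matches arbitrary speech via
     the special GARBAGE rule, so the service performs no meaningful
     recognition but can retain the captured audio. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="record">
  <rule id="record" scope="public">
    <ruleref special="GARBAGE"/>
  </rule>
</grammar>
```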
> Topic: RTP
>
> - Agreed that it is unneeded, for the reasons stated here:
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0029.html
>
> - The basic design approach provides an extensibility mechanism, so
> that if new scenarios emerged in the future that required RTP or
> another protocol, they could be accommodated:
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/att-0008/speech-protocol-basic-approach-01.html
>
> Topic: SDP
>
> - Agreed, as with RTP, that it is unneeded given the context we
> already have as a byproduct of our design approach. (See also the last
> paragraph here:
> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0042.html)
>
> Topic: session initiation & media negotiation
> (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0042.html)
>
> - People haven't had a chance to review this. Will discuss more over
> the coming week.
>
> - GET-PARAMS is resource-specific, so it works a little differently
> from what's written here. (Robert will need to re-think this and make
> another proposal.)
>
> Next steps:
>
> - Continuous speech proposal (Milan)
>
> - Redraft to incorporate everything we've discussed so far (Robert)
>
> - Examine whether the recognition portion of the protocol can handle
> extended scenarios, like verification, etc. (Michael)
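Robert's first point above, that WebSocket frames already delimit messages, can be illustrated with a small parser sketch: if each control message arrives in its own text frame, the receiver can recover the start line, headers, and body with no message-length field or Content-Length header at all. The start-line syntax below is illustrative only, not agreed protocol syntax:

```python
def parse_control_message(frame: str):
    """Split one WebSocket text frame into (start_line, headers, body).

    Framing is supplied by the WebSocket layer itself, so unlike MRCP
    there is no need for a message-length field or Content-Length
    header to locate the end of the message.
    """
    head, _, body = frame.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body

# Example: a hypothetical RECOGNIZE request carried in a single frame.
frame = ("RECOGNIZE 8322\r\n"
         "Content-Type: application/srgs+xml\r\n"
         "\r\n"
         "<grammar/>")
start_line, headers, body = parse_control_message(frame)
print(start_line)   # RECOGNIZE 8322
print(body)         # <grammar/>
```

The same argument applies to responses and events: each is one frame, so the receiver never has to scan ahead for a length prefix.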
Received on Thursday, 23 June 2011 15:08:20 UTC