- From: Bjorn Bringert <bringert@google.com>
- Date: Mon, 23 May 2011 16:05:59 +0100
- To: "Young, Milan" <Milan.Young@nuance.com>
- Cc: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On Mon, May 23, 2011 at 3:50 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> You say below that the speech service "divides the audio chunks". This
> could be interpreted that the SS could apply a portion of a chunk in one
> result, and the remainder applied to the next. May want to discuss this
> further once we better understand the streaming model.

I meant that each Result comes from a single audio chunk. But that's
really a service implementation detail I guess.

> I also thought we agreed the web-app could send continuous correction
> feedback following the same model as feedback in the form-filling case.
> The main difference to consider is that in the continuous case, feedback
> could trigger the SS sending replacement results.

Yeah, I forgot to write about that. The Result object could contain a
feedback method for telling the speech recognition service when the
user corrects a result. The SS could then send a replace event for
some other Result if it wants.

> I suggest that we come up with a good use case before speccing
> intermediate results. Even marking them as optional will consume time
> to: 1) spec, 2) implement in UA, and 3) handle conformance.

Yeah, I'm fine with omitting intermediate events for now. I just
wanted to capture how we could add them if we want.

> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
> Sent: Monday, May 23, 2011 7:29 AM
> To: public-xg-htmlspeech@w3.org
> Subject: Continuous recognition API
>
> This is a summary of the continuous recognition API proposed in the
> face-to-face today. I'm sorry if it's not comprehensible for those not
> attending the face-to-face.
>
> As already agreed, a one-shot recognition returns a single Result:
>
> Result { EMMA; Alternative[] }
> Alternative { utterance, confidence, interpretation }
>
> Continuous recognition ('result' event), REQUIRED:
>
> - In continuous recognition mode, audio is continuously captured and
>   passed to the speech recognition service.
> - The speech recognition service divides the audio into chunks in some
>   way (e.g. at sentence boundaries).
> - If an SRGS grammar is specified for the continuous recognition
>   request, each Result should correspond to a single utterance in the
>   grammar.
> - For each chunk, the speech recognition service sends a 'result'
>   event containing a Result object.
>
> Continuous recognition ('intermediate' event), OPTIONAL:
>
> - The speech recognition service may return 'intermediate' events.
> - An intermediate event contains a Result which represents the entire
>   audio from the last 'result' event.
>
> Continuous recognition ('replace' event), OPTIONAL:
>
> - Each 'result' event has an ID.
> - The speech recognition service can send 'replace' events containing
>   { ID of result to replace, new Result }.
> - This must refer to a previous result event.
> - It does not represent any new input.
>
>
> An example using all three:
>
> User says "my hovercraft is full of eels. they are tasty."
>
> 1. 'intermediate': "may"
> 2. 'intermediate': "my hovercraft"
> 3. 'intermediate': "my hovercraft is fool"
> 4. 'intermediate': "my hovercraft is full of eel"
> 5. 'result': ID=0, "my hovercraft is full of eel."
> 6. 'intermediate': "they"
> 7. 'intermediate': "they are"
> 8. 'intermediate': "they aren't tasty"
> 9. 'result': ID=1, "they are tasty."
> 10. 'replace': ID=0, "my hovercraft is full of eels."
>
>
> It should be possible to change parameters and grammars during
> continuous recognition. All 'result' events returned after a grammar
> or parameter is changed must reflect that change.
> This means that the speech recognition service may need to buffer
> audio since the last 'result' event to rerecognize it in case of a
> parameter or grammar change.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Monday, 23 May 2011 15:06:27 UTC
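
The event model summarized in the quoted proposal can be sketched from the web app's side. The following is only an illustration under stated assumptions, not an API the group agreed on: the names RecognitionEvent, Transcript, handle, text and single are hypothetical, the Result and Alternative shapes loosely follow the "Result { EMMA; Alternative[] }" outline above, and the first alternative is assumed to be the best one. It replays the ten-step hovercraft example and shows that a 'replace' event updates an earlier result in place without adding new input.

```typescript
// Hypothetical shapes, loosely following the proposal's
// Result { EMMA; Alternative[] } and
// Alternative { utterance, confidence, interpretation }.
interface Alternative {
  utterance: string;
  confidence: number;
  interpretation?: unknown;
}

interface Result {
  emma?: string; // EMMA document, treated as opaque in this sketch
  alternatives: Alternative[];
}

// Assumed event envelope: 'result' and 'replace' carry an ID,
// 'intermediate' does not.
type RecognitionEvent =
  | { type: "intermediate"; result: Result }
  | { type: "result"; id: number; result: Result }
  | { type: "replace"; id: number; result: Result };

// Assumption: the first alternative is the preferred one.
function bestUtterance(result: Result): string {
  const first = result.alternatives[0];
  return first ? first.utterance : "";
}

class Transcript {
  // A Map keeps insertion order, and setting an existing key keeps its position.
  private finals = new Map<number, Result>();
  private pending: Result | null = null;

  handle(event: RecognitionEvent): void {
    switch (event.type) {
      case "intermediate":
        // Covers all audio since the last 'result', so it overwrites the
        // previous intermediate rather than appending to it.
        this.pending = event.result;
        break;
      case "result":
        this.finals.set(event.id, event.result);
        this.pending = null;
        break;
      case "replace":
        // Must refer to a previous 'result'; it represents no new input.
        if (!this.finals.has(event.id)) {
          throw new Error(`replace refers to unknown result ID ${event.id}`);
        }
        this.finals.set(event.id, event.result);
        break;
    }
  }

  text(): string {
    const parts = Array.from(this.finals.values()).map(bestUtterance);
    if (this.pending) parts.push(bestUtterance(this.pending));
    return parts.join(" ");
  }
}

// Helper for the demo: a Result with one alternative.
function single(utterance: string): Result {
  return { alternatives: [{ utterance, confidence: 1 }] };
}

// The ten events from the example in the quoted message.
const events: RecognitionEvent[] = [
  { type: "intermediate", result: single("may") },
  { type: "intermediate", result: single("my hovercraft") },
  { type: "intermediate", result: single("my hovercraft is fool") },
  { type: "intermediate", result: single("my hovercraft is full of eel") },
  { type: "result", id: 0, result: single("my hovercraft is full of eel.") },
  { type: "intermediate", result: single("they") },
  { type: "intermediate", result: single("they are") },
  { type: "intermediate", result: single("they aren't tasty") },
  { type: "result", id: 1, result: single("they are tasty.") },
  { type: "replace", id: 0, result: single("my hovercraft is full of eels.") },
];

const transcript = new Transcript();
for (const event of events) transcript.handle(event);
console.log(transcript.text());
// prints: my hovercraft is full of eels. they are tasty.
```

Keying final results by ID is what lets the late 'replace' for ID=0 land in its original position even though ID=1 has already arrived. The feedback method and the grammar-change buffering discussed in the thread are service-side concerns and are left out of this sketch.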