- From: Bjorn Bringert <bringert@google.com>
- Date: Mon, 23 May 2011 15:28:51 +0100
- To: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
This is a summary of the continuous recognition API proposed at the face-to-face today. I'm sorry if it's not comprehensible for those who did not attend the face-to-face.

As already agreed, a one-shot recognition returns a single Result:

  Result { EMMA; Alternative[] }
  Alternative { utterance, confidence, interpretation }

Continuous recognition ('result' event), REQUIRED:
- In continuous recognition mode, audio is continuously captured and passed to the speech recognition service.
- The speech recognition service divides the audio into chunks in some way (e.g. at sentence boundaries).
- If an SRGS grammar is specified for the continuous recognition request, each Result should correspond to a single utterance in the grammar.
- For each chunk, the speech recognition service sends a 'result' event containing a Result object.

Continuous recognition ('intermediate' event), OPTIONAL:
- The speech recognition service may return 'intermediate' events.
- An 'intermediate' event contains a Result that represents all audio since the last 'result' event.

Continuous recognition ('replace' event), OPTIONAL:
- Each 'result' event has an ID.
- The speech recognition service can send 'replace' events containing { ID of result to replace, new Result }.
- A 'replace' event must refer to a previous 'result' event.
- It does not represent any new input.

An example using all three. The user says "my hovercraft is full of eels. they are tasty.":

1. 'intermediate': "may"
2. 'intermediate': "my hovercraft"
3. 'intermediate': "my hovercraft is fool"
4. 'intermediate': "my hovercraft is full of eel"
5. 'result': ID=0, "my hovercraft is full of eel."
6. 'intermediate': "they"
7. 'intermediate': "they are"
8. 'intermediate': "they aren't tasty"
9. 'result': ID=1, "they are tasty."
10. 'replace': ID=0, "my hovercraft is full of eels."

It should be possible to change parameters and grammars during continuous recognition. All 'result' events returned after a grammar or parameter change must reflect that change.
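To illustrate how a web page might consume this event stream, here is a minimal TypeScript sketch of a client-side transcript tracker. The Result/Alternative fields follow the proposal above; the class name, method names, and the numeric-ID encoding are hypothetical, since the proposal does not define a concrete JavaScript binding.

```typescript
// Field names follow the proposed Result / Alternative structures;
// everything else (TranscriptTracker, method names) is illustrative only.
interface Alternative {
  utterance: string;
  confidence: number;
  interpretation?: unknown;
}

interface Result {
  id: number;              // ID of the 'result' event (referenced by 'replace')
  emma?: string;           // EMMA document from the service, if any
  alternatives: Alternative[];
}

class TranscriptTracker {
  private finals = new Map<number, Result>();
  private pending: Result | null = null; // latest 'intermediate' hypothesis

  onIntermediate(r: Result): void {
    // Represents all audio since the last 'result' event; not yet final.
    this.pending = r;
  }

  onResult(r: Result): void {
    // A finalized chunk; supersedes the current intermediate hypothesis.
    this.finals.set(r.id, r);
    this.pending = null;
  }

  onReplace(id: number, r: Result): void {
    // Must refer to a previous 'result' event; carries no new input.
    if (!this.finals.has(id)) throw new Error(`no result with ID ${id}`);
    this.finals.set(id, r);
  }

  transcript(): string {
    const parts = [...this.finals.entries()]
      .sort(([a], [b]) => a - b)
      .map(([, r]) => r.alternatives[0].utterance);
    if (this.pending) parts.push(this.pending.alternatives[0].utterance);
    return parts.join(" ");
  }
}
```

Replaying the hovercraft example: after the two 'result' events (ID=0, ID=1) and the final 'replace' of ID=0, transcript() yields "my hovercraft is full of eels. they are tasty." — the replace corrects the earlier chunk in place without appending new text.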
This means that the speech recognition service may need to buffer the audio received since the last 'result' event, so that it can re-recognize it if a parameter or grammar changes.

--
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Monday, 23 May 2011 14:29:16 UTC